[ 
https://issues.apache.org/jira/browse/HBASE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234843#comment-14234843
 ] 

Ted Yu commented on HBASE-12632:
--------------------------------

Ping [~enis], [~apurtell] for inclusion in 1.0 and 0.98, respectively.

> ThrottledInputStream/ExportSnapshot does not throttle
> -----------------------------------------------------
>
>                 Key: HBASE-12632
>                 URL: https://issues.apache.org/jira/browse/HBASE-12632
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.99.2
>            Reporter: Tobi Vollebregt
>         Attachments: 12632-v1.txt
>
>
> I just transferred a ton of data using ExportSnapshot with bandwidth 
> throttling from one Hadoop cluster to another Hadoop cluster, and discovered 
> that ThrottledInputStream does not limit bandwidth.
> The problem is that ThrottledInputStream sleeps once, for a fixed time (50 
> ms), at the start of each read call, disregarding the actual amount of data 
> read.
> ExportSnapshot defaults to a buffer size as big as the block size of the 
> outputFs:
> {code:java}
>       // Use the default block size of the outputFs if bigger
>       int defaultBlockSize = Math.max((int) outputFs.getDefaultBlockSize(), BUFFER_SIZE);
>       bufferSize = conf.getInt(CONF_BUFFER_SIZE, defaultBlockSize);
>       LOG.info("Using bufferSize=" + StringUtils.humanReadableInt(bufferSize));
> {code}
> In my case, this was 256MB.
> Hence, the ExportSnapshot mapper will attempt to read up to 256 MB at a time, 
> each time sleeping only 50 ms. Thus, in the worst case where each call to read 
> fills the 256 MB buffer in negligible time, the ThrottledInputStream cannot 
> reduce the bandwidth to under (256 MB) / (50 ms) = 5 GB/s.
> Even in a more realistic case where read returns about 1 MB per call, it 
> still cannot throttle the bandwidth to under 20 MB/s.
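
The two figures above follow from dividing the bytes returned per read call by the fixed sleep. A minimal sketch of that arithmetic, assuming the read itself takes negligible time; the class and method names are hypothetical and not part of HBase or Hadoop:

```java
class ThrottleFloor {
    /** Fixed sleep per read call, in milliseconds, as described in the report. */
    static final long SLEEP_DURATION_MS = 50;

    /**
     * Lowest rate (bytes per second) that one fixed sleep per read call can
     * enforce, assuming the read itself returns instantly.
     */
    static long floorBytesPerSec(long bytesPerRead) {
        return bytesPerRead * 1000L / SLEEP_DURATION_MS;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 256 MB per read call: 5120 MB/s, i.e. about 5 GB/s
        System.out.println(floorBytesPerSec(256 * mb) / mb + " MB/s");
        // 1 MB per read call: 20 MB/s
        System.out.println(floorBytesPerSec(mb) / mb + " MB/s");
    }
}
```

Whatever limit is configured below this floor cannot be reached with a single sleep per call.
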
> The issue is exacerbated by the fact that you need to set a low limit because 
> the total bandwidth per host depends on the number of mapper slots as well.
> A simple solution would change the if in throttle to a while, so that it 
> keeps sleeping for 50 ms until the rate is finally low enough:
> {code:java}
>   private void throttle() throws IOException {
>     while (getBytesPerSec() > maxBytesPerSec) {
>       try {
>         Thread.sleep(SLEEP_DURATION_MS);
>         totalSleepTime += SLEEP_DURATION_MS;
>       } catch (InterruptedException e) {
>         throw new IOException("Thread aborted", e);
>       }
>     }
>   }
> {code}
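
To illustrate the difference between the `if` and the `while`, here is a rough simulation on a virtual clock rather than real sleeps. It is only a sketch under stated assumptions: reads take no time, every read returns the same number of bytes, and the rate is simply total bytes over total elapsed time. `ThrottleSim`, `simulate` and `rate` are hypothetical names, not the actual ThrottledInputStream code.

```java
class ThrottleSim {
    static final long SLEEP_DURATION_MS = 50;

    /** Simplified rate: total bytes over total elapsed virtual time. */
    static long rate(long bytesRead, long elapsedMs) {
        if (elapsedMs == 0) {
            return bytesRead == 0 ? 0 : Long.MAX_VALUE;
        }
        return bytesRead * 1000L / elapsedMs;
    }

    /**
     * Simulates `calls` read calls of `bytesPerRead` bytes each and returns the
     * resulting rate in bytes per second. With loop == false, throttle sleeps
     * at most once per call (the reported behaviour); with loop == true it
     * keeps sleeping until the rate is at or below the limit (the proposed fix).
     */
    static long simulate(long maxBytesPerSec, long bytesPerRead, int calls, boolean loop) {
        long elapsedMs = 0;
        long bytesRead = 0;
        // One extra iteration models the throttle at the start of the next call.
        for (int i = 0; i <= calls; i++) {
            if (loop) {
                while (rate(bytesRead, elapsedMs) > maxBytesPerSec) {
                    elapsedMs += SLEEP_DURATION_MS; // virtual sleep
                }
            } else if (rate(bytesRead, elapsedMs) > maxBytesPerSec) {
                elapsedMs += SLEEP_DURATION_MS; // virtual sleep, once only
            }
            if (i < calls) {
                bytesRead += bytesPerRead; // the read itself takes no time
            }
        }
        return rate(bytesRead, elapsedMs);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Limit of 1 MB/s, 1 MB per read call, 100 calls.
        System.out.println("if:    " + simulate(mb, mb, 100, false) / mb + " MB/s");
        System.out.println("while: " + simulate(mb, mb, 100, true) / mb + " MB/s");
    }
}
```

Under these assumptions the single-sleep variant settles at 20 MB/s regardless of the 1 MB/s limit, while the looping variant stays at the limit.
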
> This issue affects the ThrottledInputStream in hadoop as well.
> Another way to see this is that for big enough buffer sizes, 
> ThrottledInputStream will be throttling only the number of read calls to 20 
> per second, disregarding the number of bytes read. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
