[ https://issues.apache.org/jira/browse/HBASE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234840#comment-14234840 ]
Tobi Vollebregt commented on HBASE-12632:
-----------------------------------------

I actually managed to do a test of the throttling with the while loop today. It looks a lot better than without it.

Before: it continuously saturated the host NIC by doing about 120 MB/s (1 Gbit NIC) when limited to 64 MB/s (16 mappers times 4 MB/s per mapper).

After: it still peaks to really high values, but the 5 min moving average of out octets on my test host settled down to approx 75 MB/s when I limited it to 64 MB/s (16 mappers times 4 MB/s per mapper). I'm assuming the discrepancy is just unthrottled activity induced by my test.

So, +1 to the patch.

> ThrottledInputStream/ExportSnapshot does not throttle
> -----------------------------------------------------
>
>                 Key: HBASE-12632
>                 URL: https://issues.apache.org/jira/browse/HBASE-12632
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.99.2
>            Reporter: Tobi Vollebregt
>         Attachments: 12632-v1.txt
>
>
> I just transferred a ton of data using ExportSnapshot with bandwidth
> throttling from one Hadoop cluster to another Hadoop cluster, and discovered
> that ThrottledInputStream does not limit bandwidth.
> The problem is that ThrottledInputStream sleeps once, for a fixed time (50
> ms), at the start of each read call, disregarding the actual amount of data
> read.
> ExportSnapshot defaults to a buffer size as big as the block size of the
> outputFs:
> {code:java}
> // Use the default block size of the outputFs if bigger
> int defaultBlockSize = Math.max((int) outputFs.getDefaultBlockSize(), BUFFER_SIZE);
> bufferSize = conf.getInt(CONF_BUFFER_SIZE, defaultBlockSize);
> LOG.info("Using bufferSize=" + StringUtils.humanReadableInt(bufferSize));
> {code}
> In my case, this was 256 MB.
> Hence, the ExportSnapshot mapper will attempt to read up to 256 MB at a time,
> each time sleeping only 50 ms.
> Thus, in the worst case where each call to read fills the 256 MB buffer in
> negligible time, the ThrottledInputStream cannot reduce the bandwidth to
> under (256 MB) / (50 ms) = 5 GB/s.
> Even in a more realistic case where read returns about 1 MB per call, it
> still cannot throttle the bandwidth to under (1 MB) / (50 ms) = 20 MB/s.
> The issue is exacerbated by the fact that you need to set a low limit,
> because the total bandwidth per host depends on the number of mapper slots
> as well.
> A simple solution would change the if in throttle to a while, so that it
> keeps sleeping for 50 ms until the rate is finally low enough:
> {code:java}
> private void throttle() throws IOException {
>   while (getBytesPerSec() > maxBytesPerSec) {
>     try {
>       Thread.sleep(SLEEP_DURATION_MS);
>       totalSleepTime += SLEEP_DURATION_MS;
>     } catch (InterruptedException e) {
>       throw new IOException("Thread aborted", e);
>     }
>   }
> }
> {code}
> This issue affects the ThrottledInputStream in Hadoop as well.
> Another way to see this is that for big enough buffer sizes,
> ThrottledInputStream will be throttling only the number of read calls, to 20
> per second, disregarding the number of bytes read.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
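The while-loop throttle described above can be sketched as a standalone, runnable class. This is a minimal illustration of the technique, not the actual HBase/Hadoop code: the class name SimpleThrottledInputStream, the 512 KB/s demo limit, and the in-memory source are all made up for the example; only the throttle() loop mirrors the proposed fix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Minimal sketch of an input stream that throttles by observed byte rate,
// using the proposed while-loop fix: sleep repeatedly until the cumulative
// rate drops below the cap, instead of sleeping once per read call.
public class ThrottleDemo {
    static class SimpleThrottledInputStream extends InputStream {
        private static final long SLEEP_DURATION_MS = 50;
        private final InputStream in;
        private final long maxBytesPerSec;
        private final long startTime = System.currentTimeMillis();
        private long bytesRead = 0;

        SimpleThrottledInputStream(InputStream in, long maxBytesPerSec) {
            this.in = in;
            this.maxBytesPerSec = maxBytesPerSec;
        }

        // Average rate since the stream was opened.
        private long getBytesPerSec() {
            long elapsedMs = Math.max(1, System.currentTimeMillis() - startTime);
            return bytesRead * 1000 / elapsedMs;
        }

        // The fix: keep sleeping while the rate is still over the limit.
        private void throttle() throws IOException {
            while (getBytesPerSec() > maxBytesPerSec) {
                try {
                    Thread.sleep(SLEEP_DURATION_MS);
                } catch (InterruptedException e) {
                    throw new IOException("Thread aborted", e);
                }
            }
        }

        @Override
        public int read() throws IOException {
            throttle();
            int b = in.read();
            if (b >= 0) bytesRead++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            throttle();
            int n = in.read(buf, off, len);
            if (n > 0) bytesRead += n;
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1024 * 1024];                // 1 MB source
        InputStream in = new SimpleThrottledInputStream(
                new ByteArrayInputStream(data), 512 * 1024); // cap: 512 KB/s
        byte[] buf = new byte[64 * 1024];
        long start = System.currentTimeMillis();
        while (in.read(buf, 0, buf.length) >= 0) { /* drain */ }
        double secs = (System.currentTimeMillis() - start) / 1000.0;
        System.out.printf("read 1 MB in %.1f s (%.0f KB/s)%n", secs, 1024 / secs);
    }
}
```

Draining 1 MB at a 512 KB/s cap should take roughly 2 seconds, regardless of how large each individual read is, which is exactly the property the single fixed 50 ms sleep fails to provide.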