[ https://issues.apache.org/jira/browse/HBASE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233844#comment-14233844 ]
Ted Yu commented on HBASE-12632:
--------------------------------

Thanks for reporting this issue. Do you want to attach a patch?

> ThrottledInputStream/ExportSnapshot does not throttle
> -----------------------------------------------------
>
>                 Key: HBASE-12632
>                 URL: https://issues.apache.org/jira/browse/HBASE-12632
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.99.2
>            Reporter: Tobi Vollebregt
>
> I just transferred a large amount of data using ExportSnapshot with
> bandwidth throttling from one Hadoop cluster to another, and discovered
> that ThrottledInputStream does not limit bandwidth.
> The problem is that ThrottledInputStream sleeps once, for a fixed time
> (50 ms), at the start of each read call, disregarding the actual amount of
> data read.
> ExportSnapshot defaults to a buffer size as big as the block size of the
> outputFs:
> {code:java}
> // Use the default block size of the outputFs if bigger
> int defaultBlockSize = Math.max((int) outputFs.getDefaultBlockSize(), BUFFER_SIZE);
> bufferSize = conf.getInt(CONF_BUFFER_SIZE, defaultBlockSize);
> LOG.info("Using bufferSize=" + StringUtils.humanReadableInt(bufferSize));
> {code}
> In my case, this was 256 MB.
> Hence, the ExportSnapshot mapper will attempt to read up to 256 MB at a
> time, sleeping only 50 ms per read. In the worst case, where each call to
> read fills the 256 MB buffer in negligible time, ThrottledInputStream
> cannot reduce the bandwidth below (256 MB) / (50 ms) = 5 GB/s.
> Even in a more realistic case, where each read returns about 1 MB, it
> still cannot throttle the bandwidth below (1 MB) / (50 ms) = 20 MB/s.
> The issue is exacerbated by the fact that you need to set a low per-stream
> limit, because the total bandwidth per host also depends on the number of
> mapper slots.
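The worst-case arithmetic above can be checked with a short standalone sketch. The class and method names here are illustrative only, not part of HBase; SLEEP_DURATION_MS matches the 50 ms constant described in the report:

```java
// Minimal sketch: if each read() transfers n bytes in negligible time and
// then sleeps exactly once for SLEEP_DURATION_MS, the achieved rate cannot
// drop below n / SLEEP_DURATION_MS. Hypothetical helper, for illustration.
public class RateFloor {
    static final long SLEEP_DURATION_MS = 50;

    // Lowest achievable rate (bytes/sec) for a given per-read transfer size
    static long minRateBytesPerSec(long bytesPerRead) {
        return bytesPerRead * 1000 / SLEEP_DURATION_MS;
    }

    public static void main(String[] args) {
        long bufferSize = 256L * 1024 * 1024;  // the 256 MB ExportSnapshot buffer
        System.out.println(minRateBytesPerSec(bufferSize));   // 5368709120 (~5 GB/s)
        System.out.println(minRateBytesPerSec(1024 * 1024));  // 20971520 (~20 MB/s)
    }
}
```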
> A simple solution is to change the if in throttle() to a while, so that it
> keeps sleeping in 50 ms increments until the rate is finally low enough:
> {code:java}
> private void throttle() throws IOException {
>   while (getBytesPerSec() > maxBytesPerSec) {
>     try {
>       Thread.sleep(SLEEP_DURATION_MS);
>       totalSleepTime += SLEEP_DURATION_MS;
>     } catch (InterruptedException e) {
>       throw new IOException("Thread aborted", e);
>     }
>   }
> }
> {code}
> This issue affects the ThrottledInputStream in Hadoop as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
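To see the while-based throttle from the report in context, here is a simplified, self-contained stand-in for the stream. The names mirror ThrottledInputStream, but this is a sketch under the assumptions above, not the actual HBase/Hadoop class:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of an input stream using the proposed while-based throttle.
// Simplified stand-in; not the real HBase/Hadoop ThrottledInputStream.
public class WhileThrottledInputStream extends FilterInputStream {
    private static final long SLEEP_DURATION_MS = 50;
    private final long maxBytesPerSec;
    private final long startTime = System.currentTimeMillis();
    private long totalBytesRead = 0;

    public WhileThrottledInputStream(InputStream in, long maxBytesPerSec) {
        super(in);
        this.maxBytesPerSec = maxBytesPerSec;
    }

    // Observed average rate since the stream was opened
    private long getBytesPerSec() {
        long elapsedSec = Math.max(1, (System.currentTimeMillis() - startTime) / 1000);
        return totalBytesRead / elapsedSec;
    }

    // Keep sleeping until the observed rate falls under the cap, instead of
    // sleeping a single fixed 50 ms regardless of how much was read.
    private void throttle() throws IOException {
        while (getBytesPerSec() > maxBytesPerSec) {
            try {
                Thread.sleep(SLEEP_DURATION_MS);
            } catch (InterruptedException e) {
                throw new IOException("Thread aborted", e);
            }
        }
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        throttle();
        int n = super.read(b, off, len);
        if (n > 0) {
            totalBytesRead += n;
        }
        return n;
    }
}
```

Because the loop re-checks the rate after every sleep, a read that moved a large buffer is followed by proportionally more sleeping, which is exactly what the single fixed sleep fails to do.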