[ https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500546#comment-17500546 ]
Vigya Sharma commented on LUCENE-10448:
---------------------------------------

From what I understand, the only case when MergeRateLimiter would not pause is if enough time has already passed since the last pause to allow the requested set of bytes to go through. In [~jpountz]'s example, for a rate of 10MB/s and an invocation after 50s, there would be no pause for anything <= 500MB.

But once we skip a pause, we set {{lastNS = curNS}}. This resets {{lastNS}} to the most recent invocation, avoiding the case of skipping multiple consecutive pauses. So if the call after 50s wanted to write, say, 500MB, it would go through (and rightly so, because it is happening after sufficient delay), but an immediate subsequent call to write another 40MB would pause for 4s.

This seems to be working as intended by the design and [this code comment|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/MergeRateLimiter.java#L133-L134].

Perhaps one gap in this design is that there is no control on write bursts: if you have waited long enough, you can suddenly write a big chunk. This seems to get somewhat controlled by the {{RateLimitedIndexOutput}}'s {{writeByte|Int|Short|Long()}} APIs, which check for {{bytesSinceLastPause > currentMinPauseCheckBytes}} in each write.

Typical rate limiting algorithms like [token bucket|https://en.wikipedia.org/wiki/Token_bucket] have an upper cap on bursts (which is the capacity of the bucket). They tend to deny or enqueue requests that exceed this burst. But I don't think we can do that to writes here. At best, we can pause before writing, which this code seems to do already.

I'm not really sure I understand the gap here. What am I missing?

---
PS: From the logs pasted in this JIRA, the burst writes seen are of the expected size: with {{rate = 11.2 MB/s}} and {{MIN_PAUSE_CHECK_MSEC = 25ms}}, these APIs would {{checkRate()}} every {{.025*11.2 = 0.28MB}}.

> MergeRateLimiter doesn't always limit instant rate.
> ---------------------------------------------------
>
> Key: LUCENE-10448
> URL: https://issues.apache.org/jira/browse/LUCENE-10448
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: 8.11.1
> Reporter: kkewwei
> Priority: Major
>
> We can see the code in *MergeRateLimiter*:
> {code:java}
> private long maybePause(long bytes, long curNS) throws MergePolicy.MergeAbortedException {
>
>   double rate = mbPerSec;
>   double secondsToPause = (bytes / 1024. / 1024.) / rate;
>   long targetNS = lastNS + (long) (1000000000 * secondsToPause);
>   long curPauseNS = targetNS - curNS;
>   // We don't bother with thread pausing if the pause is smaller than 2 msec.
>   if (curPauseNS <= MIN_PAUSE_NS) {
>     // Set to curNS, not targetNS, to enforce the instant rate, not
>     // the "averaged over all history" rate:
>     lastNS = curNS;
>     return -1;
>   }
>   ......
> }
> {code}
> If a segment is being merged and *maybePause* is called at 7:00, then
> lastNS=7:00. If *maybePause* is called again at 7:05, the value of
> *targetNS=lastNS + (long) (1000000000 * secondsToPause)* must be smaller than
> *curNS*, no matter how large bytes is, so we return -1 and skip the
> pause.
> I counted the total number of calls to *maybePause* (callTimes), the number
> of calls that skipped the pause (ignorePauseTimes), and the bytes requested
> in each skipped call (detailBytes):
> {code:java}
> [2022-03-02T15:16:51,972][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1]
> [index1][21] merge segment [_4h] done: took [26.8s], [123.6 MB], [61,219
> docs], [0s stopped], [24.4s throttled], [242.5 MB written], [11.2 MB/sec
> throttle], [callTimes=857], [ignorePauseTimes=25], [detailBytes(mb) =
> [0.28899956, 0.28140354, 0.28015518, 0.27990818, 0.2801447, 0.27991104,
> 0.27990723, 0.27990913, 0.2799101, 0.28010082, 0.2799921, 0.2799673,
> 0.28144264, 0.27991295, 0.27990818, 0.27993107, 0.2799387, 0.27998447,
> 0.28002167, 0.27992058, 0.27998066, 0.28098202, 0.28125, 0.28125, 0.28125]]
> {code}
> Of the 857 calls to *maybePause*, 25 skipped the pause, and the bytes
> requested in those skipped calls (such as 0.28125mb) are not small.
> As long as the interval between two *maybePause* calls is relatively long,
> a pause that should be executed will not be executed.
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
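
To make the arithmetic in the comment above concrete, here is a minimal sketch in Java that replays the 10MB/s example against the pause decision quoted in the issue. It is a simplified model, not Lucene's implementation: the class {{MergePauseSketch}}, the method {{requiredPauseNanos}}, and the helper {{mbBetweenChecks}} are hypothetical names, the model never sleeps, and the part of {{maybePause}} elided in the issue is replaced by an assumption (it simply reports the remaining wait).

```java
// Simplified model of the pause decision quoted in this issue.
// It does not sleep; it only reports how long a caller would have to wait.
class MergePauseSketch {
  // 2 msec, matching the "smaller than 2 msec" comment in the quoted code.
  static final long MIN_PAUSE_NS = 2_000_000L;

  private final double mbPerSec;
  private long lastNS;

  MergePauseSketch(double mbPerSec, long startNS) {
    this.mbPerSec = mbPerSec;
    this.lastNS = startNS;
  }

  /** Returns -1 if no pause is needed, otherwise the nanoseconds still to wait. */
  long requiredPauseNanos(long bytes, long curNS) {
    double secondsToPause = (bytes / 1024. / 1024.) / mbPerSec;
    long targetNS = lastNS + (long) (1000000000 * secondsToPause);
    long curPauseNS = targetNS - curNS;
    if (curPauseNS <= MIN_PAUSE_NS) {
      // Same reset as in the quoted code: the next call is measured from now.
      lastNS = curNS;
      return -1;
    }
    // Assumption: the elided part of the original is not modeled here;
    // this sketch just reports the wait and leaves lastNS unchanged.
    return curPauseNS;
  }

  /** Hypothetical helper: MB that can be written between two rate checks. */
  static double mbBetweenChecks(double mbPerSec, long checkIntervalMsec) {
    return (checkIntervalMsec / 1000.0) * mbPerSec;
  }

  public static void main(String[] args) {
    final long MB = 1024L * 1024L;
    final long SEC = 1_000_000_000L;
    MergePauseSketch limiter = new MergePauseSketch(10.0, 0L);

    // 500MB requested 50s after the last reset: the budget covers it, no pause.
    System.out.println(limiter.requiredPauseNanos(500 * MB, 50 * SEC)); // -1

    // 40MB requested immediately afterwards: lastNS was reset, so 4s of pause.
    System.out.println(limiter.requiredPauseNanos(40 * MB, 50 * SEC)); // 4000000000

    // The PS: 11.2 MB/s with a 25 msec check interval, roughly 0.28 MB.
    System.out.println(mbBetweenChecks(11.2, 25));
  }
}
```

In this model the second call is the one that pays for the burst: because the skipped pause reset {{lastNS}}, the 40MB request has no accumulated budget left and has to wait the full 4s.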