[ https://issues.apache.org/jira/browse/LUCENE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500982#comment-17500982 ]
Vigya Sharma edited comment on LUCENE-10448 at 3/4/22, 2:06 AM:
----------------------------------------------------------------

The only API that can lead to unexpectedly large write bursts seems to be the {{writeBytes(byte[] b, int offset, int length)}} API in RateLimitedIndexOutput. We could add an upper bound on the number of bytes that writeBytes attempts to write in one shot in RateLimitedIndexOutput: break the byte array into chunks and check the rate limit between chunks. Would that be desirable in the wider Lucene context?

All other APIs check the rate before every write, so the instant burst rate is really determined by the configured {{mbPerSec}} and {{MIN_PAUSE_CHECK_MSEC}} values. I think this is why all the burst writes in this JIRA log are ~0.28 MB.

{quote}According to my statistics, the frequency of no-pause bytes is [2%-20%],{quote}

How high is the instant burst rate you see during these no-pause writes? From the logs above, it should still be less than 11.2 MB/s.
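A minimal sketch of the chunking idea proposed in this comment, under stated assumptions: the class {{ChunkedRateCheckSketch}}, its {{maxChunkBytes}} bound and its {{pauseCallBytes}} list are hypothetical and are not part of Lucene's RateLimitedIndexOutput. It also assumes {{MIN_PAUSE_CHECK_MSEC}} is 25 ms (the value we believe MergeRateLimiter uses); at 11.2 MB/s that gives a pause-check threshold of 0.025 s × 11.2 MB/s = 0.28 MB, which is consistent with the ~0.28 MB bursts in the log. Instead of sleeping, the sketch records how many bytes each simulated pause call covers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed chunking; NOT Lucene's RateLimitedIndexOutput.
// Instead of sleeping, the simulated pause call records how many bytes it covers,
// so the effect of the chunk bound can be observed.
class ChunkedRateCheckSketch {
  // Assumption: 25 ms, the MIN_PAUSE_CHECK_MSEC value we believe MergeRateLimiter uses.
  static final int MIN_PAUSE_CHECK_MSEC = 25;

  /** Byte threshold above which the rate limiter is consulted (MB = 1024 * 1024 bytes). */
  static long minPauseCheckBytes(double mbPerSec) {
    return (long) ((MIN_PAUSE_CHECK_MSEC / 1000.0) * mbPerSec * 1024 * 1024);
  }

  final long minPauseCheckBytes;
  final int maxChunkBytes; // hypothetical upper bound on a single delegate write
  long bytesSinceLastPause;
  final List<Long> pauseCallBytes = new ArrayList<>(); // bytes covered by each pause call

  ChunkedRateCheckSketch(double mbPerSec, int maxChunkBytes) {
    this.minPauseCheckBytes = minPauseCheckBytes(mbPerSec);
    this.maxChunkBytes = maxChunkBytes;
  }

  // Stand-in for rateLimiter.pause(bytesSinceLastPause): records instead of sleeping.
  private void checkRate() {
    if (bytesSinceLastPause > minPauseCheckBytes) {
      pauseCallBytes.add(bytesSinceLastPause);
      bytesSinceLastPause = 0;
    }
  }

  /** Splits b[offset, offset + length) into chunks and checks the rate before each one. */
  void writeBytes(byte[] b, int offset, int length) {
    while (length > 0) {
      int chunk = Math.min(length, maxChunkBytes);
      bytesSinceLastPause += chunk;
      checkRate();
      // A real output would call delegate.writeBytes(b, offset, chunk) here.
      offset += chunk;
      length -= chunk;
    }
  }

  public static void main(String[] args) {
    byte[] data = new byte[10 * 1024 * 1024];
    ChunkedRateCheckSketch chunked = new ChunkedRateCheckSketch(11.2, 64 * 1024);
    chunked.writeBytes(data, 0, data.length);
    ChunkedRateCheckSketch oneShot = new ChunkedRateCheckSketch(11.2, Integer.MAX_VALUE);
    oneShot.writeBytes(data, 0, data.length);
    System.out.println("threshold bytes: " + chunked.minPauseCheckBytes);
    System.out.println("chunked pause calls: " + chunked.pauseCallBytes.size());
    System.out.println("one-shot pause calls: " + oneShot.pauseCallBytes);
  }
}
```

With a 64 KB bound, one 10 MB {{writeBytes}} call leads to 32 simulated pause calls, each covering at most the threshold plus one chunk; without the bound there is a single pause call covering all 10 MB, which is then written in one burst. The cost of the bound is one extra threshold comparison per chunk.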
Maybe we should look at the burst write rate in addition to, or instead of, the no-pause-write frequency?

> MergeRateLimiter doesn't always limit instant rate.
> ---------------------------------------------------
>
> Key: LUCENE-10448
> URL: https://issues.apache.org/jira/browse/LUCENE-10448
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: 8.11.1
> Reporter: kkewwei
> Priority: Major
>
> Here is the relevant code in *MergeRateLimiter*:
> {code:java}
> private long maybePause(long bytes, long curNS) throws MergePolicy.MergeAbortedException {
>
>   double rate = mbPerSec;
>   double secondsToPause = (bytes / 1024. / 1024.) / rate;
>   long targetNS = lastNS + (long) (1000000000 * secondsToPause);
>   long curPauseNS = targetNS - curNS;
>   // We don't bother with thread pausing if the pause is smaller than 2 msec.
>   if (curPauseNS <= MIN_PAUSE_NS) {
>     // Set to curNS, not targetNS, to enforce the instant rate, not
>     // the "averaged over all history" rate:
>     lastNS = curNS;
>     return -1;
>   }
>   ......
> }
> {code}
> If a segment is being merged and *maybePause* is called at 7:00 (so lastNS = 7:00) and then again at 7:05, then *targetNS = lastNS + (long) (1000000000 * secondsToPause)* is smaller than *curNS* for any realistic value of bytes, so we return -1 and skip the pause.
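The 7:00/7:05 scenario can be reproduced with a small model of the excerpt above. This is a sketch under stated assumptions: {{MaybePauseModel}} is a hypothetical class, not Lucene's MergeRateLimiter; it takes the current time as an argument instead of reading a clock, never sleeps, and assumes the elided "......" branch waits until targetNS. Its counters are named after the reporter's callTimes, ignorePauseTimes and detailBytes fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the maybePause excerpt; NOT Lucene's MergeRateLimiter.
// The current time is passed in (in ns) and nothing actually sleeps.
class MaybePauseModel {
  static final long MIN_PAUSE_NS = 2_000_000L; // 2 ms, per the comment in the excerpt

  final double mbPerSec;
  long lastNS;

  // Counters named after the fields in the reporter's log.
  int callTimes;
  int ignorePauseTimes;
  final List<Double> detailBytesMB = new ArrayList<>();

  MaybePauseModel(double mbPerSec, long lastNS) {
    this.mbPerSec = mbPerSec;
    this.lastNS = lastNS;
  }

  /** Returns the pause in ns that would be taken, or -1 if the pause is skipped. */
  long maybePause(long bytes, long curNS) {
    callTimes++;
    double secondsToPause = (bytes / 1024. / 1024.) / mbPerSec;
    long targetNS = lastNS + (long) (1_000_000_000 * secondsToPause);
    long curPauseNS = targetNS - curNS;
    if (curPauseNS <= MIN_PAUSE_NS) {
      // As in the excerpt: lastNS jumps to curNS, so a long gap since the
      // previous call absorbs the pause these bytes should have caused.
      lastNS = curNS;
      ignorePauseTimes++;
      detailBytesMB.add(bytes / 1024. / 1024.);
      return -1;
    }
    // Assumption for this model: the elided "......" branch waits until targetNS.
    lastNS = targetNS;
    return curPauseNS;
  }

  public static void main(String[] args) {
    final long ms = 1_000_000L;
    long bytes = 293_601L; // ~0.28 MB, the size of the skipped writes in the log
    MaybePauseModel idle = new MaybePauseModel(11.2, 0L); // previous call at "7:00"
    System.out.println("5 min later: " + idle.maybePause(bytes, 5L * 60 * 1000 * ms));
    MaybePauseModel busy = new MaybePauseModel(11.2, 0L);
    System.out.println("1 ms later: " + busy.maybePause(bytes, 1 * ms));
  }
}
```

With bytes = 293,601 (about 0.28 MB) at 11.2 MB/s the required pause is about 25 ms. Called 5 minutes after the previous call, the model returns -1; a pause would only be taken if bytes exceeded roughly 300 s × 11.2 MB/s ≈ 3,360 MB. Called 1 ms after the previous call, the same bytes yield a pause of about 24 ms.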
> I counted the total number of calls to *maybePause* (callTimes), the number of skipped pauses (ignorePauseTimes), and the size of each skipped write in MB (detailBytes):
> {code:java}
> [2022-03-02T15:16:51,972][DEBUG][o.e.i.e.I.EngineMergeScheduler] [node1] [index1][21] merge segment [_4h] done: took [26.8s], [123.6 MB], [61,219 docs], [0s stopped], [24.4s throttled], [242.5 MB written], [11.2 MB/sec throttle], [callTimes=857], [ignorePauseTimes=25], [detailBytes(mb) = [0.28899956, 0.28140354, 0.28015518, 0.27990818, 0.2801447, 0.27991104, 0.27990723, 0.27990913, 0.2799101, 0.28010082, 0.2799921, 0.2799673, 0.28144264, 0.27991295, 0.27990818, 0.27993107, 0.2799387, 0.27998447, 0.28002167, 0.27992058, 0.27998066, 0.28098202, 0.28125, 0.28125, 0.28125]]
> {code}
> There were 857 calls to *maybePause*, 25 of which skipped the pause, and the skipped byte counts (e.g. 0.28125 MB) are not small.
> Whenever the interval between two *maybePause* calls is relatively long, a pause that should be taken is skipped.

--
This message was sent by Atlassian Jira (v8.20.1#820001)