[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844595#comment-16844595 ]
Adrien Grand commented on LUCENE-8757:
--------------------------------------

[~atris] I think it is still not correct, since the values of docBase/maxDoc can only be seen by the current leaf collector, while we need this check across all leaf collectors that are created from the same collector.

Looking at the AssertingCollector again, it has a check that doc IDs are collected in doc ID order, so I wonder why this assertion didn't trip with the earlier version of your patch that sorted leaves by decreasing maxDoc. Maybe we just got lucky?

Nevertheless, I think it's worth adding another assertion that leaves are collected in the right order and that their doc ID spaces don't intersect, as described above. E.g. we could record a {{previousLeafMaxDoc}} at the same level as {{maxDoc}} in AssertingCollector, and then in {{getLeafCollector}} do something like

{code}
assert context.docBase >= previousLeafMaxDoc; // generally equal, but might be greater if some leaves are skipped
previousLeafMaxDoc = context.docBase + context.reader().maxDoc();
{code}

> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>
>                 Key: LUCENE-8757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8757
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Assignee: Simon Willnauer
>            Priority: Major
>         Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch,
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch,
> LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one
> thread per segment. This is detrimental to performance in case of skew in
> segment sizes since small segments also get their dedicated thread. This can
> lead to performance degradation due to context switching overheads.
>
> A better algorithm which is cognizant of size skew would have better
> performance for realistic scenarios
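
For illustration only, the leaf-ordering check suggested in the comment above can be sketched without any Lucene dependency. The class {{LeafOrderChecker}} and its method {{onLeaf}} are hypothetical names introduced here; this is not the AssertingCollector code, and each leaf is modelled only by its docBase and maxDoc. The sketch throws an exception rather than using {{assert}}, so that it behaves the same whether or not the JVM runs with assertions enabled.

```java
/**
 * Hypothetical stand-in for the proposed check; not Lucene's AssertingCollector.
 * Each leaf is described only by its docBase and maxDoc.
 */
final class LeafOrderChecker {
    // Exclusive end of the doc ID range covered by the last leaf seen.
    private int previousLeafMaxDoc = 0;

    /** Call once per leaf, in the order in which leaves reach the collector. */
    void onLeaf(int docBase, int maxDoc) {
        // Generally equal, but docBase may be greater if some leaves are skipped.
        if (docBase < previousLeafMaxDoc) {
            throw new IllegalStateException(
                "leaf with docBase=" + docBase
                    + " starts before the previous leaf's end " + previousLeafMaxDoc);
        }
        previousLeafMaxDoc = docBase + maxDoc;
    }

    public static void main(String[] args) {
        LeafOrderChecker checker = new LeafOrderChecker();
        checker.onLeaf(0, 10);  // covers doc IDs [0, 10)
        checker.onLeaf(10, 5);  // contiguous with the previous leaf
        checker.onLeaf(20, 3);  // the leaf covering [15, 20) was skipped: still valid
        try {
            checker.onLeaf(5, 4);  // out of order: overlaps earlier leaves
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the state lives in one object shared by all leaves, the check spans every leaf collector created from the same collector, which is the property a per-leaf docBase/maxDoc check cannot provide.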