[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844595#comment-16844595
 ] 

Adrien Grand commented on LUCENE-8757:
--------------------------------------

[~atris] I think it is still not correct since the values of the docBase/maxDoc 
can only be seen by the current leaf collector while we need this check across 
all leaf collectors that are created from the same collector.

Looking at the AssertingCollector again, it has a check that doc IDs are 
collected in doc ID order, so I wonder why this assertion didn't trip with the 
earlier version of your patch that sorted leaves by decreasing maxDoc. Maybe we 
just got lucky? Nevertheless I think it's worth adding another assertion that 
leaves are collected in the right order and that their doc ID space doesn't 
intersect as described above, e.g. we could record a {{previousLeafMaxDoc}} at 
the same level as {{maxDoc}} in AssertingCollector, and then in 
{{getLeafCollector}} do something like:

{code}
// generally equal, but might be greater if some leaves are skipped
assert context.docBase >= previousLeafMaxDoc;
previousLeafMaxDoc = context.docBase + context.reader().maxDoc();
{code}
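
To make the intent concrete, here is a minimal standalone sketch of the same check outside of Lucene. It is an illustration under assumptions, not the actual AssertingCollector change: {{LeafOrderCheck}}, {{Leaf}}, {{onLeaf}} and {{inOrder}} are hypothetical names, {{Leaf}} only models the {{docBase}} and {{maxDoc}} values that would come from the leaf context, and an explicit exception replaces the {{assert}} so the behavior does not depend on assertions being enabled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, Lucene-free model of the check sketched above.
final class LeafOrderCheck {

    // Stand-in for the two values read from the leaf context.
    static final class Leaf {
        final int docBase; // first global doc ID of this leaf
        final int maxDoc;  // size of this leaf's doc ID space

        Leaf(int docBase, int maxDoc) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }
    }

    // Kept at the collector level, so it is shared by all leaves
    // visited through the same collector.
    private int previousLeafMaxDoc = 0;

    // Would be called once per leaf, in the order leaves are visited.
    void onLeaf(Leaf leaf) {
        // Generally equal, but might be greater if some leaves are skipped.
        if (leaf.docBase < previousLeafMaxDoc) {
            throw new IllegalStateException(
                "leaf with docBase=" + leaf.docBase
                    + " visited after doc IDs up to " + previousLeafMaxDoc);
        }
        previousLeafMaxDoc = leaf.docBase + leaf.maxDoc;
    }

    // Convenience wrapper: true if the leaves are visited in increasing,
    // non-overlapping doc ID order.
    static boolean inOrder(List<Leaf> leaves) {
        LeafOrderCheck check = new LeafOrderCheck();
        try {
            for (Leaf leaf : leaves) {
                check.onLeaf(leaf);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(0, 10);
        Leaf b = new Leaf(10, 5);
        Leaf c = new Leaf(15, 20);
        System.out.println(inOrder(Arrays.asList(a, b, c))); // index order
        System.out.println(inOrder(Arrays.asList(a, c)));    // one leaf skipped
        System.out.println(inOrder(Arrays.asList(c, a, b))); // decreasing maxDoc
    }
}
```

With these three leaves, visiting them in index order or skipping the middle one passes, while visiting them sorted by decreasing maxDoc fails on the second leaf, which is the case the extra assertion is meant to catch.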

> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>
>                 Key: LUCENE-8757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8757
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Assignee: Simon Willnauer
>            Priority: Major
>         Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segment-to-thread allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance when segment sizes 
> are skewed, since small segments also get a dedicated thread. This can 
> lead to performance degradation due to context-switching overhead.
>  
> A better algorithm that is cognizant of size skew would perform better in 
> realistic scenarios.



