[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485556#comment-17485556 ]

Zach Chen commented on LUCENE-9662:
-----------------------------------

I've approved the null check PR. Thanks [~mdrob]!

As for resolving this issue, I think so? So far the implementation parallelizes
checking across segments, but within each segment the checks are still
sequential. We initially started by parallelizing within each segment, but
found the speed-up to be limited, as it's dominated by checking the biggest
parts of a segment (typically the postings, checked by `testPostings`). We
could potentially look into breaking that up into smaller pieces to increase
parallelization, but I'm not sure it's worth the effort and added code
complexity. What do you think [~mikemccand]?
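To make the limited speed-up concrete: if a segment's parts are each checked on their own thread, the segment is done only when its slowest part is done, so the best achievable speed-up is roughly the total cost divided by the cost of the largest part. The Java sketch below assumes made-up relative costs per part (illustrative numbers, not measurements from CheckIndex) and ignores thread overhead and I/O contention.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class PartSpeedup {
    // Wall-clock time when the parts of one segment are checked one after another.
    static double sequentialTime(Map<String, Double> partCosts) {
        double total = 0;
        for (double c : partCosts.values()) total += c;
        return total;
    }

    // Wall-clock time when every part gets its own thread (unbounded threads,
    // no contention): the segment finishes when its slowest part finishes.
    static double parallelTime(Map<String, Double> partCosts) {
        double max = 0;
        for (double c : partCosts.values()) max = Math.max(max, c);
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical relative costs for one segment; NOT real measurements.
        Map<String, Double> costs = new LinkedHashMap<>();
        costs.put("postings", 60.0);
        costs.put("storedFields", 15.0);
        costs.put("termVectors", 10.0);
        costs.put("docValues", 10.0);
        costs.put("norms", 5.0);

        double seq = sequentialTime(costs);
        double par = parallelTime(costs);
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.printf(Locale.ROOT, "sequential=%.0f parallel=%.0f speedup=%.2fx%n",
                seq, par, seq / par);
    }
}
```

With these hypothetical numbers the program prints `sequential=100 parallel=60 speedup=1.67x`: no number of extra threads can beat the 60 units spent on postings, which is why splitting the postings check itself into smaller pieces would be the next lever.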

> CheckIndex should be concurrent
> -------------------------------
>
>                 Key: LUCENE-9662
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9662
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Priority: Major
>          Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> I am watching a nightly benchmark slowly run its {{CheckIndex}} step,
> using a single core out of the 128 cores the box has.
> It seems like this is an embarrassingly parallel problem, if the index has
> multiple segments, and would finish much more quickly on concurrent hardware
> if we did "thread per segment".
> If we wanted to get even further concurrency, each part of the Lucene index
> that is checked is also independent, so it could be "thread per segment per part".
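The "thread per segment" idea from the description could look roughly like the Java sketch below. It is not Lucene's actual CheckIndex code: `SegmentChecker` and `checkAll` are hypothetical names, and the per-segment work is stubbed out. Each segment becomes one task on a fixed-size `ExecutorService`, and results are collected in segment order so the output stays deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentSegmentCheck {
    // Hypothetical stand-in for checking one segment (not a Lucene API).
    interface SegmentChecker {
        String check(String segmentName) throws Exception;
    }

    // Submits one task per segment to a fixed-size pool ("thread per segment")
    // and returns the results in the original segment order.
    static List<String> checkAll(List<String> segments, SegmentChecker checker, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String seg : segments) {
                futures.add(pool.submit(() -> checker.check(seg)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                // get() blocks until that segment is done and rethrows its
                // failure wrapped in an ExecutionException.
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> segments = List.of("_0", "_1", "_2", "_3");
        int threads = Math.min(segments.size(), Runtime.getRuntime().availableProcessors());
        List<String> results = checkAll(segments, seg -> seg + ": OK", threads);
        results.forEach(System.out::println);
    }
}
```

Sizing the pool from `availableProcessors()` lets a many-core box use more than one core, but because each task is a whole segment, one large segment can still dominate wall-clock time; "thread per segment per part" would instead submit finer-grained tasks for each part of each segment.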



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
