[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485556#comment-17485556 ]
Zach Chen commented on LUCENE-9662:
---
I've approved the null check PR. Thanks [~mdrob]!
As for resolving this issue, I think so? So far the implementation has parallelized checking across segments, but checking within each segment is still sequential. We initially started by parallelizing within each segment, but found the speed-up to be limited, as it's dominated by checking the biggest parts within each segment (typically the postings file checked by `testPostings`). We could potentially look into breaking that up into smaller pieces to increase parallelization, but I'm not sure it's worth the effort and the added code complexity. What do you think, [~mikemccand]?
> CheckIndex should be concurrent
> ---
>
> Key: LUCENE-9662
> URL: https://issues.apache.org/jira/browse/LUCENE-9662
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Michael McCandless
> Priority: Major
> Time Spent: 20h 50m
> Remaining Estimate: 0h
>
> I am watching a nightly benchmark run slowly run its {{CheckIndex}} step, using a single core out of the 128 cores the box has.
> It seems like this is an embarrassingly parallel problem, if the index has multiple segments, and would finish much more quickly on concurrent hardware if we did "thread per segment".
> If we wanted to get even further concurrency, each part of the Lucene index that is checked is also independent, so it could be "thread per segment per part".
--
This message was sent by Atlassian Jira (v8.20.1#820001)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
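The "parallel across segments, sequential within each segment" approach described in this comment can be sketched roughly as follows. This is a minimal illustration using only the JDK, not the code from the PR: `SegmentCheckSketch`, `SegmentCheck`, and `checkAll` are hypothetical names, and segments are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SegmentCheckSketch {

  /** Hypothetical stand-in for a per-segment check; not a Lucene API. */
  public interface SegmentCheck {
    String check(String segmentName);
  }

  /**
   * Submits one task per segment to a fixed-size pool. Each task checks its
   * segment sequentially; results are collected in the original segment
   * order so the report stays deterministic.
   */
  public static List<String> checkAll(
      List<String> segments, int threadCount, SegmentCheck checker)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threadCount);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String segment : segments) {
        Callable<String> task = () -> checker.check(segment);
        futures.add(pool.submit(task));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> future : futures) {
        results.add(future.get()); // blocks until that segment is done
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> segments = List.of("_0", "_1", "_2");
    for (String line : checkAll(segments, 4, name -> name + ": OK")) {
      System.out.println(line);
    }
  }
}
```

With one task per segment, the wall-clock time cannot drop below the time needed for the slowest single segment. That is consistent with the observation above that the largest parts dominate.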
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485499#comment-17485499 ]
Mike Drob commented on LUCENE-9662:
---
1) The changes here have been committed; should this be marked closed?
2) There are several usages of infoStream in CheckIndex that do not guard against null; should we fix those in this issue or in a new one?
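The kind of null guard being asked about in point 2 might look like the sketch below. It assumes the info stream is a `java.io.PrintStream` that may be left unset. The class and method names are illustrative and are not taken from the PR.

```java
import java.io.PrintStream;

public class InfoStreamGuardSketch {

  /**
   * Prints the message only when a stream has been configured.
   * A null stream is treated as "stay silent" rather than an error.
   */
  public static void msg(PrintStream out, String message) {
    if (out != null) {
      out.println(message);
    }
  }

  public static void main(String[] args) {
    msg(null, "this line is dropped without a NullPointerException");
    msg(System.out, "checking segment _0");
  }
}
```

Routing every write through one helper like this keeps the null check in a single place instead of repeating it at each call site.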
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412997#comment-17412997 ]
ASF subversion and git services commented on LUCENE-9662:
---
Commit 965b54ce3fd450dd42758244870b6228e1b5e5d4 in lucene-solr's branch refs/heads/branch_8x from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=965b54c ]
LUCENE-9662: CheckIndex should be concurrent - parallelizing index check across segments (#2567)
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412933#comment-17412933 ]
ASF subversion and git services commented on LUCENE-9662:
---
Commit 7f8607b59e034541d62687374b86197797a30a4f in lucene's branch refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7f8607b ]
LUCENE-9662: Update concurrent index checking usage instructions and default thread count to CPU cores (#281)
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411880#comment-17411880 ]
Michael McCandless commented on LUCENE-9662:
{quote}it seems like it's ok for us to just backport and release these changes via 8.11?{quote}
Yes, +1 – let's not try to rush this into 8.10. I think the RC branch is now cut, so {{branch_8x}} means 8.11.
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411723#comment-17411723 ]
Zach Chen commented on LUCENE-9662:
---
{quote}I think we should backport these changes, in general. They are not breaking – the new {{CheckIndexException}} still subclasses {{RuntimeException}}. There will be some Lucene users who are nervous about upgrading to 9.0 too soon, but may be eager to upgrade to the last 8.x release (whether that's 8.10, 8.11, or beyond). I think it's bad if we slow down our rate of backporting because a major release is coming ... let's try to review your backport commit carefully to see if it looks OK?{quote}
Makes sense. I think my nervousness was also partly because this change, when backported, might land a bit too close to the 8.10 branch cut window, but it seems like it's ok for us to just backport and release these changes via 8.11?
For now I've created a PR for backporting them against 8x here: https://github.com/apache/lucene-solr/pull/2567. The merge conflict resolution turned out to be less involved than I expected, but there is a failing test, and I suspect some unintended code was introduced during the merge. I will dig in a bit more to confirm the cause.
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411208#comment-17411208 ]
Michael McCandless commented on LUCENE-9662:
{quote}What do you think? Would you recommend we still try to backport these changes to 8x?{quote}
I think we should backport these changes, in general. They are not breaking – the new {{CheckIndexException}} still subclasses {{RuntimeException}}. There will be some Lucene users who are nervous about upgrading to 9.0 too soon, but may be eager to upgrade to the last 8.x release (whether that's 8.10, 8.11, or beyond). I think it's bad if we slow down our rate of backporting because a major release is coming ... let's try to review your backport commit carefully to see if it looks OK?
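The compatibility argument here (a more specific exception that still subclasses {{RuntimeException}} does not break existing callers) can be illustrated with a small sketch. The nested `CheckIndexException` below is a stand-in declared only for this example. The one detail taken from the thread is that the real class extends `RuntimeException`; the constructor, message, and surrounding names are assumptions.

```java
public class ExceptionCompatSketch {

  /** Illustrative stand-in; only "extends RuntimeException" comes from the thread. */
  public static class CheckIndexException extends RuntimeException {
    public CheckIndexException(String message) {
      super(message);
    }
  }

  /** Simulates a check that now throws the more specific type. */
  static void failingCheck() {
    throw new CheckIndexException("illustrative failure message");
  }

  /** Pre-existing caller code that only knows about RuntimeException. */
  public static String legacyCaller() {
    try {
      failingCheck();
      return "no error";
    } catch (RuntimeException e) {
      // Still reached, because CheckIndexException is a RuntimeException.
      return "caught: " + e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(legacyCaller());
  }
}
```

Callers that catch `RuntimeException` keep working unchanged, while new code can catch the narrower type when it wants to distinguish index-check failures.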
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411204#comment-17411204 ]
Michael McCandless commented on LUCENE-9662:
{quote}To increase its concurrency for the nightly benchmark, I assume a change can be made in [luceneutil|https://github.com/mikemccand/luceneutil/blob/0084387e001b426075eb828f43ad0c4e955e9280/src/python/nightlyBench.py#L695-L704] to pass in the flag? If so, I can open a PR for it as well!{quote}
Ahh no need – I already did that, and added an annotation to the nightly benchmarks! Switching from 4 to 16 concurrent threads for the nightly {{CheckIndex}} benchmark [further sped it up from ~112 seconds down to ~77 seconds|https://home.apache.org/~mikemccand/lucenebench/checkIndexTime.html]: woot!
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410031#comment-17410031 ]
Zach Chen commented on LUCENE-9662:
---
Hi [~mikemccand], I tried to backport these changes to 8x earlier, but ran into many merge conflicts. The changes in this PR touch many places in CheckIndex (the replacement of *RuntimeException* with *CheckIndexException* in particular), and some earlier commits that also touched CheckIndex were not backported to 8x because they were intended for the 9.0 release. Although some of the conflicts were easy to resolve, I'm a bit concerned that I may introduce subtle bugs when resolving the others, since I may not be familiar with the code involved.
What do you think? Would you recommend we still try to backport these changes to 8x?
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409182#comment-17409182 ]
Zach Chen commented on LUCENE-9662:
---
{quote}Of course, this is on [ridiculously concurrent (256 cores with hyperthreading) hardware|https://blog.mikemccandless.com/2021/01/apache-lucene-performance-on-128-core.html], but still it is only using the default 4 concurrent threads, right? I'll add an annotation, and increase its concurrency some!{quote}
Yes, it's indeed capped at 4 threads by default, and the result was impressive with just a few more threads! On my not-so-fast 6-core MacBook Pro, I got about a 73% reduction in processing time when using '-threadCount 12' versus sequential.
To increase its concurrency for the nightly benchmark, I assume a change can be made in [luceneutil|https://github.com/mikemccand/luceneutil/blob/0084387e001b426075eb828f43ad0c4e955e9280/src/python/nightlyBench.py#L695-L704] to pass in the flag? If so, I can open a PR for it as well!
{quote}Hmm, it looks like we didn't fix the {{Usage: ...}} output to advertise the new {{-threadCount}} option. [~zacharymorn] could you open a quick followup PR? Thanks!{quote}
Ah yes, sorry for missing that. I've opened a PR to update it: https://github.com/apache/lucene/pull/281
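As a back-of-the-envelope reading of the 73% figure in this comment (assuming it refers to wall-clock time, which the comment does not state explicitly), a reduction of $r$ in processing time corresponds to a speedup of $1/(1-r)$:

```latex
\[
  T_{\text{parallel}} = (1 - r)\,T_{\text{sequential}}, \qquad
  \text{speedup} = \frac{T_{\text{sequential}}}{T_{\text{parallel}}}
                 = \frac{1}{1 - r}
                 = \frac{1}{1 - 0.73} \approx 3.7
\]
```

So a 73% reduction is roughly a 3.7x speedup on the 6-core machine mentioned here.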
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408766#comment-17408766 ]
Michael McCandless commented on LUCENE-9662:
Hmm, it looks like we didn't fix the {{Usage: ...}} output to advertise the new {{-threadCount}} option. [~zacharymorn] could you open a quick followup PR? Thanks!
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408746#comment-17408746 ]
Michael McCandless commented on LUCENE-9662:
Whoa, look [how much faster {{CheckIndex}} got|https://home.apache.org/~mikemccand/lucenebench/checkIndexTime.html] in the nightly benchmarks! From ~235 seconds to ~110 seconds.
Of course, this is on [ridiculously concurrent (256 cores with hyperthreading) hardware|https://blog.mikemccandless.com/2021/01/apache-lucene-performance-on-128-core.html], but still it is only using the default 4 concurrent threads, right? I'll add an annotation, and increase its concurrency some!
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408546#comment-17408546 ]
ASF subversion and git services commented on LUCENE-9662:
---
Commit 34232430f200a0941de683d9035e08a4cbec9df4 in lucene's branch refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=3423243 ]
LUCENE-9662: fix test failure from merging away soft-deletes (#276)
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407763#comment-17407763 ]
ASF subversion and git services commented on LUCENE-9662:
---
Commit 424192e1704664dc0ebc55109feaad5990b945cb in lucene's branch refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=424192e ]
LUCENE-9662: CheckIndex should be concurrent - parallelizing index check across segments (#128)
[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent
[ https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340017#comment-17340017 ]
Zach Chen commented on LUCENE-9662:
---
Hi [~mikemccand], I've taken a stab at this and created a WIP PR [https://github.com/apache/lucene/pull/128] with some nocommit comments. Could you please take a look and let me know your thoughts?