[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2022-02-01 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485556#comment-17485556
 ] 

Zach Chen commented on LUCENE-9662:
---

I've approved the null check PR. Thanks [~mdrob] !

As for resolving this issue, I think so? So far the implementation 
parallelizes checking across segments, but within each segment it is still 
sequential. We initially started by parallelizing within each segment, but 
found the speed-up to be limited, as it's dominated by checking the biggest 
part of a segment (typically the postings file, checked by `testPostings`). We 
could potentially look into breaking that up into smaller pieces to increase 
parallelism, but I'm not sure it's worth the effort and added code complexity. 
What do you think, [~mikemccand]? 
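
For readers following along, here is a minimal sketch of the "parallel across segments, sequential within a segment" shape described above. It is not the code from the PR: the class name, `checkSegment`, `checkAll`, and the part names are all hypothetical stand-ins, and the real checks are replaced by string building.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustration only: hypothetical names, not Lucene's CheckIndex implementation. */
class PerSegmentCheckSketch {

  /** Stand-in for the checks of one segment; its parts still run one after another. */
  public static String checkSegment(String segmentName) {
    StringBuilder report = new StringBuilder();
    for (String part : new String[] {"fieldInfos", "postings", "docValues"}) {
      report.append(segmentName).append(": ").append(part).append(" OK\n");
    }
    return report.toString();
  }

  /** Checks segments concurrently and returns one report per segment, in input order. */
  public static List<String> checkAll(List<String> segmentNames, int threadCount) {
    ExecutorService executor = Executors.newFixedThreadPool(threadCount);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String name : segmentNames) {
        Callable<String> task = () -> checkSegment(name);
        futures.add(executor.submit(task));
      }
      List<String> reports = new ArrayList<>();
      for (Future<String> future : futures) {
        try {
          // Waiting in submission order keeps the combined output deterministic.
          reports.add(future.get());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting for a segment", e);
        } catch (ExecutionException e) {
          throw new IllegalStateException("segment check failed", e.getCause());
        }
      }
      return reports;
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    for (String report : checkAll(List.of("_0", "_1", "_2"), 4)) {
      System.out.print(report);
    }
  }
}
```

In a layout like this, wall-clock time is bounded below by the slowest single segment, and within that segment by its largest part, which matches the limited speed-up mentioned above.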

> CheckIndex should be concurrent
> ---
>
> Key: LUCENE-9662
> URL: https://issues.apache.org/jira/browse/LUCENE-9662
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> I am watching a nightly benchmark slowly run its {{CheckIndex}} step, 
> using a single core out of the 128 cores the box has.
> It seems like this is an embarrassingly parallel problem if the index has 
> multiple segments, and it would finish much more quickly on concurrent hardware 
> if we did "thread per segment".
> If we wanted even more concurrency, each part of the Lucene index that 
> is checked is also independent, so it could be "thread per segment per part".



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2022-02-01 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485499#comment-17485499
 ] 

Mike Drob commented on LUCENE-9662:
---

1) Changes here have been committed. Should this be marked closed?
2) There are several usages of infoStream in CheckIndex that do not guard 
against null. Should we fix those in this issue or in a new one?
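
To make the second point concrete, here is a minimal sketch of one way to guard an optional output stream against null. The helper name `msg` and the class are assumptions for illustration only; this does not claim to be what the eventual fix looks like.

```java
import java.io.PrintStream;

/** Illustration only: a hypothetical helper for an optional info stream. */
class NullSafeInfoStreamSketch {

  /** Prints the message only when a stream is configured; null means "stay quiet". */
  public static void msg(PrintStream infoStream, String message) {
    if (infoStream != null) {
      infoStream.println(message);
    }
  }

  public static void main(String[] args) {
    // Without the guard, the first call would throw a NullPointerException.
    msg(null, "this message is dropped");
    msg(System.out, "checking segment _0");
  }
}
```

Routing every print through one guarded helper avoids having to repeat the null check at each call site.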




[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-09-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412997#comment-17412997
 ] 

ASF subversion and git services commented on LUCENE-9662:
-

Commit 965b54ce3fd450dd42758244870b6228e1b5e5d4 in lucene-solr's branch 
refs/heads/branch_8x from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=965b54c ]

LUCENE-9662: CheckIndex should be concurrent  - parallelizing index check 
across segments  (#2567)






[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-09-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412933#comment-17412933
 ] 

ASF subversion and git services commented on LUCENE-9662:
-

Commit 7f8607b59e034541d62687374b86197797a30a4f in lucene's branch 
refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7f8607b ]

LUCENE-9662: Update concurrent index checking usage instructions and default 
thread count to CPU cores (#281)






[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-09-08 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411880#comment-17411880
 ] 

Michael McCandless commented on LUCENE-9662:


{quote}it seems like it's ok for us to just backport and release these changes 
via 8.11 ?
{quote}
Yes, +1 – let's not try to rush this into 8.10.  I think the RC branch is now 
cut, so {{branch_8x}} means 8.11.




[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-09-08 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411723#comment-17411723
 ] 

Zach Chen commented on LUCENE-9662:
---

{quote}I think we should backport these changes, in general.  They are not 
breaking – the switch to {{CheckIndexException}} still subclasses 
{{RuntimeException}}.  There will be some Lucene users who are nervous about 
upgrading to 9.0 too soon, but would be maybe eager to upgrade to last 8.x 
release (if that's 8.10 or 8.11 or beyond).  I think it's bad if we slow down 
our rate of backporting because a major release is coming ... let's try to 
review your backport commit carefully to see if it looks OK?
{quote}
Makes sense. I think my nervousness was also partly because this change, once 
backported, might land a bit too close to the 8.10 branch cut window, but it 
seems like it's ok for us to just backport and release these changes via 8.11?

For now I've created a PR for backporting them against 8x here: 
https://github.com/apache/lucene-solr/pull/2567. The merge conflict resolution 
turned out to be less involved than I expected, but there was a failing test, 
and I suspect some unintended code was introduced during the merge. I will dig 
in a bit more to confirm the cause there.  




[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-09-07 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411208#comment-17411208
 ] 

Michael McCandless commented on LUCENE-9662:


{quote}What do you think? Would you recommend we still try to backport these 
changes to 8x?
{quote}
I think we should backport these changes, in general.  They are not breaking – 
the switch to {{CheckIndexException}} still subclasses {{RuntimeException}}.  
There will be some Lucene users who are nervous about upgrading to 9.0 too 
soon, but would be maybe eager to upgrade to last 8.x release (if that's 8.10 
or 8.11 or beyond).  I think it's bad if we slow down our rate of backporting 
because a major release is coming ... let's try to review your backport commit 
carefully to see if it looks OK?
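
A small sketch of why narrowing the thrown type to a subclass of {{RuntimeException}} is not a breaking change for callers. `ExampleCheckIndexException` below is a stand-in defined only for this illustration, not Lucene's class, and the failure message is made up.

```java
/** Illustration only: ExampleCheckIndexException is a stand-in type. */
class BackwardCompatibleExceptionSketch {

  /** A more specific exception that still extends RuntimeException. */
  static class ExampleCheckIndexException extends RuntimeException {
    ExampleCheckIndexException(String message) {
      super(message);
    }
  }

  /** Simulates a check that now throws the more specific type. */
  public static void check(boolean corrupt) {
    if (corrupt) {
      throw new ExampleCheckIndexException("term vectors do not match");
    }
  }

  /** Caller written before the switch: it only knows about RuntimeException. */
  public static String legacyCaller(boolean corrupt) {
    try {
      check(corrupt);
      return "OK";
    } catch (RuntimeException e) {
      // Still reached, because the new type is a RuntimeException.
      return "FAILED: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(legacyCaller(false));
    System.out.println(legacyCaller(true));
  }
}
```

Existing `catch (RuntimeException e)` blocks keep working unchanged, while new code can catch the narrower type if it wants to.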




[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-09-07 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411204#comment-17411204
 ] 

Michael McCandless commented on LUCENE-9662:


{quote}To increase its concurrency for nightly benchmark, I assume a change can 
be made in 
[luceneutil|https://github.com/mikemccand/luceneutil/blob/0084387e001b426075eb828f43ad0c4e955e9280/src/python/nightlyBench.py#L695-L704]
 to pass in the flag? If so, I can open a PR for it as well!
{quote}
Ahh no need – I already did that, and added annotation to nightly benchmarks! 
Switching from 4 to 16 concurrent threads for the nightly {{CheckIndex}} 
benchmark [further sped it up from ~112 seconds down to ~77 
seconds|https://home.apache.org/~mikemccand/lucenebench/checkIndexTime.html]: 
woot!
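
For anyone who wants those timings as ratios, here is a small worked calculation. It uses only the approximate figures quoted above (~112 s and ~77 s), so the results are approximate too; the class and method names are just for this illustration.

```java
/** Illustration only: turns the approximate timings quoted above into ratios. */
class SpeedupSketch {

  /** How many times faster the new run is. */
  public static double speedup(double beforeSeconds, double afterSeconds) {
    return beforeSeconds / afterSeconds;
  }

  /** Percentage of the original wall-clock time that was saved. */
  public static double reductionPercent(double beforeSeconds, double afterSeconds) {
    return 100.0 * (beforeSeconds - afterSeconds) / beforeSeconds;
  }

  public static void main(String[] args) {
    double before = 112.0; // ~seconds with 4 threads
    double after = 77.0;   // ~seconds with 16 threads
    System.out.printf("speedup: %.2fx%n", speedup(before, after));
    System.out.printf("time saved: %.2f%%%n", reductionPercent(before, after));
  }
}
```

That works out to roughly a 1.45x speedup, or about 31% less wall-clock time, for four times as many threads.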




[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-09-04 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410031#comment-17410031
 ] 

Zach Chen commented on LUCENE-9662:
---

Hi [~mikemccand], I tried to backport these changes to 8x earlier, but the 
attempt resulted in many merge conflicts. The changes in this PR touch many 
places in CheckIndex (the replacement of *RuntimeException* with 
*CheckIndexException* in particular), and some earlier commits that also 
touched CheckIndex were not backported to 8x since they were intended for the 
9.0 release. Although some of the conflicts were easy to resolve, I'm a bit 
concerned that I may introduce subtle bugs when resolving the others, since I 
may not be familiar with that code.

 

What do you think? Would you recommend we still try to backport these changes 
to 8x?




[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-09-02 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409182#comment-17409182
 ] 

Zach Chen commented on LUCENE-9662:
---

{quote}Of course, this is on [ridiculously concurrent (256 cores with 
hyperthreading) 
hardware|https://blog.mikemccandless.com/2021/01/apache-lucene-performance-on-128-core.html],
 but still it is only using the default 4 concurrent threads right?  I'll add 
an annotation, and increase its concurrency some!
{quote}
Yes, it's indeed capped at 4 threads by default, and the result was indeed 
impressive with just a few more threads! On my not-so-fast 6-core MacBook Pro, 
I got about a 73% reduction in processing time when using '-threadCount 12' 
versus sequential. To increase its concurrency for nightly benchmark, I assume a 
change can be made in 
[luceneutil|https://github.com/mikemccand/luceneutil/blob/0084387e001b426075eb828f43ad0c4e955e9280/src/python/nightlyBench.py#L695-L704]
 to pass in the flag? If so, I can open a PR for it as well!
{quote}Hmm, it looks like we didn't fix the {{Usage: ...}} output to advertise 
the new {{-threadCount}} option.  [~zacharymorn] could you open a quick 
followup PR?  Thanks!
{quote}
Ah yes, sorry for missing that. I've opened a PR to update it: 
https://github.com/apache/lucene/pull/281




[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-09-02 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408766#comment-17408766
 ] 

Michael McCandless commented on LUCENE-9662:


Hmm, it looks like we didn't fix the {{Usage: ...}} output to advertise the new 
{{-threadCount}} option.  [~zacharymorn] could you open a quick followup PR?  
Thanks!




[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-09-02 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408746#comment-17408746
 ] 

Michael McCandless commented on LUCENE-9662:


Whoa, look [how much faster {{CheckIndex}} 
got|https://home.apache.org/~mikemccand/lucenebench/checkIndexTime.html] in the 
nightly benchmarks!  From ~235 seconds to ~110.

Of course, this is on [ridiculously concurrent (256 cores with hyperthreading) 
hardware|https://blog.mikemccandless.com/2021/01/apache-lucene-performance-on-128-core.html],
 but still it is only using the default 4 concurrent threads right?  I'll add 
an annotation, and increase its concurrency some!




[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-09-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408546#comment-17408546
 ] 

ASF subversion and git services commented on LUCENE-9662:
-

Commit 34232430f200a0941de683d9035e08a4cbec9df4 in lucene's branch 
refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=3423243 ]

LUCENE-9662: fix test failure from merging away soft-deletes (#276)






[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-08-31 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407763#comment-17407763
 ] 

ASF subversion and git services commented on LUCENE-9662:
-

Commit 424192e1704664dc0ebc55109feaad5990b945cb in lucene's branch 
refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=424192e ]

LUCENE-9662: CheckIndex should be concurrent  - parallelizing index check 
across segments (#128)






[jira] [Commented] (LUCENE-9662) CheckIndex should be concurrent

2021-05-06 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340017#comment-17340017
 ] 

Zach Chen commented on LUCENE-9662:
---

Hi [~mikemccand], I've taken a stab at this and created a WIP PR 
[https://github.com/apache/lucene/pull/128] with some nocommit comments. Could 
you please take a look and let me know your thoughts?
