[
https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027639#comment-13027639
]
Uwe Schindler commented on LUCENE-3054:
---------------------------------------
I investigated what happens here:
The problem is indeed quickSort, but not undernormal circumstances. The problem
with quickSort (just google for stack overflow and quicksort) is that it only
works fine for arrays with many values. Once you only have few distinct values
and a large array, depending on the oreder it may happen that it splits into
two subarrays for next iteration, where one is very large and the other only
contains few items.
Attached is a patch, that shows the problem. It almost every time stack
overflows. Also quicksort is very *slow* for this case.
This is exactly what happens on PhraseQuery: we only have very few distinct
items and possibly a very huge array. To fix this, we should change PhraseQuery
to use mergeSort instead. Mergesort is also much faster in this case, as it
always splits the array in the center. So the number of iterations is limited.
For TermsHash/BytesRefHash its mostly also not a problem, as the values (the
terms are 100% distict, as only the hash is sorted).
But there may still be the slight chance this messes up. I propose to change
SorterTemplate to fall back to mergeSort once it checks that number of
iterations grows e.g. > 20 (have to test a little bit).
I will change that issue to higher priority and we also need to backport to 3.1.
> add assert to sorts catch broken comparators in tests
> -----------------------------------------------------
>
> Key: LUCENE-3054
> URL: https://issues.apache.org/jira/browse/LUCENE-3054
> Project: Lucene - Java
> Issue Type: Task
> Affects Versions: 3.1
> Reporter: Robert Muir
> Attachments: LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these
> broken comparators.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]