[ 
https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027639#comment-13027639
 ] 

Uwe Schindler commented on LUCENE-3054:
---------------------------------------

I investigated what happens here:

The problem is indeed quickSort, but not undernormal circumstances. The problem 
with quickSort (just google for stack overflow and quicksort) is that it only 
works fine for arrays with many values. Once you only have few distinct values 
and a large array, depending on the oreder it may happen that it splits into 
two subarrays for next iteration, where one is very large and the other only 
contains few items.

Attached is a patch, that shows the problem. It almost every time stack 
overflows. Also quicksort is very *slow* for this case.

This is exactly what happens on PhraseQuery: we only have very few distinct 
items and possibly a very huge array. To fix this, we should change PhraseQuery 
to use mergeSort instead. Mergesort is also much faster in this case, as it 
always splits the array in the center. So the number of iterations is limited.

For TermsHash/BytesRefHash its mostly also not a problem, as the values (the 
terms are 100% distict, as only the hash is sorted).

But there may still be the slight chance this messes up. I propose to change 
SorterTemplate to fall back to mergeSort once it checks that number of 
iterations grows e.g. > 20 (have to test a little bit).

I will change that issue to higher priority and we also need to backport to 3.1.

> add assert to sorts catch broken comparators in tests
> -----------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>         Attachments: LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these 
> broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to