Re: lucene 2.9 sorting algorithm

John Wang Thu, 22 Oct 2009 19:46:40 -0700

I guess this didn't go through for some reason, which caused all the confusion.

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|<all>|1000000|rand string|10|91.76|108.63|{color:green}18.4%{color}|
|log|<all>|1000000|rand string|25|92.39|106.79|{color:green}15.6%{color}|
|log|<all>|1000000|rand string|50|91.30|104.02|{color:green}13.9%{color}|
|log|<all>|1000000|rand string|500|86.16|63.27|{color:red}-26.6%{color}|
|log|<all>|1000000|rand string|1000|76.92|64.85|{color:red}-15.7%{color}|
|log|<all>|1000000|country|10|92.42|108.78|{color:green}17.7%{color}|
|log|<all>|1000000|country|25|92.60|106.26|{color:green}14.8%{color}|
|log|<all>|1000000|country|50|92.64|103.76|{color:green}12.0%{color}|
|log|<all>|1000000|country|500|83.92|50.30|{color:red}-40.1%{color}|
|log|<all>|1000000|country|1000|74.78|46.59|{color:red}-37.7%{color}|
|log|<all>|1000000|rand int|10|114.03|114.85|{color:green}0.7%{color}|
|log|<all>|1000000|rand int|25|113.77|112.92|{color:red}-0.7%{color}|
|log|<all>|1000000|rand int|50|113.36|109.56|{color:red}-3.4%{color}|
|log|<all>|1000000|rand int|500|103.90|66.29|{color:red}-36.2%{color}|
|log|<all>|1000000|rand int|1000|89.52|70.67|{color:red}-21.1%{color}|


On Thu, Oct 22, 2009 at 7:43 PM, John Wang <john.w...@gmail.com> wrote:

> Mike:
>        I did just post with what I saw, feel free to read and comment on
> it.
>
>        I am simply trying to work with Michael on this and trying to
> understand the code.
>
>        As I have expressed previously, I have seen a difference between 1.5
> and 1.6 that is significant. Since Mike has posted some numbers on jdk 1.6,
> I was hoping to eliminate all variables relating to the index and
> environment and see if he sees the same thing.
>
>         I guess I should be more clear in the email.
>
> -John
>
> On Thu, Oct 22, 2009 at 7:39 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> I am patient :) And I'm not speaking for Mike, I'm speaking for me. I'm
>> wondering what you're seeing. Asking Mike to rerun the tests without
>> giving any further info (you didn't even say that you're seeing something
>> different) is unfair to the rest of us ;)
>>
>> Giving 0 info along with your request just makes 0 sense to me and I
>> said as much.
>>
>> John Wang wrote:
>> > Mark:
>> >
>> >        Please be patient with me. I am seeing a difference and was
>> > wondering if Mike would see the same thing. I thought Michael would be
>> > willing to because he expressed interest in understanding what the
>> > performance discrepancies are.
>> >
>> >        Again, it is only a request. It is perfectly fine if Michael
>> > refuses to. But it would be great if Michael speaks for himself.
>> >
>> > Thanks
>> >
>> > -John
>> >
>> > On Thu, Oct 22, 2009 at 7:29 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> >
>> >     Why? What might he find? What's with the cryptic request?
>> >
>> >     Why would Java 1.5 perform better than 1.6? It erases 20 and 40%
>> >     gains?
>> >
>> >     I know point 2 certainly doesn't. Cards on the table?
>> >
>> >     John Wang wrote:
>> >     > Hey Michael:
>> >     >
>> >     >        Would you mind rerunning the test you have with jdk1.5?
>> >     >
>> >     >        Also, if you would, change the comparator method to avoid
>> >     > branching for int and string comparators, e.g.
>> >     >
>> >     >
>> >     >       return index.order[i.doc] - index.order[j.doc];
>> >     >
>> >     >
>> >     > Thanks
>> >     >
>> >     >
>> >     > -John
>> >     >
>> >     >
>> >     > On Thu, Oct 22, 2009 at 2:38 AM, Michael McCandless
>> >     > <luc...@mikemccandless.com> wrote:
>> >     >
>> >     >     On Thu, Oct 22, 2009 at 2:17 AM, John Wang <john.w...@gmail.com> wrote:
>> >     >
>> >     >     >      I have been playing with the patch, and I think I have some
>> >     >     > information that you might like.
>> >     >     >      Let me spend some time and gather some more numbers and
>> >     >     > update in jira.
>> >     >
>> >     >     Excellent!
>> >     >
>> >     >     >      say bottom has ords 23, 45, 76, each corresponding to a
>> >     >     > string. When moving to the next segment, you need to make bottom
>> >     >     > have ords that are comparable to other docs in this new segment,
>> >     >     > so you would need to find the new ords for the values in 23, 45
>> >     >     > and 76, don't you? To find it, assuming the values are s1, s2, s3,
>> >     >     > you would do a bin. search on the new val array, and find the
>> >     >     > index for s1, s2, s3.
>> >     >
>> >     >     It's that inversion (from ord->Comparable in first seg, and
>> >     >     Comparable->ord in second seg) that I'm trying to avoid (w/ this
>> >     >     new proposal).
>> >     >
>> >     >     > Which is 3 bin searches per convert, I am not sure
>> >     >     > how you can short circuit it. Are you suggesting we call
>> >     >     > Comparable on compareBottom until some doc beats it?
>> >     >
>> >     >     I'm saying on seg transition you indeed get the Comparable for
>> >     >     current bottom, but, don't attempt to invert it.  Instead, as seg 2
>> >     >     finds a hit, you get that hit's Comparables and compare to bottom.
>> >     >     If it beats bottom, it goes into the queue.  If it does not, you
>> >     >     use the ord (in seg 2's ord space) to "learn" a bottom in the ord
>> >     >     space of seg 2.
>> >     >
>> >     >     > That would hurt performance a lot though, no?
>> >     >
>> >     >     Yeah I think likely it would, since we're talking about a binary
>> >     >     search on transition VS having to do possibly many
>> >     >     upgrade-to-Comparable and compare-Comparables to slowly learn the
>> >     >     equivalent ord in the new segment.  I was proposing it for cases
>> >     >     where inversion is very difficult.  But realistically, since you
>> >     >     must keep around the full ord -> Comparable for every segment
>> >     >     anyway (in order to merge in the end), inversion shouldn't ever
>> >     >     actually be "difficult" -- it'd just be a binary search on
>> >     >     presumably in-RAM storage.
>> >     >
>> >     >     Mike
>> >     >
>> >     >
>> >     >     ---------------------------------------------------------------------
>> >     >     To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> >     >     For additional commands, e-mail: java-dev-h...@lucene.apache.org
>> >     >
>> >     >
>> >
>> >
>> >     --
>> >     - Mark
>> >
>> >     http://www.lucidimagination.com
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>>
>>
>
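
For readers following the comparator request quoted above, here is a minimal sketch of the two styles being compared. The class and method names are hypothetical and are not taken from the Lucene source or the patch. It assumes the compared values are ords, i.e. non-negative array indexes, which is the case where the subtraction form cannot overflow; for arbitrary int values it can give the wrong sign.

```java
class OrdCompareSketch {
    // Branching comparison: correct for any pair of ints.
    static int compareBranching(int a, int b) {
        if (a < b) return -1;
        if (a > b) return 1;
        return 0;
    }

    // Branch-free comparison by subtraction, in the style of
    // "return index.order[i.doc] - index.order[j.doc];".
    // Assumption: a - b does not overflow (true for non-negative ords).
    static int compareSubtract(int a, int b) {
        return a - b;
    }

    public static void main(String[] args) {
        int[] order = {23, 45, 76}; // hypothetical ords, one per doc
        // Both forms agree on the sign for ords.
        System.out.println(compareBranching(order[0], order[1]) < 0); // true
        System.out.println(compareSubtract(order[0], order[1]) < 0);  // true
        // Subtraction wraps around for extreme int values and flips the sign.
        System.out.println(compareSubtract(Integer.MIN_VALUE, 1) < 0); // false
    }
}
```

Whether the branch-free form is actually faster depends on the JVM and hardware, which is the open question in this thread.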

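The segment-transition step discussed in the quoted thread (ord -> Comparable in the old segment, then Comparable -> ord in the new one by binary search) could be sketched roughly as follows. This is only an illustration under stated assumptions: each segment is modeled as a sorted `String[]` of its distinct terms, and `remapOrd` is a hypothetical helper, not a Lucene API.

```java
import java.util.Arrays;

class BottomOrdRemapSketch {
    // Converts an ord from the old segment into the ord space of the new segment.
    // Returns the new ord if the value exists there; otherwise returns the
    // negative (-(insertionPoint) - 1) encoding used by Arrays.binarySearch.
    static int remapOrd(String[] oldTerms, int oldOrd, String[] newTerms) {
        String value = oldTerms[oldOrd];              // ord -> Comparable (old segment)
        return Arrays.binarySearch(newTerms, value);  // Comparable -> ord (new segment)
    }

    public static void main(String[] args) {
        // Hypothetical sorted term arrays for two segments.
        String[] seg1 = {"apple", "kiwi", "pear"};
        String[] seg2 = {"banana", "kiwi", "mango", "pear"};

        System.out.println(remapOrd(seg1, 1, seg2)); // 1  ("kiwi" is at index 1 in seg2)
        System.out.println(remapOrd(seg1, 2, seg2)); // 3  ("pear" is at index 3 in seg2)
        System.out.println(remapOrd(seg1, 0, seg2)); // -1 ("apple" is absent; would insert at 0)
    }
}
```

With N entries in the queue this is N binary searches per segment transition, which is the cost being weighed against "learning" the bottom ord from hits in the new segment.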