[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

John Wang (JIRA) Fri, 23 Oct 2009 00:06:25 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769116#action_12769116
 ]


John Wang commented on LUCENE-1997:
-----------------------------------

I think I found the reason for the discrepancy: 32-bit vs. 64-bit.

32-bit run:
jwang-mn:benchmark jwang$ python -u sortBench.py -report john3

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|<all>|1000000|rand string|10|92.24|103.65|{color:green}12.4%{color}|
|log|<all>|1000000|rand string|25|91.88|102.06|{color:green}11.1%{color}|
|log|<all>|1000000|rand string|50|91.72|99.07|{color:green}8.0%{color}|
|log|<all>|1000000|rand string|100|106.26|90.61|{color:red}-14.7%{color}|
|log|<all>|1000000|rand string|500|86.38|59.88|{color:red}-30.7%{color}|
|log|<all>|1000000|rand string|1000|74.88|39.93|{color:red}-46.7%{color}|
|log|<all>|1000000|country|10|92.33|103.79|{color:green}12.4%{color}|
|log|<all>|1000000|country|25|92.27|101.60|{color:green}10.1%{color}|
|log|<all>|1000000|country|50|91.58|99.14|{color:green}8.3%{color}|
|log|<all>|1000000|country|100|100.76|82.25|{color:red}-18.4%{color}|
|log|<all>|1000000|country|500|75.18|48.65|{color:red}-35.3%{color}|
|log|<all>|1000000|country|1000|67.68|32.67|{color:red}-51.7%{color}|
|log|<all>|1000000|rand int|10|88.14|101.93|{color:green}15.6%{color}|
|log|<all>|1000000|rand int|25|95.02|96.14|{color:green}1.2%{color}|
|log|<all>|1000000|rand int|50|96.54|89.61|{color:red}-7.2%{color}|
|log|<all>|1000000|rand int|100|88.58|92.06|{color:green}3.9%{color}|
|log|<all>|1000000|rand int|500|103.60|62.25|{color:red}-39.9%{color}|
|log|<all>|1000000|rand int|1000|92.36|40.84|{color:red}-55.8%{color}|

64-bit run:
jwang-mn:benchmark jwang$ python -u sortBench.py -report john4

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|<all>|1000000|rand string|10|119.59|107.52|{color:red}-10.1%{color}|
|log|<all>|1000000|rand string|25|119.25|105.05|{color:red}-11.9%{color}|
|log|<all>|1000000|rand string|50|117.22|101.99|{color:red}-13.0%{color}|
|log|<all>|1000000|rand string|100|95.78|86.19|{color:red}-10.0%{color}|
|log|<all>|1000000|rand string|500|76.05|54.71|{color:red}-28.1%{color}|
|log|<all>|1000000|rand string|1000|68.37|38.94|{color:red}-43.0%{color}|
|log|<all>|1000000|country|10|119.68|108.12|{color:red}-9.7%{color}|
|log|<all>|1000000|country|25|119.10|105.72|{color:red}-11.2%{color}|
|log|<all>|1000000|country|50|115.85|99.70|{color:red}-13.9%{color}|
|log|<all>|1000000|country|100|97.44|91.03|{color:red}-6.6%{color}|
|log|<all>|1000000|country|500|78.92|40.97|{color:red}-48.1%{color}|
|log|<all>|1000000|country|1000|68.48|30.43|{color:red}-55.6%{color}|
|log|<all>|1000000|rand int|10|121.64|108.82|{color:red}-10.5%{color}|
|log|<all>|1000000|rand int|25|121.68|113.92|{color:red}-6.4%{color}|
|log|<all>|1000000|rand int|50|120.80|110.45|{color:red}-8.6%{color}|
|log|<all>|1000000|rand int|100|101.36|95.68|{color:red}-5.6%{color}|
|log|<all>|1000000|rand int|500|90.15|60.29|{color:red}-33.1%{color}|
|log|<all>|1000000|rand int|1000|80.23|40.67|{color:red}-49.3%{color}|
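
For anyone rechecking the tables: the "Pct change" column is consistent with the relative change from "QPS old" to "QPS new", rounded to one decimal place. The sketch below reproduces a few rows on that assumption. The helper name pct_change is hypothetical and is not taken from sortBench.py.

```python
def pct_change(qps_old, qps_new):
    """Relative change from old to new QPS, in percent, rounded to one decimal."""
    return round((qps_new - qps_old) / qps_old * 100.0, 1)


# Rows copied from the tables above: (QPS old, QPS new, reported Pct change)
rows = [
    (92.24, 103.65, 12.4),    # 32-bit, rand string, top 10
    (106.26, 90.61, -14.7),   # 32-bit, rand string, top 100
    (119.59, 107.52, -10.1),  # 64-bit, rand string, top 10
]

for qps_old, qps_new, reported in rows:
    print(qps_old, qps_new, pct_change(qps_old, qps_new), reported)
```

A positive value means the "new" column is faster than the "old" one for that row.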



> Explore performance of multi-PQ vs single-PQ sorting API
> --------------------------------------------------------
>
>                 Key: LUCENE-1997
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1997
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1997.patch, LUCENE-1997.patch
>
>
> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
> where a simpler (non-segment-based) comparator API is proposed that
> gathers results into multiple PQs (one per segment) and then merges
> them at the end.
> I started from John's multi-PQ code and worked it into
> contrib/benchmark so that we could run perf tests.  Then I generified
> the Python script I use for running search benchmarks (in
> contrib/benchmark/sortBench.py).
> The script first creates indexes with 1M docs (based on
> SortableSingleDocSource, and based on wikipedia, if available).  Then
> it runs various combinations:
>   * Index with 20 balanced segments vs index with the "normal" log
>     segment size
>   * Queries with different numbers of hits (only for wikipedia index)
>   * Different top N
>   * Different sorts (by title, for wikipedia, and by random string,
>     random int, and country for the random index)
> For each test, 7 search rounds are run and the best QPS is kept.  The
> script runs singlePQ then multiPQ, and records the resulting best QPS
> for each and produces a table (in Jira format) as output.
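
As a rough illustration of the two collection strategies described in the issue, here is a minimal Python sketch. It is not Lucene's Java comparator API and not the code in the attached patch. The function names, the (sort_key, doc_id) tuples, and the sample segments are all assumptions made for the example. It only shows that both approaches return the same top N, and says nothing about their relative speed.

```python
import heapq
from itertools import chain, islice


def top_n_single_pq(segments, n):
    """Single-PQ: one bounded selection over the hits of all segments."""
    return heapq.nsmallest(n, chain.from_iterable(segments))


def top_n_multi_pq(segments, n):
    """Multi-PQ: a bounded selection per segment, merged at the end."""
    # Each per-segment result is already sorted ascending, so a k-way
    # merge followed by taking the first n entries gives the global top n.
    per_segment = [heapq.nsmallest(n, seg) for seg in segments]
    return list(islice(heapq.merge(*per_segment), n))


# Hypothetical data: three segments of (sort_key, doc_id) hits.
segments = [
    [(42, 0), (7, 1), (19, 2)],
    [(3, 3), (88, 4)],
    [(7, 5), (1, 6), (64, 7)],
]

print(top_n_single_pq(segments, 3))
print(top_n_multi_pq(segments, 3))
```

Keeping the top n per segment is enough because any hit outside its own segment's top n cannot be in the global top n.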

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


