[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance

2019-10-08 Thread GitBox
jgq2008303393 commented on issue #884: LUCENE-8980: optimise 
SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-539489308
 
 
   > This is a nice, simple, high impact improvement -- thanks @jgq2008303393!
   
   It's my pleasure. Nice to meet you, @mikemccand. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance

2019-09-20 Thread GitBox
jgq2008303393 commented on issue #884: LUCENE-8980: optimise 
SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-533763308
 
 
   Hi @dsmiley, could you please take another look? :)





[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance

2019-09-20 Thread GitBox
jgq2008303393 commented on issue #884: LUCENE-8980: optimise 
SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-533567444
 
 
   Hi, @atris. The complete results of wikimedium10m.result are 
[here](https://gist.github.com/jgq2008303393/44768d69a843c7b421e765bbab9360fd.js).
  The following table is the result of the last run:
   
   | Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff (range) |
   | :--- | ---: | ---: | ---: | ---: | ---: |
   | OrHighNotLow | 293.93 | (5.8%) | 286.46 | (6.6%) | -2.5% (-14% - 10%) |
   | OrHighNotHigh | 258.18 | (3.7%) | 252.41 | (5.0%) | -2.2% (-10% - 6%) |
   | OrHighLow | 206.52 | (6.2%) | 202.55 | (6.2%) | -1.9% (-13% - 11%) |
   | MedPhrase | 16.41 | (4.1%) | 16.12 | (2.6%) | -1.7% (-8% - 5%) |
   | LowTerm | 608.71 | (5.7%) | 599.21 | (4.4%) | -1.6% (-10% - 9%) |
   | Prefix3 | 37.96 | (2.8%) | 37.51 | (3.8%) | -1.2% (-7% - 5%) |
   | OrNotHighHigh | 255.49 | (5.5%) | 252.63 | (6.1%) | -1.1% (-12% - 11%) |
   | MedSloppyPhrase | 13.71 | (3.5%) | 13.58 | (3.7%) | -1.0% (-7% - 6%) |
   | HighSloppyPhrase | 17.00 | (3.3%) | 16.84 | (3.7%) | -0.9% (-7% - 6%) |
   | OrHighHigh | 19.02 | (2.6%) | 18.85 | (2.7%) | -0.9% (-6% - 4%) |
   | MedTerm | 564.56 | (4.6%) | 559.38 | (2.9%) | -0.9% (-8% - 6%) |
   | OrNotHighLow | 294.29 | (4.9%) | 291.86 | (4.2%) | -0.8% (-9% - 8%) |
   | AndHighLow | 303.17 | (3.7%) | 300.72 | (4.5%) | -0.8% (-8% - 7%) |
   | AndHighHigh | 28.24 | (2.1%) | 28.01 | (2.7%) | -0.8% (-5% - 4%) |
   | Wildcard | 64.64 | (3.9%) | 64.21 | (4.0%) | -0.7% (-8% - 7%) |
   | HighSpanNear | 15.14 | (2.8%) | 15.04 | (2.5%) | -0.7% (-5% - 4%) |
   | HighTerm | 431.22 | (3.9%) | 428.68 | (2.9%) | -0.6% (-7% - 6%) |
   | LowSloppyPhrase | 19.29 | (2.2%) | 19.18 | (2.9%) | -0.6% (-5% - 4%) |
   | LowSpanNear | 64.32 | (2.3%) | 63.99 | (2.0%) | -0.5% (-4% - 3%) |
   | Fuzzy2 | 34.51 | (12.8%) | 34.34 | (11.9%) | -0.5% (-22% - 27%) |
   | MedSpanNear | 51.51 | (2.3%) | 51.28 | (1.6%) | -0.4% (-4% - 3%) |
   | HighTermDayOfYearSort | 51.45 | (6.6%) | 51.24 | (7.5%) | -0.4% (-13% - 14%) |
   | OrHighNotMed | 306.95 | (5.1%) | 306.03 | (3.2%) | -0.3% (-8% - 8%) |
   | BrowseDateTaxoFacets | 1.48 | (0.6%) | 1.47 | (1.2%) | -0.2% (-1% - 1%) |
   | BrowseMonthSSDVFacets | 6.15 | (1.1%) | 6.14 | (3.6%) | -0.2% (-4% - 4%) |
   | HighPhrase | 186.86 | (6.2%) | 186.64 | (3.7%) | -0.1% (-9% - 10%) |
   | Respell | 48.69 | (4.1%) | 48.65 | (4.0%) | -0.1% (-7% - 8%) |
   | AndHighMed | 65.66 | (3.0%) | 65.74 | (3.2%) | 0.1% (-5% - 6%) |
   | HighIntervalsOrdered | 6.68 | (1.5%) | 6.69 | (1.7%) | 0.1% (-3% - 3%) |
   | LowPhrase | 219.11 | (5.7%) | 220.24 | (3.5%) | 0.5% (-8% - 10%) |
   | OrHighMed | 68.05 | (4.5%) | 68.44 | (3.1%) | 0.6% (-6% - 8%) |
   | OrNotHighMed | 272.89 | (5.7%) | 274.77 | (4.1%) | 0.7% (-8% - 11%) |
   | IntNRQ | 37.58 | (23.8%) | 37.96 | (24.2%) | 1.0% (-37% - 64%) |
   | BrowseDayOfYearSSDVFacets | 5.34 | (4.2%) | 5.40 | (2.9%) | 1.2% (-5% - 8%) |
   | HighTermMonthSort | 34.82 | (11.7%) | 35.81 | (14.9%) | 2.9% (-21% - 33%) |
   | BrowseMonthTaxoFacets | 4781.41 | (3.9%) | 4931.19 | (2.7%) | 3.1% (-3% - 10%) |
   | Fuzzy1 | 35.98 | (9.7%) | 37.42 | (8.0%) | 4.0% (-12% - 23%) |
   | BrowseDayOfYearTaxoFacets | 4688.64 | (3.6%) | 4878.52 | (3.6%) | 4.0% (-3% - 11%) |
   | PKLookup | 72.93 | (4.7%) | 95.23 | (3.3%) | 30.6% (21% - 40%) |
   
   




[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance

2019-09-20 Thread GitBox
jgq2008303393 commented on issue #884: LUCENE-8980: optimise 
SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-533528757
 
 
   Thanks for your reply, @atris. 
   
   The _luceneutil_ tool reports that the valid data sources are as follows:
   `'wikibigall', 'wikimedium10m', 'wikimedium10k', 'wikibig10k', 
'wikibig100k', 'wikimedium2m', 'wikimedium1m', 'memeall', 'wikimedium500k', 
'wikimediumall', 'wikimedium5m', 'euromedium', 'wikibig1m'`
   
   Which of these sources would you like me to run?
   





[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance

2019-09-20 Thread GitBox
jgq2008303393 commented on issue #884: LUCENE-8980: optimise 
SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-533510238
 
 
   We have run more performance tests using the _luceneutil_ tool. The complete 
test results are 
[here](https://gist.github.com/jgq2008303393/42d536f44b4845c01329a402202273eb.js).
   
   The _luceneutil_ tool runs the _wikimedium10k_ benchmark 20 times. 
The following table shows the result of the last run. Most tasks are roughly 
unchanged (between -6.4% and +4.9%), while _PKLookup_ improves by 58.9%. 
The _Get_ and _Bulk_ APIs of Elasticsearch will also benefit from this 
enhancement.
   
   
   | Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff (range) |
   | :--- | ---: | ---: | ---: | ---: | ---: |
   | HighIntervalsOrdered | 303.36 | (12.5%) | 283.86 | (16.9%) | -6.4% (-31% - 26%) |
   | MedPhrase | 404.26 | (12.3%) | 382.64 | (10.5%) | -5.3% (-25% - 19%) |
   | LowTerm | 2302.28 | (8.7%) | 2180.74 | (11.8%) | -5.3% (-23% - 16%) |
   | AndHighMed | 618.78 | (10.1%) | 586.61 | (11.8%) | -5.2% (-24% - 18%) |
   | BrowseDayOfYearSSDVFacets | 1042.68 | (10.1%) | 992.82 | (10.7%) | -4.8% (-23% - 17%) |
   | HighSpanNear | 263.62 | (12.9%) | 256.07 | (14.9%) | -2.9% (-27% - 28%) |
   | Wildcard | 221.10 | (16.2%) | 215.32 | (11.9%) | -2.6% (-26% - 30%) |
   | LowSpanNear | 656.60 | (7.9%) | 639.77 | (11.3%) | -2.6% (-20% - 18%) |
   | Fuzzy1 | 135.61 | (9.1%) | 132.26 | (10.4%) | -2.5% (-20% - 18%) |
   | AndHighHigh | 409.88 | (10.9%) | 399.79 | (12.6%) | -2.5% (-23% - 23%) |
   | OrHighHigh | 318.45 | (12.9%) | 312.43 | (12.2%) | -1.9% (-23% - 26%) |
   | AndHighLow | 937.17 | (10.2%) | 921.71 | (11.4%) | -1.6% (-21% - 22%) |
   | LowPhrase | 385.06 | (12.3%) | 379.83 | (10.8%) | -1.4% (-21% - 24%) |
   | IntNRQ | 618.69 | (14.1%) | 610.58 | (10.6%) | -1.3% (-22% - 27%) |
   | HighTermMonthSort | 1178.14 | (9.5%) | 1164.48 | (12.6%) | -1.2% (-21% - 23%) |
   | Fuzzy2 | 46.95 | (16.2%) | 46.57 | (15.6%) | -0.8% (-28% - 36%) |
   | OrHighLow | 633.64 | (9.6%) | 629.21 | (9.9%) | -0.7% (-18% - 20%) |
   | BrowseMonthSSDVFacets | 1157.34 | (12.1%) | 1155.63 | (13.5%) | -0.1% (-23% - 29%) |
   | Prefix3 | 297.40 | (12.1%) | 298.16 | (12.7%) | 0.3% (-21% - 28%) |
   | MedSpanNear | 434.56 | (10.0%) | 437.02 | (11.4%) | 0.6% (-19% - 24%) |
   | MedTerm | 2158.68 | (8.8%) | 2177.67 | (11.1%) | 0.9% (-17% - 22%) |
   | HighSloppyPhrase | 320.36 | (10.0%) | 323.46 | (14.6%) | 1.0% (-21% - 28%) |
   | BrowseDateTaxoFacets | 2065.89 | (13.7%) | 2088.22 | (13.2%) | 1.1% (-22% - 32%) |
   | Respell | 187.05 | (12.2%) | 189.48 | (10.1%) | 1.3% (-18% - 26%) |
   | MedSloppyPhrase | 583.45 | (11.3%) | 592.32 | (9.9%) | 1.5% (-17% - 25%) |
   | HighTerm | 1114.87 | (12.0%) | 1131.89 | (12.8%) | 1.5% (-20% - 29%) |
   | HighTermDayOfYearSort | 408.17 | (13.1%) | 416.13 | (9.3%) | 1.9% (-18% - 27%) |
   | BrowseDayOfYearTaxoFacets | 5460.05 | (8.5%) | 5591.96 | (8.0%) | 2.4% (-13% - 20%) |
   | BrowseMonthTaxoFacets | 5490.18 | (8.0%) | 5654.03 | (9.3%) | 3.0% (-13% - 22%) |
   | LowSloppyPhrase | 562.96 | (10.1%) | 583.91 | (9.5%) | 3.7% (-14% - 25%) |
   | HighPhrase | 221.20 | (11.9%) | 229.85 | (12.2%) | 3.9% (-17% - 31%) |
   | OrHighMed | 352.09 | (12.3%) | 369.39 | (9.4%) | 4.9% (-14% - 30%) |
   | PKLookup | 85.19 | (18.1%) | 135.38 | (22.7%) | 58.9% (15% - 121%) |
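
The headline percentage in the PKLookup row can be re-derived from the two QPS columns. The sketch below assumes the percent difference is simply the relative change in mean QPS against the baseline; the class and method names are hypothetical and are not part of luceneutil, and the range shown in parentheses in the table is not reproduced here.

```java
/** Hypothetical helper, not part of luceneutil. */
final class PctDiff {
    private PctDiff() {}

    /**
     * Relative change of the candidate QPS against the baseline QPS, in percent.
     * Assumption: this is how the headline "Pct diff" figure is derived.
     */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }
}
```

For the PKLookup row above (85.19 QPS baseline, 135.38 QPS modified), `PctDiff.pctDiff(85.19, 135.38)` evaluates to roughly 58.9.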




[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance

2019-09-19 Thread GitBox
jgq2008303393 commented on issue #884: LUCENE-8980: optimise 
SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-533150625
 
 
   Thanks a lot, @dsmiley. 
   
   I read the code you shared. The design ideas behind 
`uniformsplit.BlockReader` and this PR are similar: `uniformsplit.BlockReader` 
rules out segments based on the result of `BlockReader.seekBlock()`, while this 
PR rules out segments directly using the stored min/max terms.
   
   We use Elasticsearch for many time-series scenarios such as logs, APM, and 
metrics, and users typically index data with sequential IDs to ensure 
uniqueness. As you said, the effect of this PR would be very noticeable in 
those scenarios, since most segments will be skipped directly.
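
As a rough illustration of the min/max check described above, the sketch below tests a target term against a segment's smallest and largest term before any seek is attempted. It is not the code from this PR: `SegmentTermRange` and `mayContain` are hypothetical names, and terms are compared as unsigned bytes in lexicographic order, which is assumed here to match the term ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical per-segment term bounds; not a Lucene class. */
final class SegmentTermRange {
    private final byte[] minTerm;
    private final byte[] maxTerm;

    SegmentTermRange(String minTerm, String maxTerm) {
        this.minTerm = minTerm.getBytes(StandardCharsets.UTF_8);
        this.maxTerm = maxTerm.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Returns false when the target lies outside [minTerm, maxTerm],
     * in which case the segment cannot contain it and can be skipped.
     * Returns true when the target is within the bounds; the term may
     * still be absent, so a real lookup is needed in that case.
     */
    boolean mayContain(String target) {
        byte[] t = target.getBytes(StandardCharsets.UTF_8);
        return Arrays.compareUnsigned(t, minTerm) >= 0
            && Arrays.compareUnsigned(t, maxTerm) <= 0;
    }
}
```

With sequential IDs, a segment covering `id000001` to `id001000` would return `false` for `id001500`, while a segment covering `id001001` to `id002000` would return `true` and go on to the normal lookup.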
   
   We will use _luceneutil_ to provide more performance test results, as you 
suggested. Thanks again.





[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance

2019-09-18 Thread GitBox
jgq2008303393 commented on issue #884: LUCENE-8980: optimise 
SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-532538959
 
 
   ping @jpountz @mikemccand

