[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-539489308

> This is a nice, simple, high impact improvement -- thanks @jgq2008303393!

It's my pleasure. Nice to meet you, @mikemccand.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-533763308

Hi @dsmiley, could you please take another look? :)
[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-533567444

Hi, @atris. The complete results of wikimedium10m.result are [here](https://gist.github.com/jgq2008303393/44768d69a843c7b421e765bbab9360fd.js). The following table is the result of the last run:

| Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff (range) |
| --- | ---: | :---: | ---: | :---: | :---: |
| OrHighNotLow | 293.93 | (5.8%) | 286.46 | (6.6%) | -2.5% (-14% - 10%) |
| OrHighNotHigh | 258.18 | (3.7%) | 252.41 | (5.0%) | -2.2% (-10% - 6%) |
| OrHighLow | 206.52 | (6.2%) | 202.55 | (6.2%) | -1.9% (-13% - 11%) |
| MedPhrase | 16.41 | (4.1%) | 16.12 | (2.6%) | -1.7% (-8% - 5%) |
| LowTerm | 608.71 | (5.7%) | 599.21 | (4.4%) | -1.6% (-10% - 9%) |
| Prefix3 | 37.96 | (2.8%) | 37.51 | (3.8%) | -1.2% (-7% - 5%) |
| OrNotHighHigh | 255.49 | (5.5%) | 252.63 | (6.1%) | -1.1% (-12% - 11%) |
| MedSloppyPhrase | 13.71 | (3.5%) | 13.58 | (3.7%) | -1.0% (-7% - 6%) |
| HighSloppyPhrase | 17.00 | (3.3%) | 16.84 | (3.7%) | -0.9% (-7% - 6%) |
| OrHighHigh | 19.02 | (2.6%) | 18.85 | (2.7%) | -0.9% (-6% - 4%) |
| MedTerm | 564.56 | (4.6%) | 559.38 | (2.9%) | -0.9% (-8% - 6%) |
| OrNotHighLow | 294.29 | (4.9%) | 291.86 | (4.2%) | -0.8% (-9% - 8%) |
| AndHighLow | 303.17 | (3.7%) | 300.72 | (4.5%) | -0.8% (-8% - 7%) |
| AndHighHigh | 28.24 | (2.1%) | 28.01 | (2.7%) | -0.8% (-5% - 4%) |
| Wildcard | 64.64 | (3.9%) | 64.21 | (4.0%) | -0.7% (-8% - 7%) |
| HighSpanNear | 15.14 | (2.8%) | 15.04 | (2.5%) | -0.7% (-5% - 4%) |
| HighTerm | 431.22 | (3.9%) | 428.68 | (2.9%) | -0.6% (-7% - 6%) |
| LowSloppyPhrase | 19.29 | (2.2%) | 19.18 | (2.9%) | -0.6% (-5% - 4%) |
| LowSpanNear | 64.32 | (2.3%) | 63.99 | (2.0%) | -0.5% (-4% - 3%) |
| Fuzzy2 | 34.51 | (12.8%) | 34.34 | (11.9%) | -0.5% (-22% - 27%) |
| MedSpanNear | 51.51 | (2.3%) | 51.28 | (1.6%) | -0.4% (-4% - 3%) |
| HighTermDayOfYearSort | 51.45 | (6.6%) | 51.24 | (7.5%) | -0.4% (-13% - 14%) |
| OrHighNotMed | 306.95 | (5.1%) | 306.03 | (3.2%) | -0.3% (-8% - 8%) |
| BrowseDateTaxoFacets | 1.48 | (0.6%) | 1.47 | (1.2%) | -0.2% (-1% - 1%) |
| BrowseMonthSSDVFacets | 6.15 | (1.1%) | 6.14 | (3.6%) | -0.2% (-4% - 4%) |
| HighPhrase | 186.86 | (6.2%) | 186.64 | (3.7%) | -0.1% (-9% - 10%) |
| Respell | 48.69 | (4.1%) | 48.65 | (4.0%) | -0.1% (-7% - 8%) |
| AndHighMed | 65.66 | (3.0%) | 65.74 | (3.2%) | 0.1% (-5% - 6%) |
| HighIntervalsOrdered | 6.68 | (1.5%) | 6.69 | (1.7%) | 0.1% (-3% - 3%) |
| LowPhrase | 219.11 | (5.7%) | 220.24 | (3.5%) | 0.5% (-8% - 10%) |
| OrHighMed | 68.05 | (4.5%) | 68.44 | (3.1%) | 0.6% (-6% - 8%) |
| OrNotHighMed | 272.89 | (5.7%) | 274.77 | (4.1%) | 0.7% (-8% - 11%) |
| IntNRQ | 37.58 | (23.8%) | 37.96 | (24.2%) | 1.0% (-37% - 64%) |
| BrowseDayOfYearSSDVFacets | 5.34 | (4.2%) | 5.40 | (2.9%) | 1.2% (-5% - 8%) |
| HighTermMonthSort | 34.82 | (11.7%) | 35.81 | (14.9%) | 2.9% (-21% - 33%) |
| BrowseMonthTaxoFacets | 4781.41 | (3.9%) | 4931.19 | (2.7%) | 3.1% (-3% - 10%) |
| Fuzzy1 | 35.98 | (9.7%) | 37.42 | (8.0%) | 4.0% (-12% - 23%) |
| BrowseDayOfYearTaxoFacets | 4688.64 | (3.6%) | 4878.52 | (3.6%) | 4.0% (-3% - 11%) |
| PKLookup | 72.93 | (4.7%) | 95.23 | (3.3%) | 30.6% (21% - 40%) |
[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-533528757

Thanks for your reply, @atris. The luceneutil tool reports that the valid data sources are as follows:

`'wikibigall', 'wikimedium10m', 'wikimedium10k', 'wikibig10k', 'wikibig100k', 'wikimedium2m', 'wikimedium1m', 'memeall', 'wikimedium500k', 'wikimediumall', 'wikimedium5m', 'euromedium', 'wikibig1m'`

To make sure I run the right one: which source would you like me to use?
[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-533510238

We have run more performance tests using the _luceneutil_ tool. The complete test results are [here](https://gist.github.com/jgq2008303393/42d536f44b4845c01329a402202273eb.js). _luceneutil_ runs the _wikimedium10k_ benchmark 20 times; the following table is the result of the last run. As the table shows, most tasks are essentially stable, while _PKLookup_ improves by 58.9%. The _Get_ and _Bulk_ APIs of Elasticsearch will also benefit from this enhancement.

| Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff (range) |
| --- | ---: | :---: | ---: | :---: | :---: |
| HighIntervalsOrdered | 303.36 | (12.5%) | 283.86 | (16.9%) | -6.4% (-31% - 26%) |
| MedPhrase | 404.26 | (12.3%) | 382.64 | (10.5%) | -5.3% (-25% - 19%) |
| LowTerm | 2302.28 | (8.7%) | 2180.74 | (11.8%) | -5.3% (-23% - 16%) |
| AndHighMed | 618.78 | (10.1%) | 586.61 | (11.8%) | -5.2% (-24% - 18%) |
| BrowseDayOfYearSSDVFacets | 1042.68 | (10.1%) | 992.82 | (10.7%) | -4.8% (-23% - 17%) |
| HighSpanNear | 263.62 | (12.9%) | 256.07 | (14.9%) | -2.9% (-27% - 28%) |
| Wildcard | 221.10 | (16.2%) | 215.32 | (11.9%) | -2.6% (-26% - 30%) |
| LowSpanNear | 656.60 | (7.9%) | 639.77 | (11.3%) | -2.6% (-20% - 18%) |
| Fuzzy1 | 135.61 | (9.1%) | 132.26 | (10.4%) | -2.5% (-20% - 18%) |
| AndHighHigh | 409.88 | (10.9%) | 399.79 | (12.6%) | -2.5% (-23% - 23%) |
| OrHighHigh | 318.45 | (12.9%) | 312.43 | (12.2%) | -1.9% (-23% - 26%) |
| AndHighLow | 937.17 | (10.2%) | 921.71 | (11.4%) | -1.6% (-21% - 22%) |
| LowPhrase | 385.06 | (12.3%) | 379.83 | (10.8%) | -1.4% (-21% - 24%) |
| IntNRQ | 618.69 | (14.1%) | 610.58 | (10.6%) | -1.3% (-22% - 27%) |
| HighTermMonthSort | 1178.14 | (9.5%) | 1164.48 | (12.6%) | -1.2% (-21% - 23%) |
| Fuzzy2 | 46.95 | (16.2%) | 46.57 | (15.6%) | -0.8% (-28% - 36%) |
| OrHighLow | 633.64 | (9.6%) | 629.21 | (9.9%) | -0.7% (-18% - 20%) |
| BrowseMonthSSDVFacets | 1157.34 | (12.1%) | 1155.63 | (13.5%) | -0.1% (-23% - 29%) |
| Prefix3 | 297.40 | (12.1%) | 298.16 | (12.7%) | 0.3% (-21% - 28%) |
| MedSpanNear | 434.56 | (10.0%) | 437.02 | (11.4%) | 0.6% (-19% - 24%) |
| MedTerm | 2158.68 | (8.8%) | 2177.67 | (11.1%) | 0.9% (-17% - 22%) |
| HighSloppyPhrase | 320.36 | (10.0%) | 323.46 | (14.6%) | 1.0% (-21% - 28%) |
| BrowseDateTaxoFacets | 2065.89 | (13.7%) | 2088.22 | (13.2%) | 1.1% (-22% - 32%) |
| Respell | 187.05 | (12.2%) | 189.48 | (10.1%) | 1.3% (-18% - 26%) |
| MedSloppyPhrase | 583.45 | (11.3%) | 592.32 | (9.9%) | 1.5% (-17% - 25%) |
| HighTerm | 1114.87 | (12.0%) | 1131.89 | (12.8%) | 1.5% (-20% - 29%) |
| HighTermDayOfYearSort | 408.17 | (13.1%) | 416.13 | (9.3%) | 1.9% (-18% - 27%) |
| BrowseDayOfYearTaxoFacets | 5460.05 | (8.5%) | 5591.96 | (8.0%) | 2.4% (-13% - 20%) |
| BrowseMonthTaxoFacets | 5490.18 | (8.0%) | 5654.03 | (9.3%) | 3.0% (-13% - 22%) |
| LowSloppyPhrase | 562.96 | (10.1%) | 583.91 | (9.5%) | 3.7% (-14% - 25%) |
| HighPhrase | 221.20 | (11.9%) | 229.85 | (12.2%) | 3.9% (-17% - 31%) |
| OrHighMed | 352.09 | (12.3%) | 369.39 | (9.4%) | 4.9% (-14% - 30%) |
| PKLookup | 85.19 | (18.1%) | 135.38 | (22.7%) | 58.9% (15% - 121%) |
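For readers checking the headline numbers, here is a small sketch of how the _PKLookup_ percentage can be reproduced from the two QPS columns. It assumes the Pct diff column is simply the relative change in mean QPS, which matches the reported figures; the class and method names (`PctDiff`, `pctDiff`, `format`) are made up for this illustration and are not part of _luceneutil_.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of luceneutil.
final class PctDiff {
    // Relative change of the candidate QPS against the baseline QPS, in percent.
    // Assumption: this is how the "Pct diff" column is derived.
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    // Formats the change with one decimal place, as in the tables.
    static String format(double baselineQps, double candidateQps) {
        return String.format(Locale.ROOT, "%.1f%%", pctDiff(baselineQps, candidateQps));
    }

    public static void main(String[] args) {
        // PKLookup row of the wikimedium10k run: 85.19 -> 135.38 QPS
        System.out.println("wikimedium10k PKLookup: " + format(85.19, 135.38));
        // PKLookup row of the wikimedium10m run: 72.93 -> 95.23 QPS
        System.out.println("wikimedium10m PKLookup: " + format(72.93, 95.23));
    }
}
```

Under that assumption the two rows come out at 58.9% and 30.6%. The range in parentheses also depends on the standard deviations and is not reproduced here.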
[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-533150625

Thanks a lot, @dsmiley. I read the code you shared. The design ideas behind `uniformsplit.BlockReader` and this PR are similar: `uniformsplit.BlockReader` cuts off segments based on the result of `BlockReader.seekBlock()`, while this PR cuts off segments directly according to the stored min/maxTerm values.

We use Elasticsearch to support many time-series scenarios such as logs, APM, and metrics, and users typically add data using sequential IDs to ensure uniqueness. As you said, the effect of this PR would be very noticeable in those scenarios, since most segments will be cut off directly. We will use _luceneutil_ to supply more performance test results, following your suggestions. Thanks again.
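To make the min/maxTerm idea concrete, here is a minimal, self-contained sketch of the check. It is an illustration under assumptions, not the code in the PR: `SegmentTermRange`, `mayContain`, and `segmentsToSeek` are hypothetical names, the IDs are made-up sequential keys, and terms are compared as unsigned bytes, which is assumed to match the order in which the terms dictionary sorts them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a segment's stored min/max term of one field; not a Lucene class.
final class SegmentTermRange {
    private final byte[] minTerm;
    private final byte[] maxTerm;

    SegmentTermRange(String minTerm, String maxTerm) {
        this.minTerm = minTerm.getBytes(StandardCharsets.UTF_8);
        this.maxTerm = maxTerm.getBytes(StandardCharsets.UTF_8);
    }

    // Returns false when the target sorts outside [minTerm, maxTerm]. In that case the
    // segment cannot contain the term, so the terms-dictionary seek can be skipped.
    boolean mayContain(String target) {
        byte[] t = target.getBytes(StandardCharsets.UTF_8);
        // Unsigned lexicographic byte comparison (available since Java 9).
        return Arrays.compareUnsigned(t, minTerm) >= 0
            && Arrays.compareUnsigned(t, maxTerm) <= 0;
    }
}

final class MinMaxPruningDemo {
    // Counts the segments that would still need a real seek for the given id.
    static int segmentsToSeek(SegmentTermRange[] segments, String id) {
        int count = 0;
        for (SegmentTermRange segment : segments) {
            if (segment.mayContain(id)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Sequential ids: each segment covers a disjoint id range.
        SegmentTermRange[] segments = {
            new SegmentTermRange("id-000000", "id-000999"),
            new SegmentTermRange("id-001000", "id-001999"),
            new SegmentTermRange("id-002000", "id-002999"),
        };
        // Only the middle segment can hold this id, so this prints 1.
        System.out.println(segmentsToSeek(segments, "id-001500"));
    }
}
```

With sequential IDs the per-segment ranges barely overlap, so a lookup is rejected by two cheap comparisons in almost every segment. With random IDs (UUIDs, for example) most ranges would cover the target and the check would rarely prune anything.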
[GitHub] [lucene-solr] jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
jgq2008303393 commented on issue #884: LUCENE-8980: optimise SegmentTermsEnum.seekExact performance
URL: https://github.com/apache/lucene-solr/pull/884#issuecomment-532538959

ping @jpountz @mikemccand