[jira] [Updated] (SOLR-14137) Boosting by date (and perhaps others) shows a steady decline 6.6->8.3

Erick Erickson (Jira) Tue, 24 Dec 2019 16:08:40 -0800


     [ https://issues.apache.org/jira/browse/SOLR-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Erick Erickson updated SOLR-14137:
----------------------------------
    Description: 
Moving a discussion from the user's list over here.

The very short form is that from Solr 6.6.1 to Solr 8.3.1, the throughput for
date boosting in my tests dropped by 40+%.

I’ve been hearing about slowdowns in successive Solr releases with boost
functions, so I dug into it a bit. The test setup is just a boost-by-date plus
a big OR clause of 100 random words, so I’d be sure to hit a lot of docs. I
figured that with few hits the signal would be lost in the noise, but I didn’t
look at the actual hit counts.

I saw several Solr JIRAs about this subject. They were slightly different,
although quite possibly the same underlying issue, so I tried to narrow this
down to one very specific form of query.

I’ve also seen some cases in the wild where response time was proportional to
the number of segments, hence my optimize experiments.

Here are the results; the explanation follows below. O means the index was
optimized to one segment. I spot checked pdate against 6.6, 7.1 and 8.3, and it
wasn’t significantly different from tdate performance-wise. All the tested
fields have docValues enabled, and I ran these against a multiValued="false"
field. All the tests pegged all my CPUs. JMeter ran on a different machine than
Solr, and only one Solr instance was running for any test.

||Solr version||queries/min||queries/min (O)||
|6.6.1|3,400|4,800|
|7.1|2,800|4,200|
|7.7.1|2,400|3,500|
|8.3.1|2,000|2,600|
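
As a quick check of the 40+% figure, the drop relative to 6.6.1 can be computed straight from the table. A minimal Python sketch; the throughput numbers are copied from the table and nothing else is assumed:

```python
# Throughput in queries/min, copied from the table above:
# (as indexed, optimized to one segment).
THROUGHPUT = {
    "6.6.1": (3400, 4800),
    "7.1": (2800, 4200),
    "7.7.1": (2400, 3500),
    "8.3.1": (2000, 2600),
}


def percent_drop(baseline: float, value: float) -> float:
    """Percentage decrease of value relative to baseline."""
    return (baseline - value) / baseline * 100.0


def drops_vs(baseline_version: str = "6.6.1") -> dict:
    """Drop of each version vs the baseline, for both index states."""
    base_plain, base_opt = THROUGHPUT[baseline_version]
    return {
        version: (round(percent_drop(base_plain, plain), 1),
                  round(percent_drop(base_opt, opt), 1))
        for version, (plain, opt) in THROUGHPUT.items()
    }


if __name__ == "__main__":
    for version, (plain, opt) in drops_vs().items():
        print(f"{version:>6}: {plain:5.1f}% drop, optimized {opt:5.1f}% drop")
```

For 8.3.1 this works out to about 41% unoptimized and about 46% optimized. Optimizing also helps less in newer versions: it raises 6.6.1 throughput by about 41% (3,400 to 4,800) but 8.3.1 by only 30% (2,000 to 2,600).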


The tests just index 20M docs into a single core, then run the exact same
10,000 queries against them from JMeter with 24 threads. Spot checks showed no
hits on the queryResultCache.
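
For anyone reproducing this without JMeter, here is a rough Python stand-in for the load pattern described above: a fixed list of queries spread over 24 worker threads, reporting queries/min. The endpoint URL and core name are hypothetical, and a single Python client may not drive Solr as hard as JMeter on a separate machine did, so treat this as a sketch rather than an equivalent harness.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical endpoint: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/testcore/select"


def http_send(query_string: str) -> None:
    """Send one GET to the select handler and read the whole response."""
    with urllib.request.urlopen(f"{SOLR_SELECT}?{query_string}") as resp:
        resp.read()


def queries_per_minute(queries: Sequence[str],
                       send: Callable[[str], None] = http_send,
                       threads: int = 24) -> float:
    """Run every query once on `threads` workers and return queries/min."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Consuming the iterator waits for every query and re-raises errors.
        for _ in pool.map(send, queries):
            pass
    elapsed = time.perf_counter() - start
    return len(queries) * 60.0 / elapsed
```

At the 8.3.1 rate of roughly 2,000 queries/min, 10,000 queries take about five minutes per run.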

A query looks like this:
rows=0&q=\{!boost b=recip(ms(NOW,INSERT_FIELD_HERE),3.16e-11,1,1)}text_txt:(campaigners OR adjourned OR anyplace…97 more random words)
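
The boost function is easier to reason about with numbers. Solr's recip(x,m,a,b) computes a/(m*x+b), and ms(NOW,field) is the document's age in milliseconds, so with m=3.16e-11 (roughly 1 divided by the milliseconds in a year) and a=b=1 the boost is about 1/(age in years + 1): a brand-new doc gets 1.0 and a one-year-old doc about 0.5. The boost query parser multiplies the text score by that value. The sketch below mirrors only the arithmetic, in double precision; it is not Solr's implementation, and MS_PER_YEAR (a Julian year) is an assumption.

```python
# Assumption: a Julian year of 365.25 days, in milliseconds (~3.156e10).
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x: float, m: float, a: float, b: float) -> float:
    """Arithmetic of Solr's recip(x,m,a,b) function query: a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms: float) -> float:
    """Boost used in the test query: recip(ms(NOW, field), 3.16e-11, 1, 1)."""
    return recip(age_ms, 3.16e-11, 1.0, 1.0)


def boosted_score(text_score: float, age_ms: float) -> float:
    """The boost parser multiplies the main query's score by the boost."""
    return text_score * date_boost(age_ms)


if __name__ == "__main__":
    for years in (0, 1, 3, 10):
        print(years, round(date_boost(years * MS_PER_YEAR), 3))
```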

There is no faceting, no grouping and no sorting.

I fill in INSERT_FIELD_HERE through JMeter magic, and I run the exact same
queries for every test.
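
The JMeter configuration isn't shown, so here is a hedged Python sketch of one way to generate the same kind of query: substitute the field name into the template and OR together 100 words drawn with a fixed seed, so every run sends identical queries. It assumes the boost query is sent as the q parameter; the seed, the vocabulary and any field name passed in are hypothetical.

```python
import random
from urllib.parse import urlencode

# Query template from the issue; INSERT_FIELD_HERE is replaced per test.
BOOST_TEMPLATE = "{!boost b=recip(ms(NOW,INSERT_FIELD_HERE),3.16e-11,1,1)}"


def build_query(field: str, words: list[str]) -> str:
    """URL-encoded rows=0&q=... string for one boosted OR query."""
    boost = BOOST_TEMPLATE.replace("INSERT_FIELD_HERE", field)
    q = boost + "text_txt:(" + " OR ".join(words) + ")"
    # Assumption: the boost query is sent as the q parameter.
    return urlencode({"rows": 0, "q": q})


def make_query_words(vocabulary: list[str], n_queries: int,
                     words_per_query: int = 100,
                     seed: int = 42) -> list[list[str]]:
    """Draw each query's words once with a fixed seed so reruns are identical."""
    rng = random.Random(seed)
    return [rng.sample(vocabulary, words_per_query) for _ in range(n_queries)]
```

Drawing the word lists once and reusing them for every field and every Solr version keeps the query text itself constant across runs; only the substituted field name changes.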

One wildcard is that I regenerated the index for each major version, choosing
random words from the same word list and random times (bounded to the same
range), so the docs are not completely identical across versions. Each index
was in the native format for its major version, even though the indexes differ
slightly between versions. I ran the test once, then ran it again after
optimizing the index.
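
The optimize step can be scripted against the update handler, which accepts optimize=true together with maxSegments to force-merge the index down to a given segment count. A small sketch; the base URL and core name are hypothetical:

```python
import urllib.request
from urllib.parse import urlencode

# Hypothetical Solr base URL and core name.
SOLR_CORE = "http://localhost:8983/solr/testcore"


def optimize_url(core_url: str = SOLR_CORE, max_segments: int = 1) -> str:
    """Update-handler URL that force-merges the index to max_segments."""
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{core_url}/update?{params}"


def optimize(core_url: str = SOLR_CORE, max_segments: int = 1) -> bytes:
    """Send the optimize request and return the raw response body."""
    with urllib.request.urlopen(optimize_url(core_url, max_segments)) as resp:
        return resp.read()
```

With max_segments=1 this corresponds to the "O" rows in the table.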

I haven’t dug any further. If anyone’s interested, I can throw a profiler at,
say, 8.3 and see what I can see, although I won’t have time to dive into this
any time soon. I’d be glad to run some tests, though: I saved the queries and
the indexes, so running a test would only take a few minutes.

While I concentrated on date fields, the docs have date, int and long fields,
with docValues=true and docValues=false, each variant with multiValued=true and
multiValued=false, and both Trie and Point variants (where possible), as well as
a pretty simple text field.
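
The field matrix above multiplies out to 3 value types × 2 implementations × 2 docValues settings × 2 multiValued settings = 24 variants. A sketch that enumerates them as schema field definitions; the fieldType names (tdate/pdate, tint/pint, tlong/plong), the field naming scheme and the indexed/stored settings are all assumptions. Since the original says some variants exist only "where possible", the sketch emits all 24 without filtering.

```python
from itertools import product

# Hypothetical fieldType names: assumes the schema defines Trie types
# (tdate, tint, tlong) and Point types (pdate, pint, plong).
TYPE_PAIRS = [("tdate", "pdate"), ("tint", "pint"), ("tlong", "plong")]


def field_variants() -> list[str]:
    """One <field> line per field type x docValues x multiValued combination."""
    lines = []
    for pair in TYPE_PAIRS:
        for ftype, dv, mv in product(pair, (True, False), (True, False)):
            # Hypothetical naming scheme, e.g. pdate_dv_sv.
            name = f"{ftype}_{'dv' if dv else 'nodv'}_{'mv' if mv else 'sv'}"
            lines.append(
                f'<field name="{name}" type="{ftype}" indexed="true" '
                f'stored="false" docValues="{str(dv).lower()}" '
                f'multiValued="{str(mv).lower()}"/>'
            )
    return lines


if __name__ == "__main__":
    print("\n".join(field_variants()))
```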

  was:
Moving a discussion from the user's list over here.

The very short form is that from Solr 6.6.1 to Solr 8.3.1, the throughput for
date boosting in my tests dropped by 40+%.

I’ve been hearing about slowdowns in successive Solr releases with boost
functions, so I dug into it a bit. The test setup is just a boost-by-date plus
a big OR clause of 100 random words, so I’d be sure to hit a lot of docs. I
figured that with few hits the signal would be lost in the noise, but I didn’t
look at the actual hit counts.

I saw several Solr JIRAs about this subject. They were slightly different,
although quite possibly the same underlying issue, so I tried to narrow this
down to one very specific form of query.

I’ve also seen some cases in the wild where response time was proportional to
the number of segments, hence my optimize experiments.

Here are the results; the explanation follows below. O means the index was
optimized to one segment. I spot checked pdate against 7x and 8x, and it wasn’t
significantly different from tdate performance-wise. All the tested fields have
docValues enabled, and I ran these against a multiValued="false" field. All the
tests pegged all my CPUs. JMeter ran on a different machine than Solr, and only
one Solr instance was running for any test.

||Solr version||queries/min||queries/min (O)||
|6.6.1|3,400|4,800|
|7.1|2,800|4,200|
|7.7.1|2,400|3,500|
|8.3.1|2,000|2,600|


The tests just index 20M docs into a single core, then run the exact same
10,000 queries against them from JMeter with 24 threads. Spot checks showed no
hits on the queryResultCache.

A query looks like this:
rows=0&q=\{!boost b=recip(ms(NOW,INSERT_FIELD_HERE),3.16e-11,1,1)}text_txt:(campaigners OR adjourned OR anyplace…97 more random words)

There is no faceting, no grouping and no sorting.

I fill in INSERT_FIELD_HERE through JMeter magic, and I run the exact same
queries for every test.

One wildcard is that I regenerated the index for each major version, choosing
random words from the same word list and random times (bounded to the same
range), so the docs are not completely identical across versions. Each index
was in the native format for its major version, even though the indexes differ
slightly between versions. I ran the test once, then ran it again after
optimizing the index.

I haven’t dug any further. If anyone’s interested, I can throw a profiler at,
say, 8.3 and see what I can see, although I won’t have time to dive into this
any time soon. I’d be glad to run some tests, though: I saved the queries and
the indexes, so running a test would only take a few minutes.

While I concentrated on date fields, the docs have date, int and long fields,
with docValues=true and docValues=false, each variant with multiValued=true and
multiValued=false, and both Trie and Point variants (where possible), as well as
a pretty simple text field.


> Boosting by date (and perhaps others) shows a steady decline 6.6->8.3
> ---------------------------------------------------------------------
>
>                 Key: SOLR-14137
>                 URL: https://issues.apache.org/jira/browse/SOLR-14137
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>            Priority: Major
>         Attachments: Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-19 at 3.09.37 PM.png, Screen Shot 2019-12-19 at 3.31.16 PM.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
