Re: Json Faceting Performance Issues on solr v8.7.0
Ah! That's significant. The latency is likely due to building the OrdinalMap (which maps segment ords to global ords) ... "dvhash" (assuming the relevant fields are not multivalued) will very likely work; "dvhash" doesn't map to global ords, so it doesn't need to build the OrdinalMap (which gets built the first time it's needed, per field, per searcher).

If "dvhash" doesn't work for some reason (multivalued fields, needs to work over broader domains, etc.?), you could probably achieve a decent result by configuring a static warming query (newSearcher) to issue a request that facets on the relevant fields. That will delay the opening of each new searcher, but will ensure that user requests don't block.

SOLR-15008 _was_ actually pretty similar, with the added wrinkle of involving distributed (multi-shard) requests (and iirc "dvhash" wouldn't have worked in that case?).

On Fri, Feb 5, 2021 at 8:00 PM mmb1234 wrote:

> > Does this happen on a warm searcher (are subsequent requests with no
> > intervening updates _ever_ fast?)?
>
> Subsequent response times are very fast if the searcher remains open. As a
> control test, I faceted on the same field that I used in the q param.
>
> 1. Start solr
> 2. Execute q=resultId:x&rows=0 => 500ms
> 3. Execute q=resultId:x&rows=0&json.facet-on-resultId => 40,000ms
> 4. Execute q=resultId:x&rows=0&json.facet-on-resultId => 150ms
> 5. Execute q=processId:x&rows=0&json.facet-on-processId => 2,500ms
> 6. Execute q=processId:x&rows=0&json.facet-on-processId => 200ms
>
> curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=processId:-xxx-xxx-xxx-x&rows=0' -d '
> json.facet={
>   categories:{
>     "type": "terms",
>     "field" : "processId",
>     "limit" : 1
>   }
> }'
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
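For reference, the static warming query mentioned above might look roughly like the following solrconfig.xml fragment. This is only a sketch under assumptions: the facet label `warm_resultId`, the `*:*` query, and the choice of `resultId` as the field to warm are illustrative and not taken from the poster's configuration, and it is worth verifying on your version that the warming request actually triggers the OrdinalMap build.

```xml
<!-- Goes inside the <query> section of solrconfig.xml. -->
<!-- Runs against every new searcher before it starts serving requests. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <!-- Assumed query: anything that matches documents; a narrower one may be cheaper. -->
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <!-- Facet on the field whose first facet request is slow (hypothetical label). -->
      <str name="json.facet">{warm_resultId:{"type":"terms","field":"resultId","limit":1}}</str>
    </lst>
  </arr>
</listener>
```

With a 5-second soft commit, the warming cost is paid on every new searcher, so it is worth measuring how long the warming request takes before relying on this.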
Re: Json Faceting Performance Issues on solr v8.7.0
> Does this happen on a warm searcher (are subsequent requests with no
> intervening updates _ever_ fast?)?

Subsequent response times are very fast if the searcher remains open. As a control test, I faceted on the same field that I used in the q param.

1. Start solr
2. Execute q=resultId:x&rows=0 => 500ms
3. Execute q=resultId:x&rows=0&json.facet-on-resultId => 40,000ms
4. Execute q=resultId:x&rows=0&json.facet-on-resultId => 150ms
5. Execute q=processId:x&rows=0&json.facet-on-processId => 2,500ms
6. Execute q=processId:x&rows=0&json.facet-on-processId => 200ms

curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=processId:-xxx-xxx-xxx-x&rows=0' -d '
json.facet={
  categories:{
    "type": "terms",
    "field" : "processId",
    "limit" : 1
  }
}'
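A control test like the one above can be scripted so that cold and warm timings are easy to compare. The following is a rough sketch, not the poster's actual procedure: the helper names `extract_qtime` and `time_facet` are made up for illustration, and the endpoint is copied from the requests in this thread.

```shell
# Endpoint copied from the requests in this thread; adjust for your core.
SOLR_URL='http://localhost:8983/solr/TestCollection_shard1_replica_t3/query'

# Read a Solr JSON response on stdin and print its QTime (milliseconds).
extract_qtime() {
  grep -o '"QTime":[0-9]*' | head -n 1 | cut -d: -f2
}

# Issue the same terms-facet request several times and print QTime per attempt.
# Arguments: $1 = field to facet on, $2 = main query, $3 = number of attempts.
time_facet() {
  attempt=1
  while [ "$attempt" -le "$3" ]; do
    curl -s "${SOLR_URL}?q=$2&rows=0" \
      -d "json.facet={categories:{\"type\":\"terms\",\"field\":\"$1\",\"limit\":1}}" \
      | extract_qtime
    attempt=$((attempt + 1))
  done
}
```

For example, `time_facet resultId 'resultId:x' 2` right after a new searcher opens should show the slow first request followed by a fast second one, if the latency really is a one-time per-searcher cost.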
Re: Json Faceting Performance Issues on solr v8.7.0
Apologies, I missed deducing from the request url that you're already talking strictly about single-shard requests (so everything I was suggesting about shards.preference etc. is not applicable). "dvhash" is still worth a try though, esp. with `numFound` being 943 (out of 185 million!).

Does this happen on a warm searcher (are subsequent requests with no intervening updates _ever_ fast?)?

On Fri, Feb 5, 2021 at 6:13 PM mmb1234 wrote:

> Ok. I'll try that. Meanwhile query on resultId is subsecond response. But the
> immediate next query for faceting takes 40+secs. The core has 185million
> docs and 63GB index size.
>
> curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=resultId:x&rows=0'
> {
>   "responseHeader":{
>     "zkConnected":true,
>     "status":0,
>     "QTime":558,
>     "params":{
>       "q":"resultId:x",
>       "cache":"false",
>       "rows":"0"}},
>   "response":{"numFound":943,"start":0,"numFoundExact":true,"docs":[]
>   }}
>
> curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=resultId:x&rows=0' -d '
> json.facet={
>   categories:{
>     "type": "terms",
>     "field" : "resultId",
>     "limit" : 1
>   }
> }'
> {
>   "responseHeader":{
>     "zkConnected":true,
>     "status":0,
>     "QTime":43834,
>     "params":{
>       "q":"resultId:x",
>       "json.facet":"{\ncategories:{\n \"type\": \"terms\",\n \"field\" : \"resultId\",\n \"limit\" : 1\n}\n}",
>       "cache":"false",
>       "rows":"0"}},
>   "response":{"numFound":943,"start":0,"numFoundExact":true,"docs":[]
>   },
>   "facets":{
>     "count":943,
>     "categories":{
>       "buckets":[{
>           "val":"x",
>           "count":943}]}}}
Re: Json Faceting Performance Issues on solr v8.7.0
Ok. I'll try that. Meanwhile query on resultId is subsecond response. But the immediate next query for faceting takes 40+secs. The core has 185million docs and 63GB index size.

curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=resultId:x&rows=0'
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":558,
    "params":{
      "q":"resultId:x",
      "cache":"false",
      "rows":"0"}},
  "response":{"numFound":943,"start":0,"numFoundExact":true,"docs":[]
  }}

curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=resultId:x&rows=0' -d '
json.facet={
  categories:{
    "type": "terms",
    "field" : "resultId",
    "limit" : 1
  }
}'
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":43834,
    "params":{
      "q":"resultId:x",
      "json.facet":"{\ncategories:{\n \"type\": \"terms\",\n \"field\" : \"resultId\",\n \"limit\" : 1\n}\n}",
      "cache":"false",
      "rows":"0"}},
  "response":{"numFound":943,"start":0,"numFoundExact":true,"docs":[]
  },
  "facets":{
    "count":943,
    "categories":{
      "buckets":[{
          "val":"x",
          "count":943}]}}}
Re: Json Faceting Performance Issues on solr v8.7.0
`resultId` sounds like it might be a relatively high-cardinality field (lots of unique values)? What's your number of shards, and replicas per shard?

SOLR-15008 (note: not a bug) describes a situation that may be fundamentally similar to yours (though it's impossible to say for sure without more information):
https://issues.apache.org/jira/browse/SOLR-15008?focusedCommentId=17236213#comment-17236213

In particular, the explanation and troubleshooting advice on the linked comment might be relevant?

"dvhash" is _not_ mentioned on that SOLR-15008, but if the `processId` main query significantly reduces the domain -- or more specifically, if `resultId` is high-cardinality overall, but the cardinality of `resultId` values _associated with a particular query_ is low -- you might consider trying `"method":"dvhash"` (which should bypass OrdinalMap creation and array allocation, if either/both of those contribute to the latency you're finding).

Michael

On Fri, Feb 5, 2021 at 4:42 PM mmb1234 wrote:

> Hello,
>
> I am seeing very slow response from json faceting against a single core
> (though core is shard leader in a collection).
>
> Fields processId and resultId are non-multivalued, indexed and docvalues
> string (not text).
>
> Soft Commit = 5sec (opensearcher=true) and Hard Commit = 10sec because new
> docs are constantly being indexed with 95% new and 5% overwritten
> (overwrite=true; no atomic update). Caches are not considered useful due to
> commit frequency.
>
> Solr is v8.7.0 on openjdk11.
>
> Is there any way to improve json facet QTime?
>
> ## query only
> curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=processId:-xxx-xxx-xxx-x&rows=0'
> {
>   "responseHeader":{
>     "zkConnected":true,
>     "status":0,
>     "QTime":552,
>     "params":{
>       "q":"processId:-xxx-xxx-xxx-x",
>       "cache":"false",
>       "rows":"0"}},
>   "response":{"numFound":231311,"start":0,"numFoundExact":true,"docs":[]
>   }}
>
> ## json facet takes 46secs
> curl 'http://localhost:8983/solr/TestCollection_shard1_replica_t3/query?q=processId:-xxx-xxx-xxx-x&rows=0' -d '
> json.facet={
>   categories:{
>     "type": "terms",
>     "field" : "resultId",
>     "limit" : 1
>   }
> }'
> {
>   "responseHeader":{
>     "zkConnected":true,
>     "status":0,
>     "QTime":46972,
>     "params":{
>       "q":"processId:-xxx-xxx-xxx-x",
>       "json.facet":"{categories:{ \"type\": \"terms\", \"field\" : \"resultId\", \"limit\" : 1}}",
>       "rows":"0"}},
>   "response":{"numFound":231311,"start":0,"numFoundExact":true,"docs":[]
>   },
>   "facets":{
>     "count":231311,
>     "categories":{
>       "buckets":[{
>           "val":"x",
>           "count":943}]}}}
>
> ## visualvm CPU sampling almost all time spent in lucene:
>
> org.apache.lucene.util.PriorityQueue.downHeap() 23,009 ms
> org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict.next() 13,268 ms
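To make the "dvhash" suggestion concrete, here is a minimal sketch of the original facet request with the `method` key added. The endpoint, field name, and facet label are copied from the requests quoted in this thread; the wrapper function `facet_with_dvhash` is a hypothetical helper added only for illustration, and whether "dvhash" actually helps depends on the fields being single-valued, as noted above.

```shell
# Endpoint copied from the requests in this thread; adjust for your core.
SOLR_URL='http://localhost:8983/solr/TestCollection_shard1_replica_t3/query'

# Same terms facet as in the original request, with "method" set to "dvhash".
FACET_BODY='json.facet={
  categories:{
    "type"   : "terms",
    "field"  : "resultId",
    "method" : "dvhash",
    "limit"  : 1
  }
}'

# Send the facet request for the main query passed as $1.
facet_with_dvhash() {
  curl -s "${SOLR_URL}?q=$1&rows=0" -d "${FACET_BODY}"
}
```

Calling `facet_with_dvhash 'processId:-xxx-xxx-xxx-x'` on a freshly opened searcher and comparing QTime against the same request without `"method"` should show whether OrdinalMap construction is the source of the latency.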