[ https://issues.apache.org/jira/browse/SOLR-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536494#comment-16536494 ]
Hoss Man commented on SOLR-12343:
---------------------------------

Found one: it seems to be specific to the situation where {{overrequest==0}} and the facet is nested under another facet. Playing with the values of {{top_over}} and {{top_refine}}, it doesn't seem to matter whether the parent facet is refined; the key is whether the top facet also uses {{overrequest:0}} (fails) or {{overrequest:999}} (passes).

{noformat}
[junit4] 2> 9990 INFO (qtp1276305453-48) [ x:collection1] o.a.s.c.S.Request [collection1] webapp=/solr path=/select params={df=text&distrib=false&_facet_={}&fl=id&fl=score&shards.purpose=1048580&start=0&fsv=true&shard.url=127.0.0.1:47372/solr/collection1&rows=0&version=2&q=*:*&json.facet={+all:{+type:terms,+field:all_ss,+limit:1,+refine:true,+overrequest:0+++++++,+facet:{+++cat_count:{+type:terms,+field:cat_s,+limit:3,+overrequest:0+++++++++++++++,+refine:true,+sort:'count+asc'+},+++cat_price:{+type:terms,+field:cat_s,+limit:3,+overrequest:0+++++++++++++++,+refine:true,+sort:'sum_p+asc'++++++++++++++++,+facet:+{+sum_p:+'sum(price_i)'+}+}}+}+}&NOW=1531102182236&isShard=true&wt=javabin} hits=9 status=0 QTime=17
[junit4] 2> 9994 INFO (qtp1276305453-49) [ x:collection1] o.a.s.c.S.Request [collection1] webapp=/solr path=/select params={df=text&distrib=false&_facet_={"refine":{"all":{"_p":[["z_all",{"cat_count":{"_l":["A","B","C"]},"cat_price":{"_l":["A","B","C"]}}]]}}}&shards.purpose=2097152&shard.url=127.0.0.1:47372/solr/collection1&rows=0&version=2&q=*:*&json.facet={+all:{+type:terms,+field:all_ss,+limit:1,+refine:true,+overrequest:0+++++++,+facet:{+++cat_count:{+type:terms,+field:cat_s,+limit:3,+overrequest:0+++++++++++++++,+refine:true,+sort:'count+asc'+},+++cat_price:{+type:terms,+field:cat_s,+limit:3,+overrequest:0+++++++++++++++,+refine:true,+sort:'sum_p+asc'++++++++++++++++,+facet:+{+sum_p:+'sum(price_i)'+}+}}+}+}&NOW=1531102182236&isShard=true&facet=false&wt=javabin} hits=9 status=0 QTime=1
[junit4] 2> 9996 INFO (qtp1503674478-65) [ x:collection1] o.a.s.c.S.Request [collection1] webapp=/solr path=/select params={shards=127.0.0.1:54950/solr/collection1,127.0.0.1:47372/solr/collection1,127.0.0.1:52833/solr/collection1&shards=debugQuery&shards=true&q=*:*&json.facet={+all:{+type:terms,+field:all_ss,+limit:1,+refine:true,+overrequest:0+++++++,+facet:{+++cat_count:{+type:terms,+field:cat_s,+limit:3,+overrequest:0+++++++++++++++,+refine:true,+sort:'count+asc'+},+++cat_price:{+type:terms,+field:cat_s,+limit:3,+overrequest:0+++++++++++++++,+refine:true,+sort:'sum_p+asc'++++++++++++++++,+facet:+{+sum_p:+'sum(price_i)'+}+}}+}+}&indent=true&rows=0&wt=json&version=2.2} hits=19 status=0 QTime=25
[junit4] 2> 9997 ERROR (TEST-TestJsonFacetRefinement.testSortedFacetRefinementPushingNonRefinedBucketBackIntoTopN-seed#[775BF43EF8268D50]) [ ] o.a.s.SolrTestCaseHS query failed JSON validation. error=mismatch: 'X'!='C' @ facets/all/buckets/[0]/cat_count/buckets/[2]/val
[junit4] 2> expected =facets=={ count: 19,all:{ buckets:[ { val:z_all, count: 19, cat_count:{ buckets:[ {val:A,count:1}, {val:B,count:1}, {val:X,count:4}, ] }, cat_price:{ buckets:[ {val:A,count:1,sum_p:1.0}, {val:B,count:1,sum_p:1.0}, {val:X,count:4,sum_p:4.0}, ] }} ] } }
[junit4] 2> response = {
[junit4] 2> "responseHeader":{
[junit4] 2> "status":0,
[junit4] 2> "QTime":25},
[junit4] 2> "response":{"numFound":19,"start":0,"maxScore":1.0,"docs":[]
[junit4] 2> },
[junit4] 2> "facets":{
[junit4] 2> "count":19,
[junit4] 2> "all":{
[junit4] 2> "buckets":[{
[junit4] 2> "val":"z_all",
[junit4] 2> "count":19,
[junit4] 2> "cat_price":{
[junit4] 2> "buckets":[{
[junit4] 2> "val":"A",
[junit4] 2> "count":1,
[junit4] 2> "sum_p":1.0},
[junit4] 2> {
[junit4] 2> "val":"B",
[junit4] 2> "count":1,
[junit4] 2> "sum_p":1.0},
[junit4] 2> {
[junit4] 2> "val":"C",
[junit4] 2> "count":6,
[junit4] 2> "sum_p":6.0}]},
[junit4] 2> "cat_count":{
[junit4] 2> "buckets":[{
[junit4] 2> "val":"A",
[junit4] 2> "count":1},
[junit4] 2> {
[junit4] 2> "val":"B",
[junit4] 2> "count":1},
[junit4] 2> {
[junit4] 2> "val":"C",
[junit4] 2> "count":6}]}}]}}}
[junit4] 2>
[junit4] 2> 10000 INFO (TEST-TestJsonFacetRefinement.testSortedFacetRefinementPushingNonRefinedBucketBackIntoTopN-seed#[775BF43EF8268D50]) [ ] o.a.s.SolrTestCaseJ4 ###Ending testSortedFacetRefinementPushingNonRefinedBucketBackIntoTopN
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestJsonFacetRefinement -Dtests.method=testSortedFacetRefinementPushingNonRefinedBucketBackIntoTopN -Dtests.seed=775BF43EF8268D50 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=pl-PL -Dtests.timezone=Africa/Bamako -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
[junit4] ERROR 4.32s | TestJsonFacetRefinement.testSortedFacetRefinementPushingNonRefinedBucketBackIntoTopN <<<
[junit4] > Throwable #1: java.lang.RuntimeException: mismatch: 'X'!='C' @ facets/all/buckets/[0]/cat_count/buckets/[2]/val
[junit4] > at __randomizedtesting.SeedInfo.seed([775BF43EF8268D50:DB8655EB2671818E]:0)
[junit4] > at org.apache.solr.SolrTestCaseHS.matchJSON(SolrTestCaseHS.java:161)
[junit4] > at org.apache.solr.SolrTestCaseHS.assertJQ(SolrTestCaseHS.java:143)
[junit4] > at org.apache.solr.SolrTestCaseHS$Client$Tester.assertJQ(SolrTestCaseHS.java:255)
[junit4] > at org.apache.solr.SolrTestCaseHS$Client.testJQ(SolrTestCaseHS.java:297)
[junit4] > at org.apache.solr.search.facet.TestJsonFacetRefinement.testSortedFacetRefinementPushingNonRefinedBucketBackIntoTopN(TestJsonFacetRefinement.java:568)
[junit4] > at java.lang.Thread.run(Thread.java:748)
[junit4] 2> 10016 INFO (SUITE-TestJsonFacetRefinement-seed#[775BF43EF8268D50]-worker) [ ] o.e.j.s.Abs
{noformat}

I haven't worked through it yet to figure out the problem, but my initial impression is that I made this test too aggressive. I'm not sure it's safe to assert correct results with {{top_over=1}}, but I'm also not sure why the sub-facet overrequest would matter in that case.

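
For readers who want to see the two request variants side by side, the URL-encoded {{json.facet}} parameter in the log can be written out as a plain structure. This is only a sketch of the request body: the helper name {{build_json_facet}} and its arguments are hypothetical (loosely named after {{top_over}} and {{top_refine}} above), the actual test is Java and builds its request differently, and nothing here talks to Solr.

```python
import json


def build_json_facet(top_over, top_refine, sub_over):
    """Build the nested terms facet seen in the logged request.

    Hypothetical helper for illustration; not part of Solr or its tests.
    """
    # Settings shared by both sub-facets on cat_s.
    sub_common = {
        "type": "terms",
        "field": "cat_s",
        "limit": 3,
        "overrequest": sub_over,
        "refine": True,
    }
    return {
        "all": {
            "type": "terms",
            "field": "all_ss",
            "limit": 1,
            "refine": top_refine,
            "overrequest": top_over,
            "facet": {
                "cat_count": {**sub_common, "sort": "count asc"},
                "cat_price": {
                    **sub_common,
                    "sort": "sum_p asc",
                    "facet": {"sum_p": "sum(price_i)"},
                },
            },
        }
    }


# Per the comment: top-level overrequest:0 fails, overrequest:999 passes.
failing = build_json_facet(top_over=0, top_refine=True, sub_over=0)
passing = build_json_facet(top_over=999, top_refine=True, sub_over=0)

if __name__ == "__main__":
    print(json.dumps(failing, indent=2))
```

The only difference between {{failing}} and {{passing}} is the top-level {{overrequest}} value; both sub-facets keep {{overrequest:0}}.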
> JSON Field Facet refinement can return incorrect counts/stats for sorted buckets
> --------------------------------------------------------------------------------
>
> Key: SOLR-12343
> URL: https://issues.apache.org/jira/browse/SOLR-12343
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Assignee: Yonik Seeley
> Priority: Major
> Attachments: SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch, SOLR-12343.patch
>
>
> The way JSON Facet's simple refinement "re-sorts" buckets after refinement can cause _refined_ buckets to be "bumped out" of the topN based on the refined counts/stats depending on the sort - causing _unrefined_ buckets originally discounted in phase#2 to bubble up into the topN and be returned to clients *with inaccurate counts/stats*
> The simplest way to demonstrate this bug (in some data sets) is with a {{sort: 'count asc'}} facet:
> * assume shard1 returns termX & termY in phase#1 because they have very low shard1 counts
> ** but they are *not* returned at all by shard2, because these terms both have very high shard2 counts.
> * Assume termX has a slightly lower shard1 count than termY, such that:
> ** termX "makes the cut" for the limit=N topN buckets
> ** termY does not make the cut, and is the "N+1" known bucket at the end of phase#1
> * termX then gets included in the phase#2 refinement request against shard2
> ** termX now has a much higher _known_ total count than termY
> ** the coordinator now sorts termX "worse" in the sorted list of buckets than termY
> ** which causes termY to bubble up into the topN
> * termY is ultimately included in the final result _with incomplete count/stat/sub-facet data_ instead of termX
> ** this is all independent of the possibility that termY may actually have a significantly higher total count than termX across the entire collection
> ** the key problem is that all/most of the other terms returned to the client have counts/stats that are the accumulation of all shards, but termY only has the contributions from shard1
> Important Notes:
> * This scenario can happen regardless of the amount of overrequest used. Additional overrequest just increases the number of "extra" terms needed in the index with "better" sort values than termX & termY in shard2
> * {{sort: 'count asc'}} is not just an exceptional/pathological case:
> ** any function sort where additional data provided by shards during refinement can cause a bucket to "sort worse" can also cause this problem.
> ** Examples: {{sum(price_i) asc}} , {{min(price_i) desc}} , {{avg(price_i) asc|desc}} , etc...
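
The termX/termY scenario in the description can be walked through with a toy model. This is a minimal sketch and not Solr's actual refinement code: the shard contents, the helper names ({{phase1}}, {{naive_refine}}, {{true_totals}}) and all counts are made up for illustration, and "overrequest" is modeled simply as extra buckets returned per shard.

```python
# Hypothetical per-shard term counts (not taken from the issue or the test).
SHARD1 = {"termA": 1, "termX": 2, "termY": 3, "termP": 40, "termQ": 50}
SHARD2 = {"termA": 1, "termP": 5, "termQ": 6, "termX": 90, "termY": 95}
SHARDS = [SHARD1, SHARD2]


def count_asc(buckets):
    """Sort (term, count) pairs by ascending count, ties broken by term."""
    return sorted(buckets.items(), key=lambda kv: (kv[1], kv[0]))


def phase1(shard, limit, overrequest):
    """Phase 1: a shard returns its best limit+overrequest buckets."""
    return dict(count_asc(shard)[: limit + overrequest])


def naive_refine(shards, limit, overrequest):
    """Merge phase 1, refine only the topN candidates, then re-sort everything."""
    partials = [phase1(s, limit, overrequest) for s in shards]
    known = {}
    for partial in partials:
        for term, count in partial.items():
            known[term] = known.get(term, 0) + count
    # Only the current topN are sent for phase 2 refinement.
    candidates = [term for term, _ in count_asc(known)[:limit]]
    for term in candidates:
        for shard, partial in zip(shards, partials):
            if term not in partial:
                known[term] += shard.get(term, 0)
    # Re-sorting refined and unrefined buckets together is the problem step.
    return count_asc(known)[:limit]


def true_totals(shards):
    """Ground truth: each term's count summed over every shard."""
    totals = {}
    for shard in shards:
        for term, count in shard.items():
            totals[term] = totals.get(term, 0) + count
    return totals


result = naive_refine(SHARDS, limit=2, overrequest=1)
truth = true_totals(SHARDS)

if __name__ == "__main__":
    print(result)          # [('termA', 2), ('termY', 3)]
    print(truth["termY"])  # 98
```

In this toy data termX is refined from 2 to 92 and drops out of the top 2, so termY takes its place with only its shard1 count of 3, although its total across both shards is 98.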