[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields
[ https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288088#comment-16288088 ]

ASF subversion and git services commented on SOLR-11706:

Commit 2990c88a927213177483b61fe8e6971df04fc3ed in lucene-solr's branch refs/heads/master from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2990c88 ]

Beef up testing of json.facet 'refine:simple' when dealing with 'Long Tail' terms

In an attempt to get more familiar with json.facet refinement, I set out to try and refactor/generalize/clone some of the existing facet.pivot refinement tests to assert that json.facet could produce the same results. This test is a baby step towards doing that: cloning DistributedFacetPivotLongTailTest into DistributedFacetSimpleRefinementLongTailTest (with shared index building code).

Along the way, I learned that the core logic of 'refine:simple' is actually quite different from how facet.field & facet.pivot work (see discussion in SOLR-11733), so they do *NOT* produce the same results in many "Long Tail" situations. As a result, many of the logic/assertions in DistributedFacetSimpleRefinementLongTailTest are very different from their counterparts in DistributedFacetPivotLongTailTest, with detailed explanations in comments.

Hopefully this test will prove useful down the road to anyone who might want to compare/contrast facet.pivot with json.facet, and to prevent regressions in 'refine:simple' if/when we add more complex refinement approaches in the future.

There are also a few TODOs in the test related to other small discrepancies between json.facet and stats.field (issues I opened along the way), indicating where the tests should be modified once those issues are addressed in json.facet...
- SOLR-11706: support for multivalued numeric fields in stats
- SOLR-11695: support for 'missing()' & 'num_vals()' (aka: 'count' from stats.field) numeric stats
- SOLR-11725: switch from 'uncorrected stddev' to 'corrected stddev'

> JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields
> ---
>
> Key: SOLR-11706
> URL: https://issues.apache.org/jira/browse/SOLR-11706
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Attachments: SOLR-11706.patch
>
> While trying to write some tests demonstrating equivalences between the
> StatsComponent and the JSON FacetModule I discovered that the FacetModule's
> stat functions (min, max, etc...) don't seem to work on multivalued fields.
> Based on the stack traces, I gather the problem is because the FacetModule
> seems to rely exclusively on using the "Function" parsers to get a value
> source -- apparently w/o any other method of accumulating numeric stats from
> multivalued (numeric) DocValues?

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
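The SOLR-11725 item above distinguishes 'uncorrected' from 'corrected' standard deviation. As a rough illustration of the difference (not Solr code; the class and method names here are made up for this sketch, and the input is a textbook data set rather than anything from the tests), the two differ only in the divisor:

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of Solr.
final class StddevDemo {

    /** Sum of squared deviations from the arithmetic mean. */
    static double sumSquaredDeviations(List<Double> values) {
        double mean = values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        return values.stream().mapToDouble(v -> (v - mean) * (v - mean)).sum();
    }

    /** "Uncorrected" (population) standard deviation: divides by n. */
    static double uncorrectedStddev(List<Double> values) {
        return Math.sqrt(sumSquaredDeviations(values) / values.size());
    }

    /** "Corrected" (sample) standard deviation: divides by n - 1; needs at least two values. */
    static double correctedStddev(List<Double> values) {
        return Math.sqrt(sumSquaredDeviations(values) / (values.size() - 1));
    }

    public static void main(String[] args) {
        List<Double> values = List.of(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0);
        System.out.println("uncorrected: " + uncorrectedStddev(values));
        System.out.println("corrected:   " + correctedStddev(values));
    }
}
```

For small value counts the two results can differ noticeably, which is presumably why a test comparing json.facet with stats.field has to account for it.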
[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields
[ https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288084#comment-16288084 ]

ASF subversion and git services commented on SOLR-11706:

Commit 53f2d4aa3aa171d5f37284eba9ca56d987729796 in lucene-solr's branch refs/heads/branch_7x from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=53f2d4a ]

Beef up testing of json.facet 'refine:simple' when dealing with 'Long Tail' terms

(cherry picked from commit 2990c88a927213177483b61fe8e6971df04fc3ed)
[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields
[ https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278490#comment-16278490 ]

Yonik Seeley commented on SOLR-11706:

bq. I'd prefer this interface and some of the related methods on FunctionValues that take arrays be deprecated out of Lucene.

Yeah, I agree. I've always tried to avoid building on the array-based methods because it felt like we needed something better for multiValued fields & functions.
[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields
[ https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278002#comment-16278002 ]

David Smiley commented on SOLR-11706:

I just want to point out that "multi-valued functions" in fact exist -- {{org.apache.lucene.queries.function.valuesource.MultiValueSource}}. I'm not a fan -- the API feels awkward to me, but there it is. We pretty much only _use_ it today for some legacy-ish spatial stuff. I'd prefer this interface and some of the related methods on FunctionValues that take arrays be deprecated out of Lucene.
[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields
[ https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272999#comment-16272999 ]

Hoss Man commented on SOLR-11706:

bq. ... I was pointing out how other stats could do the same thing.

Oh, oh ... I'm sorry, I understand now: some of the groundwork has already been laid in MinMax, and similar work could be done in other aggs. Got it.
[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields
[ https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272091#comment-16272091 ]

Yonik Seeley commented on SOLR-11706:

I was just trying to point out that it's a "yeah, that's not implemented yet" rather than "what the heck is wrong... I'll dig into it" situation.

bq. Well ... presumably, in the absence of any official documentation (yet)

Here's what we have so far: https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-master/javadoc/json-facet-api.html#AggregationFunctions

bq. I'm not really following what your point about MinMaxAgg is.

If one goes about implementing support for avg(multivalued_field), then the first issue one will run up against is that the function parser will fail because of the generic value source check for single-valued fields. min() and max() have already gotten around this issue, and I was pointing out how other stats could do the same thing.

bq. I'm not really sure what it would mean to "care about ... multi-valued functions" – AFAIK we've never had any multivalued functions? .. are you just hypothesizing that maybe someday we could?

Yes, IMO we already need them. There are multiple ways to handle multi-valued fields and we don't support that well anywhere.
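To make the "multiple ways to handle multi-valued fields" remark concrete, here is a small sketch in plain Java (no Solr or Lucene types; the class and method names are hypothetical). It uses the two documents from the reproduction steps in this issue, x with values 42 and 666 and y with value 55, and shows that "avg" can reasonably mean different things. Which interpretation an aggregation should adopt is not settled anywhere in this thread.

```java
import java.util.List;

// Hypothetical illustration; each inner list holds one document's values for a field.
final class MultiValuedAvgDemo {

    /** Treats every value of every document as one sample. */
    static double avgOverAllValues(List<List<Integer>> docs) {
        return docs.stream()
                .flatMap(List::stream)
                .mapToInt(Integer::intValue)
                .average()
                .orElse(Double.NaN);
    }

    /** Reduces each document to its smallest value first, then averages per document. */
    static double avgOfPerDocMin(List<List<Integer>> docs) {
        return docs.stream()
                .filter(values -> !values.isEmpty())
                .mapToInt(values -> values.stream().mapToInt(Integer::intValue).min().getAsInt())
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<List<Integer>> docs = List.of(List.of(42, 666), List.of(55));
        System.out.println("avg over all values: " + avgOverAllValues(docs));
        System.out.println("avg of per-doc min:  " + avgOfPerDocMin(docs));
    }
}
```

The first interpretation counts three samples and the second counts two, so the results differ even on this tiny input.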
[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields
[ https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271695#comment-16271695 ]

Yonik Seeley commented on SOLR-11706:

Perhaps a bug at the user level, but more of a "not implemented yet" at the development level.

bq. Based on the stack traces, i gather the problem is because the FacetModule seems to rely exclusively on using the "Function" parsers to get a value source – apparently w/o any other method of accumulating numeric stats from multivalued (numeric) DocValues?

That was the original reason. As part of SOLR-11317 I added a bit of a hacky way to support a function or a bare field name (w/o trying to make the field name into a value source). min/max parsers currently use this:

{code}
addParser("agg_min", new ValueSourceParser() {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    // FLAG_USE_FIELDNAME_SOURCE: accept a bare field name without turning it into a value source
    return new MinMaxAgg("min", fp.parseValueSource(FunctionQParser.FLAG_DEFAULT | FunctionQParser.FLAG_USE_FIELDNAME_SOURCE));
  }
});
{code}

Now in MinMaxAgg, we deal with fields separately from functions and throw an exception for a multivalued field since there is no implementation yet:

{code}
if (sf.multiValued() || sf.getType().multiValuedFieldCache()) {
  vs = null;
  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "min/max aggregations can't be used on multi-valued field " + field);
}
{code}

We could either:
- use the same strategy for all the stats (fine if we only care about multi-valued fields and not multi-valued functions)
- fix ValueSource so that it can be truly multi-valued and use that
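As a rough sketch of what the first option could look like for min/max, the accumulator below takes all of a document's values for a field at once instead of a single value per document. It is plain Java with hypothetical names and does not use the real MinMaxAgg, ValueSource, or DocValues APIs; how the values would actually be read from multivalued DocValues is left out and is an assumption of this example.

```java
import java.util.OptionalLong;

// Hypothetical accumulator; not the actual Solr implementation.
final class MultiValuedMinMax {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private boolean sawValue = false;

    /** Called once per matching document with every value that document has for the field. */
    void collect(long... docValues) {
        for (long value : docValues) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sawValue = true;
        }
    }

    /** Empty when no matching document had a value for the field. */
    OptionalLong min() {
        return sawValue ? OptionalLong.of(min) : OptionalLong.empty();
    }

    /** Empty when no matching document had a value for the field. */
    OptionalLong max() {
        return sawValue ? OptionalLong.of(max) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        MultiValuedMinMax acc = new MultiValuedMinMax();
        acc.collect(42, 666); // a document with two values
        acc.collect(55);      // a document with one value
        acc.collect();        // a document with no value for the field
        System.out.println("min=" + acc.min() + " max=" + acc.max());
    }
}
```

Because the accumulator never needs a single "the value" per document, it sidesteps the single-valued assumption that the function parser enforces, at the cost of only working for bare field names and not for functions.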
[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields
[ https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271585#comment-16271585 ]

Hoss Man commented on SOLR-11706:

Trivial steps to reproduce...

{noformat}
bin/solr -e techproducts
...
curl -H 'Content-Type: application/json' --data-binary '[{"id":"x","foo_is":42,"foo_is":666},{"id":"y","foo_is":55}]' 'http://localhost:8983/solr/techproducts/update?commit=true'
...
{noformat}

Note that {{stats.field}} has no problems with {{foo_is}}...

{noformat}
curl 'http://localhost:8983/solr/techproducts/query?&stats=true&stats.field=foo_is&q=*:*&rows=0&omitHeader=true'
{
  "response":{"numFound":34,"start":0,"docs":[]
  },
  "stats":{
    "stats_fields":{
      "foo_is":{
        "min":42.0,
        "max":666.0,
        "count":3,
        "missing":32,
        "sum":763.0,
        "sumOfSquares":448345.0,
        "mean":254.34,
        "stddev":356.5730406709589}}}}
{noformat}

But the JSON FacetModule can't compute similar stats...

{noformat}
curl http://localhost:8983/solr/techproducts/query -d 'q=*:*&rows=0&omitHeader=true&json.facet=
{ min:"min(foo_is)",
  max:"max(foo_is)",
  sum:"sum(foo_is)",
  // count and missing not supported, see SOLR-11695
  sumOfSquares:"sumsq(foo_is)",
  mean:"avg(foo_is)",
  stddev:"stddev(foo_is)"
}'
{
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"can not use FieldCache on multivalued field: foo_is",
    "code":400}}
{noformat}

stack trace from logs...
{noformat}
ERROR - 2017-11-29 21:40:30.417; [ x:techproducts] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: can not use FieldCache on multivalued field: foo_is
	at org.apache.solr.schema.SchemaField.checkFieldCacheSource(SchemaField.java:190)
	at org.apache.solr.schema.IntPointField.getValueSource(IntPointField.java:149)
	at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:384)
	at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:237)
	at org.apache.solr.search.ValueSourceParser$86.parse(ValueSourceParser.java:977)
	at org.apache.solr.search.FunctionQParser.parseAgg(FunctionQParser.java:421)
	at org.apache.solr.search.facet.FacetParser.parseStringStat(FacetRequest.java:429)
	at org.apache.solr.search.facet.FacetParser.parseStringFacetOrStat(FacetRequest.java:422)
	at org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:352)
	at org.apache.solr.search.facet.FacetParser.parseSubs(FacetRequest.java:332)
	at org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:601)
	at org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:590)
	at org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:102)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:269)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2484)
{noformat}

(Some other testing suggests that this problem exists regardless of whether TrieInt or IntPoint fields are used ... I didn't explicitly test float/long/double/etc... but based on a quick glance at the code I don't see any reason why they wouldn't all be equally affected.)
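As a sanity check on the numbers in the stats.field response above, the following sketch recomputes count, sum, sumOfSquares, and the standard deviation from the three indexed values (42, 666, 55). It is plain Java with hypothetical names, not StatsComponent code, and it assumes the reported stddev is the corrected (n - 1) form, which the arithmetic below is consistent with.

```java
// Hypothetical check of the stats.field output; not Solr code.
final class StatsCheck {

    /** Returns {count, sum, sumOfSquares} for the given values. */
    static double[] summarize(double... values) {
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (double v : values) {
            sum += v;
            sumOfSquares += v * v;
        }
        return new double[] {values.length, sum, sumOfSquares};
    }

    /** Corrected (n - 1) standard deviation derived from the running totals alone. */
    static double correctedStddev(double count, double sum, double sumOfSquares) {
        return Math.sqrt((sumOfSquares - sum * sum / count) / (count - 1));
    }

    public static void main(String[] args) {
        double[] s = summarize(42, 666, 55);
        System.out.println("count=" + s[0] + " sum=" + s[1] + " sumOfSquares=" + s[2]);
        System.out.println("stddev=" + correctedStddev(s[0], s[1], s[2]));
    }
}
```

Since every statistic here is derived from count, sum, and sumOfSquares, a json.facet equivalent would need to feed all three values of the two documents into those totals, which is exactly what fails for the multivalued foo_is field.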