[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields

2017-12-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288088#comment-16288088
 ] 

ASF subversion and git services commented on SOLR-11706:


Commit 2990c88a927213177483b61fe8e6971df04fc3ed in lucene-solr's branch 
refs/heads/master from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2990c88 ]

Beef up testing of json.facet 'refine:simple' when dealing with 'Long Tail' terms

In an attempt to get more familiar with json.facet refinement, I set out to try 
and refactor/generalize/clone some of the existing facet.pivot refinement tests 
to assert that json.facet could produce the same results.
This test is a baby step towards doing that: Cloning 
DistributedFacetPivotLongTailTest into 
DistributedFacetSimpleRefinementLongTailTest (with shared index building code).

Along the way, I learned that the core logic of 'refine:simple' is actually 
quite different from how facet.field & facet.pivot work (see discussion in 
SOLR-11733), so they do *NOT* produce the same results in many "Long Tail" 
situations.  As a result, many of the logic/assertions in 
DistributedFacetSimpleRefinementLongTailTest are very different from their 
counterparts in DistributedFacetPivotLongTailTest, with detailed explanations 
in comments.

Hopefully this test will prove useful down the road to anyone who might want to 
compare/contrast facet.pivot with json.facet, and to prevent regressions in 
'refine:simple' if/when we add more complex refinement approaches in the future.

There are also a few TODOs in the test related to some other small 
discrepancies between json.facet and stats.field that I opened along the way, 
indicating where the tests should be modified once those issues are addressed 
in json.facet...

 - SOLR-11706: support for multivalued numeric fields in stats
 - SOLR-11695: support for 'missing()' & 'num_vals()' (aka: 'count' from 
stats.field) numeric stats
 - SOLR-11725: switch from 'uncorrected stddev' to 'corrected stddev'
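
For context on the SOLR-11725 item: an "uncorrected" standard deviation divides the sum of squared deviations by n, while the "corrected" (sample) form divides by n - 1. The following is a minimal sketch of that difference, assuming the count, sum and sumOfSquares reported by stats.field for foo_is in the reproduction steps of this issue. The class and method names are hypothetical and this is not Solr code.

```java
// Hypothetical illustration only; not taken from Solr.
final class StddevSketch {

  /** Sum of squared deviations from the mean, derived from count, sum and sum of squares. */
  static double sumSquaredDeviations(long count, double sum, double sumOfSquares) {
    return sumOfSquares - (sum * sum) / count;
  }

  /** Population ("uncorrected") standard deviation: divide by n. */
  static double uncorrected(long count, double sum, double sumOfSquares) {
    return Math.sqrt(sumSquaredDeviations(count, sum, sumOfSquares) / count);
  }

  /** Sample ("corrected") standard deviation: divide by n - 1. */
  static double corrected(long count, double sum, double sumOfSquares) {
    if (count < 2) {
      return 0.0; // assumption made for this sketch: no spread with fewer than two values
    }
    return Math.sqrt(sumSquaredDeviations(count, sum, sumOfSquares) / (count - 1));
  }

  public static void main(String[] args) {
    // count, sum and sumOfSquares for the values 42, 666 and 55
    long count = 3;
    double sum = 763.0;
    double sumOfSquares = 448345.0;
    System.out.println("uncorrected: " + uncorrected(count, sum, sumOfSquares));
    System.out.println("corrected:   " + corrected(count, sum, sumOfSquares));
  }
}
```

With these inputs the corrected form agrees with the stddev that stats.field reports for foo_is (about 356.57), while the uncorrected form is noticeably smaller.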


> JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields
> ---
>
> Key: SOLR-11706
> URL: https://issues.apache.org/jira/browse/SOLR-11706
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
> Attachments: SOLR-11706.patch
>
>
> While trying to write some tests demonstrating equivalences between the 
> StatsComponent and the JSON FacetModule i discovered that the FacetModules 
> stat functions (min, max, etc...) don't seem to work on multivalued fields.
> Based on the stack traces, i gather the problem is because the FacetModule 
> seems to rely exclusively on using the "Function" parsers to get a value 
> source -- apparently w/o any other method of accumulating numeric stats from 
> multivalued (numeric) DocValues?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields

2017-12-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288084#comment-16288084
 ] 

ASF subversion and git services commented on SOLR-11706:


Commit 53f2d4aa3aa171d5f37284eba9ca56d987729796 in lucene-solr's branch 
refs/heads/branch_7x from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=53f2d4a ]

(cherry picked from commit 2990c88a927213177483b61fe8e6971df04fc3ed)





[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields

2017-12-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278490#comment-16278490
 ] 

Yonik Seeley commented on SOLR-11706:
-

bq. I'd prefer this interface and some of the related methods on FunctionValues 
that take arrays be deprecated out of Lucene.
Yeah, I agree. I've always tried to avoid building on the array based methods 
because it felt like we needed something better for multiValued fields & 
functions.




[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields

2017-12-04 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278002#comment-16278002
 ] 

David Smiley commented on SOLR-11706:
-

I just want to point out that "multi-valued functions" in fact exist -- 
{{org.apache.lucene.queries.function.valuesource.MultiValueSource}}.  I'm not a 
fan -- the API feels awkward to me, but there it is.  We pretty much only _use_ 
it today for some legacy-ish spatial stuff.  I'd prefer this interface and some 
of the related methods on FunctionValues that take arrays be deprecated out of 
Lucene.




[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields

2017-11-30 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272999#comment-16272999
 ] 

Hoss Man commented on SOLR-11706:
-

bq. ... I was pointing out how other stats could do the same thing.

Oh, oh ... i'm sorry, i understand now:  Some of the ground work has already 
been laid in MinMax, and similar work could be done in other aggs.  Got it.




[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields

2017-11-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272091#comment-16272091
 ] 

Yonik Seeley commented on SOLR-11706:
-

I was just trying to point out that it's a "yeah, that's not implemented yet" 
rather than "what the heck is wrong... I'll dig into it" situation.

bq. Well ... presumably, in the absence of any official documentation (yet)

Here's what we have so far:
https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-master/javadoc/json-facet-api.html#AggregationFunctions

bq. I'm not really following what your point about MinMaxAgg is.

If one goes about implementing support for avg(multivalued_field), then the 
first issue one will run up against is that the function parser will fail 
because of the generic value source check for single valued fields.  min() and 
max() have already gotten around this issue, and I was pointing out how other 
stats could do the same thing.

bq. I'm not really sure what it would mean to "care about ... multi-valued 
functions" – AFAIK we've never had any multivalued functions? .. are you just 
hypothesizing that maybe someday we could?

Yes, IMO we already need them.  There are multiple ways to handle multi-valued 
fields and we don't support that well anywhere.
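
One possible reading of "multiple ways to handle multi-valued fields" is that an aggregate such as avg() is ambiguous once a document can carry several values: it could average every value, or first reduce each document to a single value (its min or its max, say) and average those. The sketch below illustrates that ambiguity with plain arrays, assuming the two documents from the reproduction steps of this issue (values {42, 666} and {55}). All names are hypothetical; this does not reflect how Solr or Lucene model multi-valued sources.

```java
import java.util.Arrays;

// Hypothetical illustration only; each inner array holds the values of one document.
final class MultiValuedAvgSketch {

  /** Average over every value of every document. */
  static double avgOfAllValues(long[][] docs) {
    return Arrays.stream(docs)
        .flatMapToLong(vals -> Arrays.stream(vals))
        .average()
        .orElse(Double.NaN);
  }

  /** Average of each document's smallest value; documents without values are skipped. */
  static double avgOfPerDocMin(long[][] docs) {
    return Arrays.stream(docs)
        .filter(vals -> vals.length > 0)
        .mapToLong(vals -> Arrays.stream(vals).min().getAsLong())
        .average()
        .orElse(Double.NaN);
  }

  /** Average of each document's largest value; documents without values are skipped. */
  static double avgOfPerDocMax(long[][] docs) {
    return Arrays.stream(docs)
        .filter(vals -> vals.length > 0)
        .mapToLong(vals -> Arrays.stream(vals).max().getAsLong())
        .average()
        .orElse(Double.NaN);
  }

  public static void main(String[] args) {
    long[][] docs = { {42, 666}, {55} };
    System.out.println("avg of all values:  " + avgOfAllValues(docs));
    System.out.println("avg of per-doc min: " + avgOfPerDocMin(docs));
    System.out.println("avg of per-doc max: " + avgOfPerDocMax(docs));
  }
}
```

The three results differ for the same data, which is why the semantics would need to be chosen explicitly for each aggregation.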





[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields

2017-11-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271695#comment-16271695
 ] 

Yonik Seeley commented on SOLR-11706:
-

Perhaps a bug at the user level, but more of a "not implemented yet" at the 
development level.

bq. Based on the stack traces, i gather the problem is because the FacetModule 
seems to rely exclusively on using the "Function" parsers to get a value source 
– apparently w/o any other method of accumulating numeric stats from 
multivalued (numeric) DocValues?

That was the original reason.  As part of SOLR-11317 I added a bit of a hacky 
way to support a function or a bare field name (w/o trying to make the field 
name into a value source).
min/max parsers currently use this:
{code}
addParser("agg_min", new ValueSourceParser() {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    return new MinMaxAgg("min",
        fp.parseValueSource(FunctionQParser.FLAG_DEFAULT | FunctionQParser.FLAG_USE_FIELDNAME_SOURCE));
  }
});
{code}

Now in MinMaxAgg, we deal with fields separately from functions and throw an 
exception for a multivalued field since there is no implementation yet:
{code}
  if (sf.multiValued() || sf.getType().multiValuedFieldCache()) {
    vs = null;
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "min/max aggregations can't be used on multi-valued field " + field);
{code}

We could either:
 - use the same strategy for all the stats (fine if we only care about 
multi-valued fields and not multi-valued functions)
 - fix ValueSource so that it can be truly multi-valued and use that





[jira] [Commented] (SOLR-11706) JSON FacetModule can't compute stats (min,max,etc...) on multivalued fields

2017-11-29 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271585#comment-16271585
 ] 

Hoss Man commented on SOLR-11706:
-

Trivial steps to reproduce...

{noformat}
bin/solr -e techproducts
...
curl -H 'Content-Type: application/json' --data-binary \
  '[{"id":"x","foo_is":42,"foo_is":666},{"id":"y","foo_is":55}]' \
  'http://localhost:8983/solr/techproducts/update?commit=true'
...
{noformat}

Note that {{stats.field}} has no problems with {{foo_is}}...
{noformat}
curl 'http://localhost:8983/solr/techproducts/query?&stats=true&stats.field=foo_is&q=*:*&rows=0&omitHeader=true'
{
  "response":{"numFound":34,"start":0,"docs":[]
  },
  "stats":{
    "stats_fields":{
      "foo_is":{
        "min":42.0,
        "max":666.0,
        "count":3,
        "missing":32,
        "sum":763.0,
        "sumOfSquares":448345.0,
        "mean":254.33333333333334,
        "stddev":356.5730406709589}}}}
{noformat}

But the JSON FacetModule can't compute similar stats...
{noformat}
curl http://localhost:8983/solr/techproducts/query -d 'q=*:*&rows=0&omitHeader=true&json.facet=
{ min:"min(foo_is)", max:"max(foo_is)", sum:"sum(foo_is)",
  // count and missing not supported, see SOLR-11695
  sumOfSquares:"sumsq(foo_is)", mean:"avg(foo_is)", stddev:"stddev(foo_is)"
}'
{
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"can not use FieldCache on multivalued field: foo_is",
    "code":400}}
{noformat}

stack trace from logs...

{noformat}
ERROR - 2017-11-29 21:40:30.417; [   x:techproducts] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: can not use FieldCache on multivalued field: foo_is
    at org.apache.solr.schema.SchemaField.checkFieldCacheSource(SchemaField.java:190)
    at org.apache.solr.schema.IntPointField.getValueSource(IntPointField.java:149)
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:384)
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:237)
    at org.apache.solr.search.ValueSourceParser$86.parse(ValueSourceParser.java:977)
    at org.apache.solr.search.FunctionQParser.parseAgg(FunctionQParser.java:421)
    at org.apache.solr.search.facet.FacetParser.parseStringStat(FacetRequest.java:429)
    at org.apache.solr.search.facet.FacetParser.parseStringFacetOrStat(FacetRequest.java:422)
    at org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:352)
    at org.apache.solr.search.facet.FacetParser.parseSubs(FacetRequest.java:332)
    at org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:601)
    at org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:590)
    at org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:102)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:269)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2484)
{noformat}


(some other testing suggests that this problem exists regardless of whether 
TrieInt or IntPoint fields are used ... i didn't explicitly test 
float/long/double/etc... but based on a quick glance at the code i don't see 
any reason why they wouldn't all be equally affected)
