[ https://issues.apache.org/jira/browse/SOLR-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160658#comment-14160658 ]
Hoss Man commented on SOLR-6351:
--------------------------------

I haven't had a chance to look at the recent patches, and i probably won't today, but i wanted to post some quick replies to a few comments/questions...

----

bq. Also reverted the hasValues logic to replace it with checking if current pivot has positive count. Although it does produce some stats entries with Infinity minimum/maximum and NaN mean. This is what I was asking about before, I think I misunderstood the answer, but it still seems error-prone to have such entries...

You may be right ... lemme talk through my thinking on this and see where we wind up:

Forgetting about pivots for a moment, just think about the idea of computing some arbitrary stat over a set of documents. ie: you've got N documents that match some arbitrary query, and then you want to compute stats on the "price" and "popularity" fields .. what should happen if none of those documents has a "popularity" field at all?

my thinking was that:

* the behavior when hanging stats off pivots should mirror that of regular stats
* if you ask for a stats block, you should always get that block, so the client doesn't have to conditionally check if it's there before looking at the values.
* the included stat values matter even if no doc has the stats.field, because one of the stats is in fact "missing", and if you ask for stats, you should be able to look at that missing count. (and it should match up with your doc set size if the field is completely missing, etc...)

but looking at an example of this now, i see that for simple field stats (w/o pivots), that's not even how it currently works -- consider this URL using the example data...

http://localhost:8983/solr/select?rows=0&q=name:utf&stats.field=foo_d&stats.field=popularity&stats=true

* foo_d doesn't exist in the index.
* popularity does exist in the index, but the one doc matching the query doesn't have a value in that field.
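
To make the "Infinity min/max and NaN mean" entries concrete, here is a rough sketch of how they can arise when stats are accumulated over a doc set in which no doc has a value. This is illustrative Python, not the actual Solr code: the class {{FieldStats}}, its method names, and the choice to seed min/max with +/-Infinity are all assumptions made for the example. It shows two ways the same empty accumulator could be reported: the raw state, and a variant that reports undefined stats as null while keeping count and missing.

```python
import math
from dataclasses import dataclass


@dataclass
class FieldStats:
    """Hypothetical accumulator for one stats.field over one doc set."""
    count: int = 0               # docs that had a value
    missing: int = 0             # docs that had no value
    total: float = 0.0
    minimum: float = math.inf    # assumed seed: any real value replaces it
    maximum: float = -math.inf   # assumed seed: any real value replaces it

    def accumulate(self, value):
        """Fold in one doc's value; None stands for 'doc has no value'."""
        if value is None:
            self.missing += 1
            return
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def raw_summary(self):
        # With count == 0 the seeds leak out as +/-Infinity and the mean is
        # undefined, reported here as NaN (Python needs the explicit guard).
        mean = self.total / self.count if self.count else math.nan
        return {"min": self.minimum, "max": self.maximum, "mean": mean,
                "count": self.count, "missing": self.missing}

    def null_summary(self):
        # Same block, but undefined stats are reported as None (null).
        if self.count == 0:
            return {"min": None, "max": None, "mean": None,
                    "count": 0, "missing": self.missing}
        return self.raw_summary()


# One matching doc with no "popularity" value, as in the example query.
stats = FieldStats()
stats.accumulate(None)
print(stats.raw_summary())
print(stats.null_summary())
```

Either way the block is always present and the missing count equals the doc set size, so a client never has to check whether the block exists before reading it.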
I thought (and still think) that the "correct" behavior for this query would be to get a stats block back for those fields where things like min/max/mean are "null", count==0, and missing=1 ... but that's not how it currently works.

so i guess the question is: is the current general stats behavior a "bug" that should be fixed, or is this the "correct" way to deal with stats when none of the documents have a value (and thus: the behavior of your "hasValues" logic was correct)?

i'm leaning towards "bug" ... but i'd like to think about it more and hear how others feel..

----

bq. One more question around this, which applies for SOLR-6353 and SOLR-4212 as well. Should we have a syntax to apply stats/queries/ranges only at specific levels in the pivot hierarchy? It would reduce amount of computation and size of response for cases where you only need it at a specific level (usually last level I guess).

That's a great question ... honestly it's not something i ever really thought about.

One quick thing i will point out: the size of the response shouldn't really be a huge factor in our decisions here, because with SOLR-6349 (which i'll hopefully have a patch for in the next day or so) the response will only need to include the stats people actually care about, and ask for, so the typical result size should be much smaller.

But you've got a good point about amount of computation done/returned at levels that people may not care about ... in my head, it seemed to make sense that the stats (and ranges, etc...) should be computed at every level just like the pivot count size -- but at the intermediate levels that count is a "free" computation based on the size of the subset, and i suspect you are correct: many people may only care about having these new stats/ranges/query on the leaves in the common case.

I'm not really following your suggested syntax though ...
you seem to be saying that in the "stats" local param, commas would be used to delimit "levels" of the pivot (corresponding to the commas in the list of pivot fields) but then i'm not really clear what you mean about using "\*" (if that means all levels, how do you know what tag name to use?)

in the original examples i proposed, i was thinking that a comma separated list could refer to multiple tag names, similar to how the "exclusions" work -- ie..

{noformat}
facet.pivot={!stats=prices,ratings}category,manufacturer
facet.pivot={!stats=prices,pop}reseller
stats.field={!key=avg_list_price tag=prices mean=true}list_price
stats.field={!tag=ratings min=true max=true}user_rating
stats.field={!tag=ratings min=true max=true}editors_rating
stats.field={!tag=prices min=true max=true}sale_price
stats.field={!tag=pop}weekly_tweets
stats.field={!tag=pop}weekly_page_views
{noformat}

...would result in the "category,manufacturer" pivot having stats on "avg_list_price, sale_price, user_rating, & editors_rating" while the "reseller" pivot would have stats on "avg_list_price, sale_price, weekly_tweets, & weekly_page_views"

Thinking about it now though, if we support multiple tag names on stats.field, the same thing could be supported like this...

{noformat}
facet.pivot={!stats=cm_s}category,manufacturer
facet.pivot={!stats=r_s}reseller
stats.field={!key=avg_list_price tag=cm_s,r_s mean=true}list_price
stats.field={!tag=cm_s min=true max=true}user_rating
stats.field={!tag=cm_s min=true max=true}editors_rating
stats.field={!tag=cm_s,r_s min=true max=true}sale_price
stats.field={!tag=r_s}weekly_tweets
stats.field={!tag=r_s}weekly_page_views
{noformat}

So ... if we did that, then we could start using "position" info in a comma separated list of tag names to refer to where in the pivot "depth" those stats/ranges/queries should be computed ... the question i have is "should we" ? ..
in the context of a facet.pivot param, will it be obvious to folks that there is a mapping between the commas in these local params and the commas in the body of the facet.pivot param, or will it confuse people who are used to seeing comma as just a way of delimiting multiple values in tag/ex params?

my opinion: no freaking clue at the moment ... need to let it soak in my brain.

> Let Stats Hang off of Pivots (via 'tag')
> ----------------------------------------
>
>                 Key: SOLR-6351
>                 URL: https://issues.apache.org/jira/browse/SOLR-6351
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Hoss Man
>         Attachments: SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch, SOLR-6351.patch
>
>
> The goal here is basically to flip the notion of "stats.facet" on its head, so that instead of asking the stats component to also do some faceting (something that's never worked well with the variety of field types and has never worked in distributed mode) we instead ask the PivotFacet code to compute some stats X for each leaf in a pivot. We'll do this with the existing {{stats.field}} params, but we'll leverage the {{tag}} local param of the {{stats.field}} instances to be able to associate which stats we want hanging off of which {{facet.pivot}}
> Example...
> {noformat}
> facet.pivot={!stats=s1}category,manufacturer
> stats.field={!key=avg_price tag=s1 mean=true}price
> stats.field={!tag=s1 min=true max=true}user_rating
> {noformat}
> ...with the request above, in addition to computing the min/max user_rating and mean price (labeled "avg_price") over the entire result set, the PivotFacet component will also include those stats for every node of the tree it builds up when generating a pivot of the fields "category,manufacturer"

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org