[ https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584299#comment-14584299 ]
Hoss Man commented on SOLR-4212: -------------------------------- Ok, I managed to review a little over half of the the current patch (trying to start with the small changes first and work my way up) -- comments below... ---- General feedback: This patch, on the whole, would be a lot easier to review -- and the final code a lot easier to make sense of -- if there were more class/method javadocs explaining what the purpose of everything was. I realize that could be said for a lot of the existing Solr code, but adding/refactoring classses/methods is the best time to try and improve the situation since that's when you're thining about it the most. ---- bq. "SimpleFacets.getFacetCounts() was removed because it would have been weird for SimpleFacets to create and use its own sub-class RangeFacetProcessor/DateFacetProcessor. However, in this patch I have replaced that bit of code with a static method in FacetComponent which should eliminate the repetition in MLTHandler. If you don't think it makes sense to keep this method where it is that's fine, but it is a public method designed for people writting custom request handlers to use to add faceting to their stuff -- so at the very least you should leave a deprecated stub in it's place that calls your new method. It also confuses the hell out of me that in the process of moving this method you made the method signature more complicated to use (more args to understand, more exceptions to deal with) and got rid of the javadocs so the new more complicated call signature is even harder to understand -- and I, frankly, can't make sense of why any of that is neccessary (even if should now live in a diff class because of the inheritence)... * why does the caller have to pass in an empty NamedList to be mutated? why can't the method just return a NamedList like the old method did? * why does the caller have to instantiate the date & range processors themselves? ** in both of the places this method is called, the processors are constructed just to pass to this method and then thrown away. ** So why can't this static method construct them internally (using the request, docset, params, and response builder from the SimpleFacets object that's also passed as an arg) and keep the method signature simple? * why does this new method catch & wrap SyntaxErrors but not IOExceptions like the old method? (If there reasons for these changes then great -- but they aren't clear just from reading the patch, and there's no javadocs to get guidance from) ---- bq. I removed the RangeFacetAccumulator class and replaced it with a LinkedHashMap of facetStr to RangeFacet. This RangeFacet class extends FacetBase just like PivotFacetValue and is responsible for aggregation of a single facet value. * I don't see a class called "RangeFacet" ... it looks like you talking about RangeFacetValue? ** FWIW: PivotFacetValue doesn't extend FacetBase because it models a single (value,count) pair in the context of a PivotFacetField (which may be hierarchical) ... you may be thinking of "PivotFacet" ** ... in which case, should this class _atually_ be named "RangeFacet" ? ** either way: can we please beef up the javadocs of this class to be crystal clear about what it's modeling -- at first glance I thought it was just modeling a facet.range *param* (with the field, start, end, calculator) before I realized it not only models the field + the indiviual buckets, but also the values in those buckets (but not the other params) * Isn't RangeFacetValue.fieldName redundent here? - it's already handled by FacetBase.facetOn. * I don't think there is any reason to initialize shardFieldValues in mergeContributionFromShard before the {{rangeFacet = rangeFromShard}} short-circut, it's just wasted cycles the first time the method gets called, correct? ---- Looking at the changes to PivotFacetValue... * why a "Map<String, RangeFacetValue>" for rangeCounts instead of a NamedList? ** it seems like a really bad idea for these to not be returned in a deterministic order -- it should be in the same order as the top level facet.range results (which is hte order of the facet.range params in the request) ** no idea if this HashMap usage currently causes problems in the shard merging, but it smells fishy -- and either way the user should be able to count on getting the range facets back in the same order the asked for them * i didn't dig into the full ramifications of this, but you've got PivotFacetValue constructing RangeFacetValue objects and passing the response key in as the "fieldName" argument to the constructor -- those aren't the same thing. Either the method call here is wrong, or the constructor arg should be renamed -- either way any usage of this "fieldName" constructor arg and internal usage should be sanity checked to ensure it's used consistnetly everywhere ** are there tests of using a diff response key from the fieldName when hanging ranges under pivots? (both single node and cloud?) * why is PivotFacetValue.createFromNamedList conditionally calling RangeFacetValue.mergeContributionFromShard? ** Unless i'm missing something, that method is only ever creating entirely new instances of PivotFacetValue, so there shouldn't be any existing data to merge into ... correct? * in PivotFacetValue.mergeContributionFromShard, the dozen or so lines of looping over each shard's response to either construct a new RangeFacetValue or call RangeFacetValue.mergeContributionFromShard seems like it could/should be refactored into a static helper method (in the RangeFacetValue class) ** the exact same loop is used in the only other place mergeContributionFromShard is called (FacetComponent) ** if PivotFacetValue.createFromNamedList really does need to conditional merge (see previuos comment), then it's loop would also be exactly the same, and it could use the same static helper method. ---- Since the input to PivotFacetHelper.mergeQueryCounts is already in the correct response format (ie: no special objects need constructed to model the simple counts) can't it just return "querycounts" if "receivingMap" is null? (instead of looping & adding) ---- With the current patch, something is wrong with how localparams are dealt with when trying to use facet.ranges inside of pivots. These queries are examples of doing range queries on the same field with diff gap params, and (independently) doing a pivot... {noformat} http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot=manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.start=0%20facet.range.end=1000%20facet.range.gap=200}price&facet.range={!key=key2%20tag=t1%20facet.range.start=0%20facet.range.end=500%20facet.range.gap=100}price http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot=manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.gap=200}price&f.price.facet.range.start=0&f.price.facet.range.end=1000&facet.range={!key=key2%20tag=t1%20facet.range.gap=100}price {noformat} ...but as soon as you try to hang those ranges on the pivot you get a local errors... {noformat} http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot={!range=t1}manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.start=0%20facet.range.end=1000%20facet.range.gap=200}price&facet.range={!key=key2%20tag=t1%20facet.range.start=0%20facet.range.end=500%20facet.range.gap=100}price ... Missing required parameter: f.price.facet.range.start (or default: facet.range.start) http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.pivot={!range=t1}manu_id_s,cat&facet.range={!key=key1%20tag=t1%20facet.range.gap=200}price&f.price.facet.range.start=0&f.price.facet.range.end=1000&facet.range={!key=key2%20tag=t1%20facet.range.gap=100}price ... Missing required parameter: f.price.facet.range.gap (or default: facet.range.gap) {noformat} ...the fact that this error only pops up when combining pivots + ranges suggests that the patch didn't _break_ the existing code controlling how the param parsing is done, it aparently just isn't re-using the exact same parsing code in both situations? Which also means that within a single request, the range facet params are still getting parsed (at least) twice -- once (correctly) for the top level facets, and (at least) once (incorrectly) when hanging off of the pivot. bq. ...Finally I settled for a crude approach of enhancing FacetBase with tags and just parsing the facet queries and ranges and putting them in the request context. ... Hmmm.... I haven't seen this piece of code yet, but it sounds like it only prevents the "tags" from being parsed over and over again -- from what I've seen so far, RangeFacetProcessor.getFacetRangeCounts(ParsedParams,NamedList) which delegates to RangeFacetProcessor.getFacetRangeCounts(SchemaField,RangeEndpointCalculator,ParsedParams) is still going to cause a new RangeEndpointCalculator to be constructed, and the exact same start/end/include/etc... Strings to be parsed over and over and over again, for every pivot value, at every level. These things should really just be parsed *once* in prepare(), put into the request context, and reused at every level of every pivot. To put it another way... With the patch right now, if you do the following query using the techproducts example data: {noformat} http://localhost:8983/solr/techproducts/select?rows=0&q=*:*&facet=true&facet.range=manufacturedate_dt&facet.range.start=2000-01-01T00:00:00Z&facet.range.end=2010-01-01T00:00:00Z&facet.range.gap=%2B1YEAR {noformat} ... then 41 DateRangeEndpointCalculator objects get instantated even though 1 is plenty; the start value of "2000-01-01T00:00:00Z" and the end value "2010-01-01T00:00:00Z" each get parsed into Date objects 41 times, even though the result is always exactly the same. That seems really inefficient ... doesn't it defeat the purpose of doing that param parsing in prepare()? > Let facet queries hang off of pivots > ------------------------------------ > > Key: SOLR-4212 > URL: https://issues.apache.org/jira/browse/SOLR-4212 > Project: Solr > Issue Type: Sub-task > Components: search > Affects Versions: 4.0 > Reporter: Steve Molloy > Assignee: Shalin Shekhar Mangar > Fix For: 5.3, Trunk > > Attachments: SOLR-4212-multiple-q.patch, SOLR-4212-multiple-q.patch, > SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, > SOLR-4212.patch, SOLR-4212.patch, SOLR-6353-6686-4212.patch, > SOLR-6353-6686-4212.patch, SOLR-6353-6686-4212.patch, > SOLR-6353-6686-4212.patch, patch-4212.txt > > > Facet pivot provide hierarchical support for computing data used to populate > a treemap or similar visualization. TreeMaps usually offer users extra > information by applying an overlay color on top of the existing square sizes > based on hierarchical counts. This second count is based on user choices, > representing, usually with gradient, the proportion of the square that fits > the user's choices. > The proposition is to use local parameters to specify facet query to apply > for pivot which matches a tag set on facet query. Parameter format would look > like: > facet.pivot={!query=r1}category,manufacturer > facet.query={!tag=r1}somequery > facet.query={!tag=r1}somedate:[NOW-1YEAR TO NOW] -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org