[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124591#comment-13124591 ]
Hoss Man commented on SOLR-2366:
--------------------------------

Jan: I've got to be completely honest here -- catching up on this issue, I got really confused and lost by some of your comments and the updated docs.

This sequence of comments really stands out to me...

{quote}
I have no good answer to this, other than inventing some syntax.
...
I think the values facet.range.include=upper/lower is clear. Outer/edge would need some more work/definition.
...
*My primary reason for suggesting this is to give users a terse, intuitive syntax for ranges.*
...
One thing this improvement needs to tackle is how to return the range buckets in the Response. It will not be enough with the simple range_facet format ... We need something which can return the explicit ranges,
{quote}

(emphasis added by me)

I really liked the simplicity of your earlier proposal, and I agree that it would be really powerful/helpful to give users a terse, intuitive syntax for specifying sequential ranges of variable sizes -- but it seems like we're really moving away from the syntax being "intuitive" because of the hoops you're having to jump through to treat this as an extension of the existing "facet.range" param in your design.

I think we really ought to revisit my earlier suggestion to approach this as an entirely new "type" of faceting - not a new plugin or a contrib, but a new first-class type of faceting that FacetComponent would support, right alongside facet.field, facet.query, and facet.range.

Let's ignore everything about the existing facet.range.* param syntax, and the facet_range response format, and think about what makes the most sense for this feature on its own.
If there are ideas from facet.range that make sense to carry over (like facet.range.include) then great -- but let's approach it from the "something new that can borrow from facet.range" standpoint instead of the "extension to facet.range that has a bunch of caveats with how facet.range already works".

I mean: if it looks like a duck, walks like a duck, and quacks like a duck, then I'm happy to call it a duck -- but in this case:

* doesn't make sense with facet.range.other
* needs special start/end syntax to play nice with facet.range.start/end
* needs to change the response format

...ie: it doesn't look the same, it doesn't walk the same, and it doesn't quack.

---

Regardless of whether this functionality becomes part of facet.range or not, I wanted to comment specifically on this idea...

bq. If all gaps are specified as explicit ranges this is no ambiguity, so we could require all gaps to be explicit ranges if one wants to use it?

This seems like a really harsh limitation to impose. If the only way to use an explicit range is in use cases where you *only* use explicit ranges, then what value add does this feature give you over just using multiple facet.query params? (it might be marginally fewer characters, but multiple facet.query params seem more intuitive and easier to read).

I mean: I don't have a solution to propose, it just seems like there's not much point in supporting explicit ranges in that case.
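
To make the comparison concrete, here is a minimal Python sketch of what the all-explicit case looks like today as multiple facet.query params. The field name "price", the end points, and the helper name are hypothetical and chosen only for illustration; the code just builds request params and does not talk to Solr. Note that the `[lo TO hi]` range syntax is inclusive at both ends, so a document sitting exactly on a shared end point would be counted in two buckets.

```python
from urllib.parse import urlencode


def explicit_ranges_as_facet_queries(field, bounds):
    """Build one facet.query param per consecutive pair of end points.

    `field` and `bounds` are illustrative inputs, not part of any Solr API.
    """
    params = [("q", "*:*"), ("facet", "true")]
    for lo, hi in zip(bounds, bounds[1:]):
        # Standard Solr range query syntax; inclusive on both ends.
        params.append(("facet.query", f"{field}:[{lo} TO {hi}]"))
    return params


params = explicit_ranges_as_facet_queries("price", [0, 10, 20, 50, 100])
print(urlencode(params))
```

Four ranges cost four facet.query params, which is the readability baseline any terser syntax has to beat.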
---

Having not thought about this issue in almost a month, and revisiting it with (fairly) fresh eyes, and thinking about all the use cases that have been discussed, it seems like the main goals we should address are really:

* an intuitive syntax for specifying end points for ranges of varying sizes
* ability to specify range end points using either fixed values or increments
* ability to specify that ranges should either use sequential end points, or overlap relative to some fixed min/max value

In other words: the only reason (that I know of) why overlapping ranges even came up in this issue was use cases like...

{noformat}
Price: $0-10, $0-20, $0-50, $0-100
Date: NOW-1DAY TO NOW, NOW-1MONTH TO NOW, NOW-1YEAR TO NOW
{noformat}

...there doesn't seem to be a lot of motivation for using overlapping ranges in the "middle" of a sequence, and these types of use cases where *all* the ranges overlap seem just as important as use cases where the ranges don't overlap at all...

{noformat}
Price: $0-10, $10-20, $20-50, $50-100
Date: NOW-1DAY TO NOW, NOW-1MONTH TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
{noformat}

...so let's try to focus on a syntax that makes both easy, using both fixed and relative values, w/o worrying about supporting arbitrary overlapping ranges (since I can't think of a use case for it, and it could always be achieved using facet.query).

So how about something like...

{noformat}
facet.sequence=<fieldname>
facet.sequence.spec=[<wild>,]?<val>,<relval>[,<relval>]*[,<wild>]?
facet.sequence.type=[before|after|between]
facet.sequence.include=(same as facet.range.include)
{noformat}

Where "relval" would either be a concrete value, or a relative value; the effective sequence has to either increase or decrease consistently or it's an error; and "facet.sequence.type" determines whether the ranges are overlapping ("before" and "after") or not ("between").

So if you had a spec like this...
{noformat}
facet.sequence.spec=0,10,+10,50,+50
{noformat}

Then depending on facet.sequence.type you could either get...

{noformat}
facet.sequence.type=after
  Price: $0-10, $0-20, $0-50, $0-100
facet.sequence.type=between
  Price: $0-10, $10-20, $20-50, $50-100
facet.sequence.type=before
  Price: $0-100, $10-100, $20-100, $50-100
{noformat}

"*" could be used at the start or end to indicate that you wanted an unbounded range, but it wouldn't be a factor in determining the "fixed point" used if type was "after" or "before", ie...

{noformat}
f.price.facet.sequence.spec=*,0,10,+10,50,+50,*
f.created.facet.sequence.spec=NOW,-1DAY,-1MONTH,-1YEAR

facet.sequence.type=after
  Price: below $0, $0-10, $0-20, $0-50, $0-100, $100 and up
  Created: NOW-1YEAR TO NOW, NOW-1YEAR TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
facet.sequence.type=between
  Price: below $0, $0-10, $10-20, $20-50, $50-100, $100 and up
  Created: NOW-1DAY TO NOW, NOW-1MONTH TO NOW-1DAY, NOW-1YEAR TO NOW-1MONTH
facet.sequence.type=before
  Price: below $0, $0-100, $10-100, $20-100, $50-100, $100 and up
  Created: NOW-1DAY TO NOW, NOW-1MONTH TO NOW, NOW-1YEAR TO NOW
{noformat}

...if we defined things that way, I *think* that would simplify a lot of the complexity we've been talking about, and simplify some of the use cases. The only remaining issues that have been brought up (that I can think of) that would still need to be worked out would be:

1) what the response format needs to look like - I'd vote to punt on this until we figure out the semantics.
2) when exactly ranges are inclusive/exclusive of their endpoints - I *think* we should be able to reuse the semantics from facet.range.include here, including "edge", if we define ranges involving "*" as "outer" ranges, but we'd need to work through more scenarios to be sure.
3) what happens if an increment overlaps with an absolute value, ie: my original example of "10,20,+50,+100,120,150".
The three possible solutions I can think of are:

* fail loudly
* implement "precedence" rules, ie: that absolute values trump relative values (10-20,20-70,70-120,120-150) or vice-versa (10-20,20-70,70-170)
* implement precedence rules but let them be controlled via a request param (similar to how "facet.range.hardend" works)

---

What do you think? Are there any key use cases / features we've talked about that you think this approach overlooks? Do you still think it should really be an extension to "facet.range"?

> Facet Range Gaps
> ----------------
>
> Key: SOLR-2366
> URL: https://issues.apache.org/jira/browse/SOLR-2366
> Project: Solr
> Issue Type: Improvement
> Reporter: Grant Ingersoll
> Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-2366.patch, SOLR-2366.patch
>
>
> There really is no reason why the range gap for date and numeric faceting
> needs to be evenly spaced. For instance, if and when SOLR-1581 is completed
> and one were doing spatial distance calculations, one could facet by function
> into 3 different sized buckets: walking distance (0-5KM), driving distance
> (5KM-150KM) and everything else (150KM+), for instance. We should be able to
> quantize the results into arbitrarily sized buckets.
> (Original syntax proposal removed, see discussion for concrete syntax)
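
As a rough illustration of the facet.sequence semantics proposed in the comment above, the following Python sketch expands a numeric spec into ranges for each facet.sequence.type. Nothing here is existing Solr code: `parse_spec` and `expand` are hypothetical names, and the sketch rests on several assumptions: only plain numbers are handled (no date math), a token with a leading "+" or "-" is treated as relative to the previous effective value, a non-monotonic sequence takes the "fail loudly" option, and an unbounded side is represented as `None`.

```python
def parse_spec(spec):
    """Return (open_start, points, open_end) for a numeric spec string.

    Assumption: after the first value, a token starting with '+' or '-'
    is relative to the previous effective value; anything else is absolute.
    """
    tokens = [t.strip() for t in spec.split(",")]
    open_start = tokens[0] == "*"
    if open_start:
        tokens = tokens[1:]
    open_end = bool(tokens) and tokens[-1] == "*"
    if open_end:
        tokens = tokens[:-1]
    if len(tokens) < 2:
        raise ValueError("spec needs at least two end points")

    points = [float(tokens[0])]
    for tok in tokens[1:]:
        if tok.startswith(("+", "-")):
            points.append(points[-1] + float(tok))
        else:
            points.append(float(tok))

    # "Fail loudly": the effective sequence must move in one direction only.
    steps = [b - a for a, b in zip(points, points[1:])]
    if not (all(s > 0 for s in steps) or all(s < 0 for s in steps)):
        raise ValueError("end points must consistently increase or decrease")
    return open_start, points, open_end


def expand(spec, seq_type="between"):
    """Expand a spec into a list of (low, high) pairs; None means unbounded."""
    open_start, points, open_end = parse_spec(spec)
    lo, hi = min(points), max(points)
    if seq_type == "between":
        ranges = [(min(a, b), max(a, b)) for a, b in zip(points, points[1:])]
    elif seq_type == "after":
        # Every range starts at the fixed minimum.
        ranges = [(lo, p) for p in points if p != lo]
    elif seq_type == "before":
        # Every range ends at the fixed maximum.
        ranges = [(p, hi) for p in points if p != hi]
    else:
        raise ValueError("type must be before, after, or between")

    # "*" adds an unbounded range but never changes the fixed point.
    increasing = points[-1] > points[0]
    if open_start:
        first = points[0]
        ranges.insert(0, (None, first) if increasing else (first, None))
    if open_end:
        last = points[-1]
        ranges.append((last, None) if increasing else (None, last))
    return ranges


for seq_type in ("after", "between", "before"):
    print(seq_type, expand("*,0,10,+10,50,+50,*", seq_type))
```

Under these assumptions, the overlapping example "10,20,+50,+100,120,150" raises an error rather than applying either precedence rule; supporting precedence would mean replacing the monotonicity check with a rule for dropping or clipping the conflicting end points.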