Thanks for the comments Patrick, but I'm not sure I fully understand the suggestion here. I don't see a path forward that uses separate fields, but maybe I'm missing something.

Imagine you're running an ecommerce site selling automotive parts and you need to index fitment information that consists of the year + make of the vehicles a part fits. Imagine a set of wiper blades fits 2010 Ford vehicles and 2011 Chevy vehicles (but _not_ 2011 Ford or 2010 Chevy). And let's say we want to facet on products that fit a 2011 Ford. We need to make sure this product does _not_ count. We can achieve this with points in two dimensions (year + make), but not with two separate fields (at least as far as I can come up with). A "two separate fields" approach would index year and make independently, so the document would carry years {2010, 2011} and makes {Ford, Chevy}, and you'd lose the information that only certain combinations are valid.

Am I overlooking something with your suggestion? Maybe there's something we can do with Lucene already that solves for this case and I'm just not aware of it? That's entirely possible and I'd love to learn more if there is!
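To make the wiper blade example concrete, here's a rough sketch in plain Java. Nothing in it is a Lucene API: the class name, the `FORD`/`CHEVY` ordinals, and both `matches*` helpers are made up purely for illustration. It just shows that flattening (year, make) pairs into two independent value sets produces a false match for "2011 Ford", while keeping the pairs together as 2-D points does not.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Lucene.
class FitmentSketch {
    // Assumed ordinals standing in for the "make" dimension.
    static final int FORD = 0;
    static final int CHEVY = 1;

    // The wiper blades from the example: each entry is one {year, make} point.
    static final int[][] WIPER_BLADES = { {2010, FORD}, {2011, CHEVY} };

    // "Two separate fields": years and makes are stored independently,
    // so the pairing between them is lost.
    static boolean matchesSeparateFields(int[][] fitments, int year, int make) {
        Set<Integer> years = new HashSet<>();
        Set<Integer> makes = new HashSet<>();
        for (int[] f : fitments) {
            years.add(f[0]);
            makes.add(f[1]);
        }
        return years.contains(year) && makes.contains(make);
    }

    // "2-D points": a document matches only if one of its points
    // has both the requested year and the requested make.
    static boolean matchesPoints(int[][] fitments, int year, int make) {
        for (int[] f : fitments) {
            if (f[0] == year && f[1] == make) {
                return true;
            }
        }
        return false;
    }
}
```

With this data, `matchesSeparateFields(WIPER_BLADES, 2011, FORD)` returns true (the product would be wrongly counted), while `matchesPoints(WIPER_BLADES, 2011, FORD)` returns false.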
As for MultiRangeQuery and the mention of sandbox modules, I think that's a bit of a different use-case. MultiRangeQuery lets you filter by a disjunction of ranges. The "multi" part doesn't refer to "multiple values in a doc" (although it does support that, as do the "standard" range queries).

Where I see a gap right now, beyond just faceting, is that we can represent N-dim points in the points index and filter on them (using the points index), but we have no doc values equivalent. This means 1) we can't facet, and 2) we can't create a "slow" query that does post-filtering instead of using the points index (which could be a very real advantage in cases with a sparse match set but a dense points index). So I like the idea of creating that concept and being able to facet and filter on it. Whether this is a "formal" doc values type or sits on top of BDV, I don't have a strong opinion.

And finally... it really should be multi-valued. The points index supports multiple points-per-field within a single document. It seems like a big gap if we didn't support that with a doc values field. Because BDV is inherently single-valued, I propose we come up with an encoding scheme that encodes multiple points on top of that "single" BDV entry. This is where building on BDV started to feel a little icky to me and it seemed like it might be a good use-case for actually formalizing a format/encoding, but again, no strong preference. We could certainly do something more quickly on top of BDV and formalize an encoding later if/as necessary.

Thanks again for the discussion so far Marc, Patrick and Rob!
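For what it's worth, here's a minimal sketch of the kind of encoding I have in mind for packing multiple N-dim points into one binary value. This is not an existing Lucene format: the class name and the layout (point count, dimension count, then fixed-width longs) are assumptions for illustration, and a real encoding would probably want something more compact.

```java
import java.nio.ByteBuffer;

// Hypothetical codec sketch; the layout below is an assumption, not a Lucene format.
class MultiPointCodecSketch {
    // Layout: [int numPoints][int numDims][numPoints * numDims longs, point by point]
    static byte[] encode(long[][] points) {
        int numPoints = points.length;
        int numDims = numPoints == 0 ? 0 : points[0].length;
        ByteBuffer buf =
            ByteBuffer.allocate(2 * Integer.BYTES + numPoints * numDims * Long.BYTES);
        buf.putInt(numPoints).putInt(numDims);
        for (long[] point : points) {
            if (point.length != numDims) {
                throw new IllegalArgumentException(
                    "all points must have the same number of dimensions");
            }
            for (long value : point) {
                buf.putLong(value);
            }
        }
        return buf.array();
    }

    // Reverses encode(): reads the header, then each point's dimensions in order.
    static long[][] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int numPoints = buf.getInt();
        int numDims = buf.getInt();
        long[][] points = new long[numPoints][numDims];
        for (long[] point : points) {
            for (int d = 0; d < numDims; d++) {
                point[d] = buf.getLong();
            }
        }
        return points;
    }
}
```

The resulting byte[] could then be wrapped in a BytesRef and indexed as a BinaryDocValuesField, with faceting or "slow" filtering code calling decode() per document.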
Cheers,
-Greg

On Tue, May 24, 2022 at 10:35 AM Patrick Zhai <zhai7...@gmail.com> wrote:
>
> As pointed out by Rob in the issue
>
>> I would also suggest to start with the simple
>> separate-numeric-docvalues-fields case and use similar logic as the
>> org.apache.lucene.facet.range package, just on 2-D, or maybe 3-D, N-D, etc
>
>
> I think that's a preferable solution to me, because:
> 1. It does not couple the dimensions together so that people can combine them
> freely
> 2. It might be able to be compressed better
>
> Best
>
> On Tue, May 24, 2022 at 9:08 AM Marc D'Mello <marcd2...@gmail.com> wrote:
>>
>> Hi,
>>
>> Thanks for the responses! For Patrick's question, right now in faceting we
>> don't have any good way to AND between two fields. I think the original
>> hyper rectangle issue has a good example of a use case:
>> https://issues.apache.org/jira/browse/LUCENE-10274.
>>
>> As for Robert's point, this feature would also allow us to use
>> MultiRangeQuery in IndexOrDocValuesQuery, but MultiRangeQuery is itself in
>> the sandbox module so I'm assuming that's a pretty exotic use case as well.
>> I personally have no issues using BinaryDocValues for this, I was just
>> wondering if it would be better to create a dedicated doc values, but it
>> seems that is not the case.
>>
>> Thanks,
>> Marc
>>
>> On Tue, May 24, 2022 at 1:27 AM Robert Muir <rcm...@gmail.com> wrote:
>>>
>>> This seems like a really exotic feature to add a dedicated docvalues field for.
>>>
>>> We should let BINARY be the catchall for stuff like this.
>>>
>>> On Mon, May 23, 2022 at 10:17 PM Marc D'Mello <marcd2...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > Some background: I've been working on this PR to add hyper rectangle
>>> > faceting capabilities to Lucene facets and I needed to create a new doc
>>> > values field to support this feature. Initially, I had a field that just
>>> > extended BinaryDocValues, but then a discussion came up about whether to
>>> > add a completely new DocValues field, maybe something like
>>> > PointDocValuesField (and SortedPointDocValuesField as the multivalued
>>> > version) to add first class support for this new field. Here is the link
>>> > to the discussion. I think there are a few benefits to this:
>>> >
>>> > Formalize how we would store points as doc values rather than just
>>> > packing points into a BinaryDocValues field in a format that could change
>>> > at any time
>>> >
>>> > NumericDocValues enables us to create a SortedNumericDocValuesRange query
>>> > which can be used with IndexOrDocValuesQuery to make some range queries
>>> > more efficient. Adding this new doc values field would let us do the same
>>> > thing with higher dimensional ranges
>>> >
>>> > I'm sure I could be missing some benefits, and I also am not super
>>> > experienced with Lucene so there could be drawbacks I am missing as well
>>> > :). From what I understand though, Lucene doesn't have a lot of
>>> > DocValues fields and there should be some thought put into adding new
>>> > ones, so I was wondering if I could get some feedback about the idea.
>>> > Thanks!
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org