Hi Greg, thanks for the explanation! The example makes perfect sense to me. I was under the impression that this was combining two independent fields, and I was wrong.
I have no strong preference either way on adding a new field for it, but for multi-value, don't we have a SortedSetDocValuesField that works as a multi-value version of BDV?

Best
Patrick

On Tue, May 24, 2022 at 9:17 PM Greg Miller <gsmil...@gmail.com> wrote:

> Thanks for the comments Patrick, but I'm not sure I'm fully
> understanding the suggestion here. I don't see a path forward that
> uses different fields, but maybe I'm missing something. Imagine you're
> running an ecommerce site selling automotive parts and you need to
> index fitment information that consists of the year + make of vehicles
> a part fits. Imagine a set of wiper blades fit 2010 Ford vehicles and
> 2011 Chevy vehicles (but _not_ 2011 Ford or 2010 Chevy). And let's say
> we want to facet on products that fit a 2011 Ford. We need to make
> sure this product does _not_ count. We can achieve this with points in
> two dimensions (year + make), but not as two separate fields (at least
> as far as I can come up with). A "two separate field approach" would
> consist of indexing year and make separately, and you'd lose the
> information that only certain combinations are valid. Am I overlooking
> something with your suggestion? Maybe there's something we can do with
> Lucene already that solves for this case and I'm just not aware of it?
> That's entirely possible and I'd love to learn more if there is!
>
> As for MultiRangeQuery and the mention of sandbox modules, I think
> that's a bit of a different use-case. MultiRangeQuery lets you filter
> by a disjunction of ranges. The "multi" part doesn't relate to
> "multiple values in a doc" (but it does support that, as do the
> "standard" range queries).
>
> Where I see a gap right now, beyond just faceting, is that we can
> represent N-dim points in the points index and filter on them (using
> the points index), but we have no doc values equivalent. This means,
> 1) we can't facet, and 2) we can't create a "slow" query that does
> post-filtering instead of using the points index (which could be a
> very real advantage in cases with a sparse match set but a dense
> points index). So I like the idea of creating that concept and being
> able to facet and filter on it. Whether or not this is a "formal" doc
> values type or sits on top of BDV, I have less of a strong opinion.
>
> And finally... it really should be multi-valued. The points index
> supports multiple points-per-field within a single document. Seems
> like a big gap that we wouldn't support that with a doc value field.
> Because BDV is inherently single-valued, I propose we come up with an
> encoding scheme that encodes multiple points on top of that "single"
> BDV entry. This is where building on BDV started to feel a little icky
> to me and it seemed like it might be a good use-case for actually
> formalizing a format/encoding, but again, no strong preference. We
> could certainly do something more quickly on top of BDV and formalize
> an encoding later if/as necessary.
>
> Thanks again for the discussion so far Marc, Patrick and Rob!
>
> Cheers,
> -Greg
>
> On Tue, May 24, 2022 at 10:35 AM Patrick Zhai <zhai7...@gmail.com> wrote:
> >
> > As pointed out by Rob in the issue
> >
> >> I would also suggest to start with the simple separate-numeric-docvalues-fields case and use similar logic as the org.apache.lucene.facet.range package, just on 2-D, or maybe 3-D, N-D, etc
> >
> >
> > I think that's a preferable solution to me, because:
> > 1. It does not couple the dimensions together so that people can combine them freely
> > 2. It might be able to be compressed better
> >
> > Best
> >
> > On Tue, May 24, 2022 at 9:08 AM Marc D'Mello <marcd2...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Thanks for the responses! For Patrick's question, right now in faceting we don't have any good way to AND between two fields. I think the original hyper rectangle issue has a good example of a use case: https://issues.apache.org/jira/browse/LUCENE-10274.
> >>
> >> As for Robert's point, this feature would also allow us to use MultiRangeQuery in IndexOrDocValuesQuery, but MultiRangeQuery is itself in the sandbox module so I'm assuming that's a pretty exotic use case as well. I personally have no issues using BinaryDocValues for this, I was just wondering if it would be better to create a dedicated doc values, but it seems that is not the case.
> >>
> >> Thanks,
> >> Marc
> >>
> >> On Tue, May 24, 2022 at 1:27 AM Robert Muir <rcm...@gmail.com> wrote:
> >>>
> >>> This seems a really exotic feature to add a dedicated docvalues field for.
> >>>
> >>> We should let BINARY be the catchall for stuff like this.
> >>>
> >>> On Mon, May 23, 2022 at 10:17 PM Marc D'Mello <marcd2...@gmail.com> wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > Some background: I've been working on this PR to add hyper rectangle faceting capabilities to Lucene facets and I needed to create a new doc values field to support this feature. Initially, I had a field that just extended BinaryDocValues, but then a discussion came up about whether to add a completely new DocValues field, maybe something like PointDocValuesField (and SortedPointDocValuesField as the multivalued version) to add first class support for this new field. Here is the link to the discussion. I think there are a few benefits to this:
> >>> >
> >>> > 1. Formalize how we would store points as doc values rather than just packing points into a BinaryDocValues field in a format that could change at any time
> >>> > 2. NumericDocValues enables us to create a SortedNumericDocValuesRange query which can be used with IndexOrDocValuesQuery to make some range queries more efficient. Adding this new doc values field would let us do the same thing with higher dimensional ranges
> >>> >
> >>> > I'm sure I could be missing some benefits, and I also am not super experienced with Lucene so there could be drawbacks I am missing as well :). From what I understand though, Lucene doesn't have a lot of DocValues fields and there should be some thought put into adding new ones, so I was wondering if I could get some feedback about the idea. Thanks!
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >>> For additional commands, e-mail: dev-h...@lucene.apache.org
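
For anyone following along, here is a minimal sketch of the wiper-blade fitment example from the thread, in plain Java with no Lucene APIs. It is only an illustration under stated assumptions: the class name FitmentSketch, the FORD/CHEVY ids, and the byte layout (an int count followed by one (year, make) int pair per point) are all made up here and are not a proposed or existing Lucene format. It shows why checking year and make as two independent fields counts the product for "2011 Ford", while matching on whole 2-D points decoded from a single binary value does not.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class FitmentSketch {
    // Hypothetical make ids, invented for this example only.
    static final int FORD = 1;
    static final int CHEVY = 2;

    /** Packs (year, make) points into one byte[]: an int count, then two ints per point. */
    static byte[] encode(int[][] points) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + points.length * 2 * Integer.BYTES);
        buf.putInt(points.length);
        for (int[] p : points) {
            buf.putInt(p[0]); // year
            buf.putInt(p[1]); // make id
        }
        return buf.array();
    }

    /** Reverses encode(): reads the count, then that many (year, make) pairs. */
    static List<int[]> decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getInt();
        List<int[]> points = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int year = buf.getInt();
            int make = buf.getInt();
            points.add(new int[] {year, make});
        }
        return points;
    }

    /** Two-separate-fields model: year and make are checked independently of each other. */
    static boolean matchesSeparateFields(int[][] points, int year, int make) {
        boolean yearSeen = false;
        boolean makeSeen = false;
        for (int[] p : points) {
            yearSeen |= p[0] == year;
            makeSeen |= p[1] == make;
        }
        return yearSeen && makeSeen;
    }

    /** Point model: a single decoded point must match in both dimensions. */
    static boolean matchesPoint(byte[] encoded, int year, int make) {
        for (int[] p : decode(encoded)) {
            if (p[0] == year && p[1] == make) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The wiper blades fit 2010 Ford and 2011 Chevy, but not 2011 Ford.
        int[][] wiperBlades = {{2010, FORD}, {2011, CHEVY}};
        byte[] encoded = encode(wiperBlades);

        System.out.println("separate fields, 2011 Ford: "
            + matchesSeparateFields(wiperBlades, 2011, FORD));
        System.out.println("2-D points, 2011 Ford: " + matchesPoint(encoded, 2011, FORD));
        System.out.println("2-D points, 2010 Ford: " + matchesPoint(encoded, 2010, FORD));
    }
}
```

The separate-fields check reports a match for 2011 Ford because 2011 comes from the Chevy point and Ford comes from the 2010 point; the point check rejects it. A count-prefixed layout is just one easy way to put several points into a single-valued binary entry; any real encoding would need its own design discussion.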