Hi Greg, thanks for the explanation! The example makes perfect sense to me.
I was under the impression that this was combining two independent fields,
and I was wrong.

I don't have a strong preference about whether to add a new field for it,
but for the multi-valued case, don't we already have SortedSetDocValuesField,
which works as a multi-valued version of BDV?

Best
Patrick

On Tue, May 24, 2022 at 9:17 PM Greg Miller <gsmil...@gmail.com> wrote:

> Thanks for the comments Patrick, but I'm not sure I'm fully
> understanding the suggestion here. I don't see a path forward that
> uses different fields, but maybe I'm missing something. Imagine you're
> running an ecommerce site selling automotive parts and you need to
> index fitment information that consists of the year + make of vehicles
> a part fits. Imagine a set of wiper blades fit 2010 Ford vehicles and
> 2011 Chevy vehicles (but _not_ 2011 Ford or 2010 Chevy). And let's say
> we want to facet on products that fit a 2011 Ford. We need to make
> sure this product does _not_ count. We can achieve this with points in
> two dimensions (year + make), but not with two separate fields (at least
> not in any way I can come up with). A "two separate fields" approach would
> index year and make independently, and you'd lose the information
> that only certain combinations are valid. Am I overlooking
> something with your suggestion? Maybe there's something we can do with
> Lucene already that solves for this case and I'm just not aware of it?
> That's entirely possible and I'd love to learn more if there is!
>
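To make the fitment example concrete, here is a small sketch in plain Java. It uses no Lucene APIs; the `FitmentExample` class, the `Fitment` record, and both method names are hypothetical. It checks the wiper-blade product against a 2011 Ford query, once treating year and make as two independent fields and once treating each (year, make) pair as a single 2-D point:

```java
import java.util.List;

/**
 * Illustrative sketch only (not Lucene code): contrasts matching a product's
 * fitment as two independent fields versus as 2-D (year, make) points.
 */
public class FitmentExample {

    /** A hypothetical (year, make) fitment pair, i.e. one 2-D point. */
    record Fitment(int year, String make) {}

    /** Two-separate-fields approach: year and make are checked independently. */
    static boolean matchesSeparateFields(List<Fitment> fitments, int year, String make) {
        boolean yearMatches = fitments.stream().anyMatch(f -> f.year() == year);
        boolean makeMatches = fitments.stream().anyMatch(f -> f.make().equals(make));
        return yearMatches && makeMatches;
    }

    /** Point approach: a single (year, make) pair must match both dimensions at once. */
    static boolean matchesPoints(List<Fitment> fitments, int year, String make) {
        return fitments.stream().anyMatch(f -> f.year() == year && f.make().equals(make));
    }

    public static void main(String[] args) {
        // Wiper blades that fit 2010 Ford and 2011 Chevy vehicles only.
        List<Fitment> wiperBlades = List.of(new Fitment(2010, "Ford"), new Fitment(2011, "Chevy"));
        System.out.println(matchesSeparateFields(wiperBlades, 2011, "Ford")); // true (false positive)
        System.out.println(matchesPoints(wiperBlades, 2011, "Ford"));         // false
    }
}
```

The separate-fields check reports a false positive because it cannot tell which year goes with which make; the point check keeps each pair intact.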
> As for MultiRangeQuery and the mention of sandbox modules, I think
> that's a bit of a different use-case. MultiRangeQuery lets you filter
> by a disjunction of ranges. The "multi" part doesn't relate to
> "multiple values in a doc" (but it does support that, as do the
> "standard" range queries).
>
> Where I see a gap right now, beyond just faceting, is that we can
> represent N-dim points in the points index and filter on them (using
> the points index), but we have no doc values equivalent. This means
> 1) we can't facet, and 2) we can't create a "slow" query that does
> post-filtering instead of using the points index (which could be a
> very real advantage in cases with a sparse match set but a dense
> points index). So I like the idea of creating that concept and being
> able to facet and filter on it. Whether this is a "formal" doc values
> type or sits on top of BDV is something I feel less strongly about.
>
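The "slow" post-filtering idea could look roughly like the per-document check below. This is a hypothetical sketch, not a Lucene API: it assumes each document's points have already been read from doc values as `long[]` values, and it maps makes to made-up ordinals (Ford = 1, Chevy = 2). In Lucene, a check like this would plausibly sit in the `matches()` step of a `TwoPhaseIterator`, so it only runs on candidates produced by a cheaper leading clause.

```java
/**
 * Hypothetical sketch of a post-filtering check: given the points a document
 * holds, test whether any of them falls inside an N-dimensional box.
 */
public class BoxPostFilter {

    /** Returns true if the point lies within [min, max] (inclusive) in every dimension. */
    static boolean contains(long[] min, long[] max, long[] point) {
        for (int dim = 0; dim < point.length; dim++) {
            if (point[dim] < min[dim] || point[dim] > max[dim]) {
                return false;
            }
        }
        return true;
    }

    /** A multi-valued doc matches if at least one of its points is inside the box. */
    static boolean docMatches(long[][] docPoints, long[] min, long[] max) {
        for (long[] point : docPoints) {
            if (contains(min, max, point)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // (year, make ordinal) points for the wiper blades: 2010 Ford, 2011 Chevy.
        long[][] wiperBlades = { {2010, 1}, {2011, 2} };
        // Degenerate box selecting exactly 2011 Ford.
        long[] min = {2011, 1};
        long[] max = {2011, 1};
        System.out.println(docMatches(wiperBlades, min, max)); // false
    }
}
```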
> And finally... it really should be multi-valued. The points index
> supports multiple points-per-field within a single document. Seems
> like a big gap that we wouldn't support that with a doc value field.
> Because BDV is inherently single-valued, I propose we come up with an
> encoding scheme that encodes multiple points on top of that "single"
> BDV entry. This is where building on BDV started to feel a little icky
> to me and it seemed like it might be a good use-case for actually
> formalizing a format/encoding, but again, no strong preference. We
> could certainly do something more quickly on top of BDV and formalize
> an encoding later if/as necessary.
>
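One possible shape for "multiple points on top of a single BDV entry" is sketched below. The byte layout (dimension count, point count, then fixed-width big-endian longs) is an assumption made for illustration and is not an existing Lucene format; the `MultiPointCodec` name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical encoding of several N-dimensional long points into one byte[]
 * (the kind of value that could be stored in a single BinaryDocValues entry).
 * Assumed layout: int numDims | int numPoints | numPoints * numDims longs.
 */
public class MultiPointCodec {

    static byte[] encode(long[][] points) {
        int numDims = points.length == 0 ? 0 : points[0].length;
        ByteBuffer buffer = ByteBuffer.allocate(2 * Integer.BYTES + points.length * numDims * Long.BYTES);
        buffer.putInt(numDims);
        buffer.putInt(points.length);
        for (long[] point : points) {
            if (point.length != numDims) {
                throw new IllegalArgumentException("all points must have the same number of dimensions");
            }
            for (long value : point) {
                buffer.putLong(value); // ByteBuffer defaults to big-endian
            }
        }
        return buffer.array();
    }

    static long[][] decode(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        int numDims = buffer.getInt();
        int numPoints = buffer.getInt();
        long[][] points = new long[numPoints][numDims];
        for (int i = 0; i < numPoints; i++) {
            for (int d = 0; d < numDims; d++) {
                points[i][d] = buffer.getLong();
            }
        }
        return points;
    }

    public static void main(String[] args) {
        long[][] fitments = { {2010, 1}, {2011, 2} };
        byte[] encoded = encode(fitments);
        System.out.println(encoded.length);                  // 40
        System.out.println(Arrays.deepToString(decode(encoded))); // [[2010, 1], [2011, 2]]
    }
}
```

Fixed-width longs keep decoding trivial but waste space; a formalized encoding would likely want something more compact (for example delta or variable-length encoding), which ties into Patrick's point about compression.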
> Thanks again for the discussion so far, Marc, Patrick and Rob!
>
> Cheers,
> -Greg
>
> On Tue, May 24, 2022 at 10:35 AM Patrick Zhai <zhai7...@gmail.com> wrote:
> >
> > As pointed out by Rob in the issue
> >
> >> I would also suggest to start with the simple
> >> separate-numeric-docvalues-fields case and use similar logic as the
> >> org.apache.lucene.facet.range package, just on 2-D, or maybe 3-D, N-D, etc
> >
> >
> > I think that's the preferable solution, because:
> > 1. It does not couple the dimensions together, so people can combine
> >    them freely
> > 2. It might compress better
> >
> > Best
> >
> > On Tue, May 24, 2022 at 9:08 AM Marc D'Mello <marcd2...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Thanks for the responses! For Patrick's question, right now in faceting
> >> we don't have any good way to AND two fields together. I think the
> >> original hyper rectangle issue has a good example of a use case:
> >> https://issues.apache.org/jira/browse/LUCENE-10274.
> >>
> >> As for Robert's point, this feature would also allow us to use
> >> MultiRangeQuery in IndexOrDocValuesQuery, but MultiRangeQuery is itself
> >> in the sandbox module, so I'm assuming that's a pretty exotic use case as
> >> well. I personally have no issues using BinaryDocValues for this; I was
> >> just wondering if it would be better to create a dedicated doc values
> >> field, but it seems that is not the case.
> >>
> >> Thanks,
> >> Marc
> >>
> >> On Tue, May 24, 2022 at 1:27 AM Robert Muir <rcm...@gmail.com> wrote:
> >>>
> >>> This seems like a really exotic feature to add a dedicated docvalues
> >>> field for.
> >>>
> >>> We should let BINARY be the catchall for stuff like this.
> >>>
> >>> On Mon, May 23, 2022 at 10:17 PM Marc D'Mello <marcd2...@gmail.com> wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > Some background: I've been working on this PR to add hyper rectangle
> >>> > faceting capabilities to Lucene facets, and I needed to create a new
> >>> > doc values field to support this feature. Initially, I had a field that
> >>> > just extended BinaryDocValues, but then a discussion came up about
> >>> > whether to add a completely new DocValues field, maybe something like
> >>> > PointDocValuesField (and SortedPointDocValuesField as the multivalued
> >>> > version), to add first-class support for this new field. Here is the
> >>> > link to the discussion. I think there are a few benefits to this:
> >>> >
> >>> > 1. Formalize how we would store points as doc values, rather than just
> >>> >    packing points into a BinaryDocValues field in a format that could
> >>> >    change at any time.
> >>> > 2. NumericDocValues enables us to create a SortedNumericDocValuesRange
> >>> >    query, which can be used with IndexOrDocValuesQuery to make some
> >>> >    range queries more efficient. Adding this new doc values field would
> >>> >    let us do the same thing with higher-dimensional ranges.
> >>> >
> >>> > I'm sure I could be missing some benefits, and I'm also not super
> >>> > experienced with Lucene, so there could be drawbacks I'm missing as
> >>> > well :). From what I understand, though, Lucene doesn't have many
> >>> > DocValues fields, and some thought should go into adding new ones, so
> >>> > I was wondering if I could get some feedback about the idea. Thanks!
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >>> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>>
>