Thanks for the comments, Patrick, but I'm not sure I fully understand
the suggestion here. I don't see a path forward that
uses different fields, but maybe I'm missing something. Imagine you're
running an ecommerce site selling automotive parts and you need to
index fitment information that consists of the year + make of vehicles
a part fits. Say a set of wiper blades fits 2010 Ford vehicles and
2011 Chevy vehicles (but _not_ 2011 Ford or 2010 Chevy), and we want
to facet on products that fit a 2011 Ford. We need to make sure this
product does _not_ count. We can achieve this with points in two
dimensions (year + make), but not with two separate fields (at least
not in any way I can come up with). A "two separate fields" approach
would index year and make independently, so you'd lose the
information that only certain combinations are valid. Am I overlooking
something with your suggestion? Maybe there's something we can do with
Lucene already that solves for this case and I'm just not aware of it?
That's entirely possible and I'd love to learn more if there is!
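
To make the wiper-blade example concrete, here is a minimal sketch in
plain Java (no Lucene APIs). The class name, method names and the make
ordinals are all hypothetical, chosen only for illustration. It
contrasts checking year and make independently with checking them as
one 2-D point:

```java
final class FitmentDemo {
    // Hypothetical make ordinals; a real index would map make names to numbers its own way.
    static final int FORD = 0;
    static final int CHEVY = 1;

    // "Two separate fields": year and make are checked independently of each other.
    static boolean matchesSeparateFields(int[][] fitments, int year, int make) {
        boolean yearSeen = false;
        boolean makeSeen = false;
        for (int[] f : fitments) {
            yearSeen |= f[0] == year;
            makeSeen |= f[1] == make;
        }
        return yearSeen && makeSeen;
    }

    // "2-D points": a single (year, make) pair must match in both dimensions at once.
    static boolean matchesPoints(int[][] fitments, int year, int make) {
        for (int[] f : fitments) {
            if (f[0] == year && f[1] == make) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[][] wipers = { {2010, FORD}, {2011, CHEVY} };
        System.out.println(matchesSeparateFields(wipers, 2011, FORD)); // true: a false positive
        System.out.println(matchesPoints(wipers, 2011, FORD));         // false: correctly excluded
    }
}
```

The separate-fields check reports a match for a 2011 Ford because 2011
and Ford each appear somewhere in the document, even though never
together.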

As for MultiRangeQuery and the mention of sandbox modules, I think
that's a bit of a different use-case. MultiRangeQuery lets you filter
by a disjunction of ranges. The "multi" part doesn't relate to
"multiple values in a doc" (but it does support that, as do the
"standard" range queries).

Where I see a gap right now, beyond just faceting, is that we can
represent N-dim points in the points index and filter on them (using
the points index), but we have no doc values equivalent. This means
1) we can't facet, and 2) we can't create a "slow" query that
post-filters instead of using the points index (which could be a very
real advantage when the match set is sparse but the points index is
dense). So I like the idea of creating that concept and being able to
facet and filter on it. Whether this is a "formal" doc values type or
sits on top of BinaryDocValues (BDV), I don't have a strong opinion.
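
As a rough illustration of the post-filtering idea, here is a sketch
in plain Java. It is not Lucene code: the docPoints array is a
hypothetical stand-in for a per-document lookup of N-dim points, and
the class and method names are made up. The point is only that the
work is proportional to the number of candidate docs, not to the size
of the points index:

```java
import java.util.ArrayList;
import java.util.List;

final class PostFilterDemo {
    // True if the point lies inside the inclusive box [lower, upper] in every dimension.
    static boolean inBox(long[] point, long[] lower, long[] upper) {
        for (int d = 0; d < point.length; d++) {
            if (point[d] < lower[d] || point[d] > upper[d]) {
                return false;
            }
        }
        return true;
    }

    // Visits only the candidate docs and keeps those with at least one point in the box.
    // docPoints[docId] is a hypothetical stand-in for a multi-valued point doc values lookup.
    static List<Integer> postFilter(int[] candidates, long[][][] docPoints,
                                    long[] lower, long[] upper) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : candidates) {
            for (long[] point : docPoints[doc]) {
                if (inBox(point, lower, upper)) {
                    hits.add(doc);
                    break; // one matching point is enough for this doc
                }
            }
        }
        return hits;
    }
}
```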

And finally... it really should be multi-valued. The points index
supports multiple points per field within a single document, so it
seems like a big gap if a doc values field didn't support the same.
Because BDV is inherently single-valued, I propose we come up with an
encoding scheme that encodes multiple points on top of that "single"
BDV entry. This is where building on BDV started to feel a little icky
to me and it seemed like it might be a good use-case for actually
formalizing a format/encoding, but again, no strong preference. We
could certainly do something more quickly on top of BDV and formalize
an encoding later if/as necessary.
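
For the sake of discussion, one possible encoding could look like the
sketch below. The layout (a point count, a dimension count, then the
raw long values) is purely an assumption for illustration, not what
the PR does and not an existing Lucene format. The class and method
names are hypothetical, and it uses only java.nio:

```java
import java.nio.ByteBuffer;

final class PackedPointsDemo {
    // Assumed layout: [int numPoints][int numDims][long value] * numPoints * numDims
    static byte[] encode(long[][] points) {
        int dims = points.length == 0 ? 0 : points[0].length;
        ByteBuffer buf =
            ByteBuffer.allocate(2 * Integer.BYTES + points.length * dims * Long.BYTES);
        buf.putInt(points.length).putInt(dims);
        for (long[] point : points) {
            if (point.length != dims) {
                throw new IllegalArgumentException("all points must have the same dimension count");
            }
            for (long value : point) {
                buf.putLong(value);
            }
        }
        return buf.array();
    }

    // Reverses encode(): reads the two header ints, then the values point by point.
    static long[][] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getInt();
        int dims = buf.getInt();
        long[][] points = new long[count][dims];
        for (int i = 0; i < count; i++) {
            for (int d = 0; d < dims; d++) {
                points[i][d] = buf.getLong();
            }
        }
        return points;
    }
}
```

The resulting byte[] is what would be stored as the single binary
value for the document; anything reading it has to agree on the same
layout, which is the argument for formalizing one.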

Thanks again for the discussion so far, Marc, Patrick, and Rob!

Cheers,
-Greg

On Tue, May 24, 2022 at 10:35 AM Patrick Zhai <zhai7...@gmail.com> wrote:
>
> As pointed out by Rob in the issue
>
>> I would also suggest to start with the simple 
>> separate-numeric-docvalues-fields case and use similar logic as the 
>> org.apache.lucene.facet.range package, just on 2-D, or maybe 3-D, N-D, etc
>
>
> I think that's a preferable solution to me, because:
> 1. It does not couple the dimensions together so that people can combine them 
> freely
> 2. It might be able to be compressed better
>
> Best
>
> On Tue, May 24, 2022 at 9:08 AM Marc D'Mello <marcd2...@gmail.com> wrote:
>>
>> Hi,
>>
>> Thanks for the responses! For Patrick's question, right now in faceting we 
>> don't have any good way to AND between two fields. I think the original 
>> hyper rectangle issue has a good example of a use case: 
>> https://issues.apache.org/jira/browse/LUCENE-10274.
>>
>> As for Robert's point, this feature would also allow us to use 
>> MultiRangeQuery in IndexOrDocValuesQuery, but MultiRangeQuery is itself in 
>> the sandbox module so I'm assuming that's a pretty exotic use case as well. 
>> I personally have no issues using BinaryDocValues for this, I was just 
>> wondering if it would be better to create a dedicated doc values, but it 
>> seems that is not the case.
>>
>> Thanks,
>> Marc
>>
>> On Tue, May 24, 2022 at 1:27 AM Robert Muir <rcm...@gmail.com> wrote:
>>>
>>> This seems like a really exotic feature to add a dedicated docvalues field for.
>>>
>>> We should let BINARY be the catchall for stuff like this.
>>>
>>> On Mon, May 23, 2022 at 10:17 PM Marc D'Mello <marcd2...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > Some background: I've been working on this PR to add hyper rectangle 
>>> > faceting capabilities to Lucene facets and I needed to create a new doc 
>>> > values field to support this feature. Initially, I had a field that just 
>>> > extended BinaryDocValues, but then a discussion came up about whether to 
>>> > add a completely new DocValues field, maybe something like 
>>> > PointDocValuesField (and SortedPointDocValuesField as the multivalued 
>>> > version) to add first class support for this new field. Here is the link 
>>> > to the discussion. I think there are a few benefits to this:
>>> >
>>> > - Formalize how we would store points as doc values rather than just 
>>> >   packing points into a BinaryDocValues field in a format that could 
>>> >   change at any time
>>> > - NumericDocValues enables us to create a SortedNumericDocValuesRange 
>>> >   query which can be used with IndexOrDocValuesQuery to make some range 
>>> >   queries more efficient. Adding this new doc values field would let us 
>>> >   do the same thing with higher dimensional ranges
>>> >
>>> > I'm sure I could be missing some benefits, and I also am not super 
>>> > experienced with Lucene so there could be drawbacks I am missing as well 
>>> > :). From what I understand though, Lucene doesn't have a lot of 
>>> > DocValues fields and there should be some thought put into adding new 
>>> > ones, so I was wondering if I could get some feedback about the idea. 
>>> > Thanks!
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
