Also, there should be examples from other fields. Suppose you are
indexing map data and want to support a UI that shows "hot spots" on
the map, where there is a lot of, let's say, activity of some sort.
You'd like to facet on 2-d areas.
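
To make the 2-d case concrete, here is a rough sketch in plain Java
(no Lucene classes). HotSpotFacets, Rect and count are made-up names
for illustration, and the brute-force loop only shows what the facet
counts mean, not how Lucene would compute them:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: counts 2-D points that fall inside each rectangle. */
final class HotSpotFacets {

  /** Axis-aligned rectangle with inclusive bounds (illustrative, not a Lucene type). */
  static final class Rect {
    final String label;
    final double minX, maxX, minY, maxY;

    Rect(String label, double minX, double maxX, double minY, double maxY) {
      this.label = label;
      this.minX = minX;
      this.maxX = maxX;
      this.minY = minY;
      this.maxY = maxY;
    }

    boolean contains(double x, double y) {
      return x >= minX && x <= maxX && y >= minY && y <= maxY;
    }
  }

  /** Returns one count per rectangle, in the same order as rects. */
  static int[] count(List<Rect> rects, double[][] points) {
    int[] counts = new int[rects.size()];
    for (double[] p : points) {
      for (int i = 0; i < rects.size(); i++) {
        // Both dimensions of the same point must fall inside the rectangle.
        if (rects.get(i).contains(p[0], p[1])) {
          counts[i]++;
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Rect> rects = Arrays.asList(
        new Rect("downtown", 0, 10, 0, 10),
        new Rect("harbor", 20, 30, 0, 10));
    double[][] points = {{1, 1}, {5, 9}, {25, 3}, {50, 50}};
    int[] counts = count(rects, points);
    for (int i = 0; i < counts.length; i++) {
      System.out.println(rects.get(i).label + ": " + counts[i]);
    }
  }
}
```

Each rectangle plays the role of one facet label; a point that lies
outside every rectangle contributes to no count.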

Or for log analytics -- you want to do anomaly detection and find
regions of time and some other dimension (API endpoint, host,
whatever) that have a lot of events of interest. That could probably
benefit from multi-dimensional faceting?
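
In all of these cases the dimensions have to stay paired per value,
which is the point of the fitment example quoted below. A small
sketch of that example in plain Java (FitmentExample, Fit and both
match methods are hypothetical names, not Lucene API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the year + make fitment example from this thread. */
final class FitmentExample {

  /** One (year, make) combination, i.e. a single 2-D value on a document. */
  static final class Fit {
    final int year;
    final String make;

    Fit(int year, String make) {
      this.year = year;
      this.make = make;
    }
  }

  /** Separate fields: years and makes are stored independently, so the pairing is lost. */
  static boolean matchesSeparateFields(
      Set<Integer> years, Set<String> makes, int year, String make) {
    return years.contains(year) && makes.contains(make);
  }

  /** Paired points: a document matches only if one stored combination matches both dimensions. */
  static boolean matchesPairedPoints(Fit[] fits, int year, String make) {
    for (Fit f : fits) {
      if (f.year == year && f.make.equals(make)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Wiper blades that fit 2010 Ford and 2011 Chevy vehicles only.
    Fit[] fits = {new Fit(2010, "Ford"), new Fit(2011, "Chevy")};
    Set<Integer> years = new HashSet<>(Arrays.asList(2010, 2011));
    Set<String> makes = new HashSet<>(Arrays.asList("Ford", "Chevy"));

    System.out.println("separate fields match 2011 Ford: "
        + matchesSeparateFields(years, makes, 2011, "Ford"));
    System.out.println("paired points match 2011 Ford: "
        + matchesPairedPoints(fits, 2011, "Ford"));
  }
}
```

With separate fields the product wrongly counts for a 2011 Ford,
because 2011 and Ford each appear somewhere on the document; with
paired points it does not.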

On Wed, May 25, 2022 at 2:07 AM Patrick Zhai <zhai7...@gmail.com> wrote:
>
> Hi Greg, thanks for the explanation! The example makes perfect sense to me, I 
> was under the impression that this was combining two independent fields and I 
> was wrong.
>
> I'm not biased towards having or not having a new field for it, but for 
> multi-value, don't we have a SortedSetDocValuesField that works as a 
> multi-value version of BDV?
>
> Best
> Patrick
>
> On Tue, May 24, 2022 at 9:17 PM Greg Miller <gsmil...@gmail.com> wrote:
>>
>> Thanks for the comments Patrick, but I'm not sure I'm fully
>> understanding the suggestion here. I don't see a path forward that
>> uses different fields, but maybe I'm missing something. Imagine you're
>> running an ecommerce site selling automotive parts and you need to
>> index fitment information that consists of the year + make of vehicles
>> a part fits. Imagine a set of wiper blades fit 2010 Ford vehicles and
>> 2011 Chevy vehicles (but _not_ 2011 Ford or 2010 Chevy). And let's say
>> we want to facet on products that fit a 2011 Ford. We need to make
>> sure this product does _not_ count. We can achieve this with points in
>> two dimensions (year + make), but not as two separate fields (at least
>> as far as I can come up with). A "two separate field approach" would
>> consist of indexing year and make separately, and you'd lose the
>> information that only certain combinations are valid. Am I overlooking
>> something with your suggestion? Maybe there's something we can do with
>> Lucene already that solves for this case and I'm just not aware of it?
>> That's entirely possible and I'd love to learn more if there is!
>>
>> As for MultiRangeQuery and the mention of sandbox modules, I think
>> that's a bit of a different use-case. MultiRangeQuery lets you filter
>> by a disjunction of ranges. The "multi" part doesn't relate to
>> "multiple values in a doc" (but it does support that, as do the
>> "standard" range queries).
>>
>> Where I see a gap right now, beyond just faceting, is that we can
>> represent N-dim points in the points index and filter on them (using
>> the points index), but we have no doc values equivalent. This means,
>> 1) we can't facet, and 2) we can't create a "slow" query that does
>> post-filtering instead of using the points index (which could be a
>> very real advantage in cases with a sparse match set but a dense
>> points index). So I like the idea of creating that concept and being
>> able to facet and filter on it. Whether or not this is a "formal" doc
>> values type or sits on top of BDV, I have less of a strong opinion.
>>
>> And finally... it really should be multi-valued. The points index
>> supports multiple points-per-field within a single document. Seems
>> like a big gap that we wouldn't support that with a doc value field.
>> Because BDV is inherently single-valued, I propose we come up with an
>> encoding scheme that encodes multiple points on top of that "single"
>> BDV entry. This is where building on BDV started to feel a little icky
>> to me and it seemed like it might be a good use-case for actually
>> formalizing a format/encoding, but again, no strong preference. We
>> could certainly do something more quickly on top of BDV and formalize
>> an encoding later if/as necessary.
>>
>> Thanks again for the discussion so far Marc, Patrick and Rob!
>>
>> Cheers,
>> -Greg
>>
>> On Tue, May 24, 2022 at 10:35 AM Patrick Zhai <zhai7...@gmail.com> wrote:
>> >
>> > As pointed out by Rob in the issue
>> >
>> >> I would also suggest to start with the simple 
>> >> separate-numeric-docvalues-fields case and use similar logic as the 
>> >> org.apache.lucene.facet.range package, just on 2-D, or maybe 3-D, N-D, etc
>> >
>> >
>> > I think that's a preferable solution to me, because:
>> > 1. It does not couple the dimensions together so that people can combine 
>> > them freely
>> > 2. It might be able to be compressed better
>> >
>> > Best
>> >
>> > On Tue, May 24, 2022 at 9:08 AM Marc D'Mello <marcd2...@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Thanks for the responses! For Patrick's question, right now in faceting 
>> >> we don't have any good way to AND between two fields. I think the 
>> >> original hyper rectangle issue has a good example of a use case: 
>> >> https://issues.apache.org/jira/browse/LUCENE-10274.
>> >>
>> >> As for Robert's point, this feature would also allow us to use 
>> >> MultiRangeQuery in IndexOrDocValuesQuery, but MultiRangeQuery is itself 
>> >> in the sandbox module so I'm assuming that's a pretty exotic use case as 
>> >> well. I personally have no issues using BinaryDocValues for this, I was 
>> >> just wondering if it would be better to create a dedicated doc values, 
>> >> but it seems that is not the case.
>> >>
>> >> Thanks,
>> >> Marc
>> >>
>> >> On Tue, May 24, 2022 at 1:27 AM Robert Muir <rcm...@gmail.com> wrote:
>> >>>
>> >>> This seems like a really exotic feature to add a dedicated docvalues field for.
>> >>>
>> >>> We should let BINARY be the catchall for stuff like this.
>> >>>
>> >>> On Mon, May 23, 2022 at 10:17 PM Marc D'Mello <marcd2...@gmail.com> 
>> >>> wrote:
>> >>> >
>> >>> > Hi,
>> >>> >
>> >>> > Some background: I've been working on this PR to add hyper rectangle 
>> >>> > faceting capabilities to Lucene facets and I needed to create a new 
>> >>> > doc values field to support this feature. Initially, I had a field 
>> >>> > that just extended BinaryDocValues, but then a discussion came up 
>> >>> > about whether to add a completely new DocValues field, maybe something 
>> >>> > like PointDocValuesField (and SortedPointDocValuesField as the 
>> >>> > multivalued version) to add first class support for this new field. 
>> >>> > Here is the link to the discussion. I think there are a few benefits 
>> >>> > to this:
>> >>> >
>> >>> > 1. Formalize how we would store points as doc values rather than just 
>> >>> > packing points into a BinaryDocValues field in a format that could 
>> >>> > change at any time
>> >>> > 2. NumericDocValues enables us to create a SortedNumericDocValuesRange 
>> >>> > query which can be used with IndexOrDocValuesQuery to make some range 
>> >>> > queries more efficient. Adding this new doc values field would let us 
>> >>> > do the same thing with higher dimensional ranges
>> >>> >
>> >>> > I'm sure I could be missing some benefits, and I also am not super 
>> >>> > experienced with Lucene so there could be drawbacks I am missing as 
>> >>> > well :). From what I understand though, Lucene doesn't have a lot of 
>> >>> > DocValues fields and there should be some thought put into adding new 
>> >>> > ones, so I was wondering if I could get some feedback about the idea. 
>> >>> > Thanks!
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>>
