Also, there should be examples from other domains. Suppose you are indexing map data and want to support a UI that shows "hot spots" on the map where there is a lot of, let's say, activity of some sort. You'd like to facet on 2-d areas.
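
To make the 2-d case a bit more concrete, here is a rough sketch in plain Java of what "count documents per rectangle" could look like when each document may carry several points. It uses no Lucene APIs at all; the class name, the Box helper, and the sample data are all made up for illustration, and a real implementation would read points from doc values rather than from in-memory lists.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch; none of these names are Lucene APIs.
final class HyperRectangleFacetSketch {

    /** Axis-aligned rectangle with inclusive bounds in every dimension (hypothetical helper). */
    static final class Box {
        final String label;
        final long[] min;
        final long[] max;

        Box(String label, long[] min, long[] max) {
            this.label = label;
            this.min = min;
            this.max = max;
        }

        boolean contains(long[] point) {
            for (int d = 0; d < min.length; d++) {
                if (point[d] < min[d] || point[d] > max[d]) {
                    return false;
                }
            }
            return true;
        }
    }

    /** For each box, counts the documents that have at least one point inside it. */
    static int[] count(List<List<long[]>> docs, List<Box> boxes) {
        int[] counts = new int[boxes.size()];
        for (List<long[]> docPoints : docs) {
            for (int b = 0; b < boxes.size(); b++) {
                for (long[] point : docPoints) {
                    if (boxes.get(b).contains(point)) {
                        counts[b]++;
                        break; // a document counts at most once per box
                    }
                }
            }
        }
        return counts;
    }

    /** Made-up (x, y) map cells; each inner list is one multi-valued document. */
    static List<List<long[]>> sampleDocs() {
        return Arrays.asList(
            Arrays.asList(new long[] {1, 1}, new long[] {12, 3}),
            Arrays.asList(new long[] {2, 2}),
            Arrays.asList(new long[] {11, 4}, new long[] {13, 1}),
            Arrays.asList(new long[] {20, 20}),
            Arrays.asList(new long[] {15, 5}));
    }

    /** Two made-up map areas to facet on. */
    static List<Box> sampleBoxes() {
        return Arrays.asList(
            new Box("west", new long[] {0, 0}, new long[] {9, 9}),
            new Box("east", new long[] {10, 0}, new long[] {19, 9}));
    }

    public static void main(String[] args) {
        List<Box> boxes = sampleBoxes();
        int[] counts = count(sampleDocs(), boxes);
        for (int b = 0; b < boxes.size(); b++) {
            System.out.println(boxes.get(b).label + ": " + counts[b]);
        }
    }
}
```

The check runs per point, so a document only counts for a rectangle when one of its points matches in every dimension at once. That is exactly the information you lose if x and y live in two unrelated fields.
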

Or for log analytics -- you want to do anomaly detection and find regions of time and some other dimension (API endpoint, host, whatever) that have a lot of events of interest. That could probably benefit from multi-dimensional faceting too?

On Wed, May 25, 2022 at 2:07 AM Patrick Zhai <zhai7...@gmail.com> wrote:
>
> Hi Greg, thanks for the explanation! The example makes perfect sense to me, I
> was under the impression that this was combining two independent fields and I
> was wrong.
>
> I'm not biased towards having or not a new field for it, but for multi-value,
> don't we have a SortedSetDocValuesField that works as a multi-value version
> of BDV?
>
> Best
> Patrick
>
> On Tue, May 24, 2022 at 9:17 PM Greg Miller <gsmil...@gmail.com> wrote:
>>
>> Thanks for the comments Patrick, but I'm not sure I'm fully
>> understanding the suggestion here. I don't see a path forward that
>> uses different fields, but maybe I'm missing something. Imagine you're
>> running an ecommerce site selling automotive parts and you need to
>> index fitment information that consists of the year + make of vehicles
>> a part fits. Imagine a set of wiper blades fit 2010 Ford vehicles and
>> 2011 Chevy vehicles (but _not_ 2011 Ford or 2010 Chevy). And let's say
>> we want to facet on products that fit a 2011 Ford. We need to make
>> sure this product does _not_ count. We can achieve this with points in
>> two dimensions (year + make), but not as two separate fields (at least
>> as far as I can come up with). A "two separate field approach" would
>> consist of indexing year and make separately, and you'd lose the
>> information that only certain combinations are valid. Am I overlooking
>> something with your suggestion? Maybe there's something we can do with
>> Lucene already that solves for this case and I'm just not aware of it?
>> That's entirely possible and I'd love to learn more if there is!
>>
>> As for MultiRangeQuery and the mention of sandbox modules, I think
>> that's a bit of a different use-case. MultiRangeQuery lets you filter
>> by a disjunction of ranges. The "multi" part doesn't relate to
>> "multiple values in a doc" (but it does support that, as do the
>> "standard" range queries).
>>
>> Where I see a gap right now, beyond just faceting, is that we can
>> represent N-dim points in the points index and filter on them (using
>> the points index), but we have no doc values equivalent. This means,
>> 1) we can't facet, and 2) we can't create a "slow" query that does
>> post-filtering instead of using the points index (which could be a
>> very real advantage in cases with a sparse match set but a dense
>> points index). So I like the idea of creating that concept and being
>> able to facet and filter on it. Whether or not this is a "formal" doc
>> values type or sits on top of BDV, I have less of a strong opinion.
>>
>> And finally... it really should be multi-valued. The points index
>> supports multiple points-per-field within a single document. Seems
>> like a big gap that we wouldn't support that with a doc value field.
>> Because BDV is inherently single-valued, I propose we come up with an
>> encoding scheme that encodes multiple points on top of that "single"
>> BDV entry. This is where building on BDV started to feel a little icky
>> to me and it seemed like it might be a good use-case for actually
>> formalizing a format/encoding, but again, no strong preference. We
>> could certainly do something more quickly on top of BDV and formalize
>> an encoding later if/as necessary.
>>
>> Thanks again for the discussion so far Marc, Patrick and Rob!
>>
>> Cheers,
>> -Greg
>>
>> On Tue, May 24, 2022 at 10:35 AM Patrick Zhai <zhai7...@gmail.com> wrote:
>> >
>> > As pointed out by Rob in the issue
>> >
>> >> I would also suggest to start with the simple
>> >> separate-numeric-docvalues-fields case and use similar logic as the
>> >> org.apache.lucene.facet.range package, just on 2-D, or maybe 3-D, N-D, etc
>> >
>> >
>> > I think that's a preferable solution to me, because:
>> > 1. It does not couple the dimensions together so that people can combine
>> > them freely
>> > 2. It might be able to be compressed better
>> >
>> > Best
>> >
>> > On Tue, May 24, 2022 at 9:08 AM Marc D'Mello <marcd2...@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Thanks for the responses! For Patrick's question, right now in faceting
>> >> we don't have any good way to AND between two fields. I think the
>> >> original hyper rectangle issue has a good example of a use case:
>> >> https://issues.apache.org/jira/browse/LUCENE-10274.
>> >>
>> >> As for Robert's point, this feature would also allow us to use
>> >> MultiRangeQuery in IndexOrDocValuesQuery, but MultiRangeQuery is itself
>> >> in the sandbox module so I'm assuming that's a pretty exotic use case as
>> >> well. I personally have no issues using BinaryDocValues for this, I was
>> >> just wondering if it would be better to create a dedicated doc values
>> >> field, but it seems that is not the case.
>> >>
>> >> Thanks,
>> >> Marc
>> >>
>> >> On Tue, May 24, 2022 at 1:27 AM Robert Muir <rcm...@gmail.com> wrote:
>> >>>
>> >>> This seems like a really exotic feature to add a dedicated docvalues
>> >>> field for.
>> >>>
>> >>> We should let BINARY be the catchall for stuff like this.
>> >>>
>> >>> On Mon, May 23, 2022 at 10:17 PM Marc D'Mello <marcd2...@gmail.com>
>> >>> wrote:
>> >>> >
>> >>> > Hi,
>> >>> >
>> >>> > Some background: I've been working on this PR to add hyper rectangle
>> >>> > faceting capabilities to Lucene facets and I needed to create a new
>> >>> > doc values field to support this feature. Initially, I had a field
>> >>> > that just extended BinaryDocValues, but then a discussion came up
>> >>> > about whether to add a completely new DocValues field, maybe something
>> >>> > like PointDocValuesField (and SortedPointDocValuesField as the
>> >>> > multivalued version) to add first class support for this new field.
>> >>> > Here is the link to the discussion. I think there are a few benefits
>> >>> > to this:
>> >>> >
>> >>> > - Formalize how we would store points as doc values rather than just
>> >>> >   packing points into a BinaryDocValues field in a format that could
>> >>> >   change at any time
>> >>> > - NumericDocValues enables us to create a SortedNumericDocValuesRange
>> >>> >   query which can be used with IndexOrDocValuesQuery to make some range
>> >>> >   queries more efficient. Adding this new doc values field would let us
>> >>> >   do the same thing with higher dimensional ranges
>> >>> >
>> >>> > I'm sure I could be missing some benefits, and I also am not super
>> >>> > experienced with Lucene so there could be drawbacks I am missing as
>> >>> > well :). From what I understand though, Lucene doesn't have a lot of
>> >>> > DocValues fields and there should be some thought put into adding new
>> >>> > ones, so I was wondering if I could get some feedback about the idea.
>> >>> > Thanks!
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >>> For additional commands, e-mail: dev-h...@lucene.apache.org