Re: Computing multiple different aggregations over a match-set in one pass

Stefan Vodita Tue, 14 Feb 2023 03:19:53 -0800

Hi Greg,

I see now that my example left out some detail. I had in mind that `Genre /
Author nationality / Author name` is stored in a single hierarchical facet
field, while the values we aggregate over, such as publish date or price, are
stored in DocValues.


The demo package shows something similar [1], where the aggregation
is computed across a facet field using data from a `popularity` DocValues
field.

In the demo, we compute `sum(_score * sqrt(popularity))`, but what if we
also want other aggregations over the same facet field, for example
`max(popularity)`? In that case, iterating over the match set twice
duplicates most of the work, correct?
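
To make the question concrete, here is a minimal sketch in plain Java of what
I mean by "one pass". It does not use the Lucene facet API: the class name,
the `aggregate` method, and the array-based stand-ins for hits, ordinals,
scores and `popularity` values are all hypothetical and only illustrate the
idea of filling several accumulators while visiting each hit once.

```java
import java.util.Arrays;

public class OnePassAggregation {

    /** Per-ordinal accumulators, all filled during a single pass over the hits. */
    static final class Aggregates {
        final double[] weightedSum; // sum(score * sqrt(popularity)) per ordinal
        final long[] maxPopularity; // max(popularity) per ordinal, Long.MIN_VALUE if no hit

        Aggregates(int ordinalCount) {
            weightedSum = new double[ordinalCount];
            maxPopularity = new long[ordinalCount];
            Arrays.fill(maxPopularity, Long.MIN_VALUE);
        }
    }

    /**
     * hitOrdinals[i] holds the facet ordinals attached to hit i (hypothetical
     * stand-in for the ordinals stored in the facet field); scores[i] and
     * popularity[i] are that hit's score and popularity value.
     */
    static Aggregates aggregate(int[][] hitOrdinals, float[] scores,
                                long[] popularity, int ordinalCount) {
        Aggregates agg = new Aggregates(ordinalCount);
        for (int hit = 0; hit < hitOrdinals.length; hit++) {
            // Per-hit values are read once and reused by every aggregation.
            double weighted = scores[hit] * Math.sqrt(popularity[hit]);
            for (int ord : hitOrdinals[hit]) {
                agg.weightedSum[ord] += weighted;
                agg.maxPopularity[ord] = Math.max(agg.maxPopularity[ord], popularity[hit]);
            }
        }
        return agg;
    }

    public static void main(String[] args) {
        int[][] hitOrdinals = {{0, 1}, {0, 2}, {1}};
        float[] scores = {1.0f, 2.0f, 0.5f};
        long[] popularity = {4, 9, 16};
        Aggregates agg = aggregate(hitOrdinals, scores, popularity, 3);
        System.out.println(Arrays.toString(agg.weightedSum));
        System.out.println(Arrays.toString(agg.maxPopularity));
    }
}
```

With two separate passes, the outer loop over hits and the inner loop over
ordinals would each run twice; here only the accumulator updates differ.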


Stefan

[1] 
https://github.com/apache/lucene/blob/7f8b7ffbcad2265b047a5e2195f76cc924028063/lucene/demo/src/java/org/apache/lucene/demo/facet/ExpressionAggregationFacetsExample.java#L91

On Mon, 13 Feb 2023 at 22:46, Greg Miller <gsmil...@gmail.com> wrote:
>
> Hi Stefan-
>
> That helps, thanks. I'm a bit confused about where you're concerned with
> iterating over the match set multiple times. Is this a situation where the
> ordinals you want to facet over are stored in different index fields, so
> you have to create multiple Facets instances (one per field) to compute the
> aggregations? If that's the case, then yes—you have to iterate over the
> match set multiple times (once per field). I'm not sure that's such a big
> issue given that you're doing novel work during each iteration, so the only
> repetitive cost is actually iterating the hits. If the ordinals are
> "packed" into the same field though (which is the default in Lucene if
> you're using taxonomy faceting), then you should only need to do a single
> iteration over that field.
>
> Cheers,
> -Greg
>
> On Sat, Feb 11, 2023 at 2:27 AM Stefan Vodita <stefan.vod...@gmail.com>
> wrote:
>
> > Hi Greg,
> >
> > I’m assuming we have one match-set which was not constrained by any
> > of the categories we want to aggregate over, so it may have books by
> > Mark Twain, books by American authors, and sci-fi books.
> >
> > Maybe we can imagine we obtained it by searching for a keyword, say
> > “Washington”, which is present in Mark Twain’s writing, and those of other
> > American authors, and in sci-fi novels too.
> >
> > Does that make the example clearer?
> >
> >
> > Stefan
> >
> >
> > On Sat, 11 Feb 2023 at 00:16, Greg Miller <gsmil...@gmail.com> wrote:
> > >
> > > Hi Stefan-
> > >
> > > > Can you clarify your example a little bit? It sounds like you want to
> > > > facet over three different match sets (one constrained by "Mark Twain"
> > > > as the author, one constrained by "American authors" and one
> > > > constrained by the "sci-fi" genre). Is that correct?
> > >
> > > Cheers,
> > > -Greg
> > >
> > > On Fri, Feb 10, 2023 at 11:33 AM Stefan Vodita <stefan.vod...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > > Let’s say I have an index of books, similar to the example in the facet
> > > > > demo [1], with a hierarchical facet field encapsulating
> > > > > `Genre / Author’s nationality / Author’s name`.
> > > >
> > > > > I might like to find the latest publish date of a book written by Mark
> > > > > Twain, the sum of the prices of books written by American authors, and
> > > > > the number of sci-fi novels.
> > > >
> > > > > As far as I understand, this would require faceting 3 times over the
> > > > > match-set, one iteration for each aggregation of a different type
> > > > > (max(date), sum(price), count). That seems inefficient if we could
> > > > > instead compute all aggregations in one pass.
> > > >
> > > > Is there a way to do that?
> > > >
> > > >
> > > > Stefan
> > > >
> > > > [1]
> > > >
> > https://javadoc.io/doc/org.apache.lucene/lucene-demo/latest/org/apache/lucene/demo/facet/package-summary.html
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > > >
> > > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >

