Hi Greg, I see now where my example didn’t give enough info. In my mind, `Genre / Author nationality / Author name` is stored in one hierarchical facet field. The data we’re aggregating over, like publish date or price, are stored in DocValues.
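To make that layout concrete, here is a minimal sketch in plain Java (no Lucene classes) of folding several aggregations over the same facet ordinals in one loop over the hits. Everything in it is an assumption for illustration: the class and method names are hypothetical, the `ordinalsPerHit` array stands in for the ordinals read from the single facet field, and the `prices` and `publishYears` arrays stand in for per-document DocValues. It is not how the Lucene facet module is implemented.

```java
import java.util.Arrays;

public class SinglePassFacetAggregation {

    /** Per-ordinal accumulators for three aggregations (hypothetical holder). */
    static final class Result {
        final int[] count;
        final double[] sumPrice;
        final int[] maxYear;

        Result(int ordinalCount) {
            count = new int[ordinalCount];
            sumPrice = new double[ordinalCount];
            maxYear = new int[ordinalCount];
            // Ordinals that no hit touches keep this sentinel value.
            Arrays.fill(maxYear, Integer.MIN_VALUE);
        }
    }

    /**
     * Visits every hit once and updates all accumulators for each ordinal
     * attached to that hit (a leaf category and its ancestors, for example).
     */
    static Result aggregate(int[][] ordinalsPerHit, double[] prices,
                            int[] publishYears, int ordinalCount) {
        Result r = new Result(ordinalCount);
        for (int hit = 0; hit < ordinalsPerHit.length; hit++) {
            for (int ord : ordinalsPerHit[hit]) {
                r.count[ord]++;
                r.sumPrice[ord] += prices[hit];
                r.maxYear[ord] = Math.max(r.maxYear[ord], publishYears[hit]);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        // Three hits; each lists the facet ordinals it belongs to.
        int[][] ordinalsPerHit = { {0, 1}, {0, 2}, {1} };
        double[] prices = {10.0, 5.5, 20.0};
        int[] publishYears = {1884, 1965, 1876};

        Result r = aggregate(ordinalsPerHit, prices, publishYears, 3);
        System.out.println("count = " + Arrays.toString(r.count));
        System.out.println("sum   = " + Arrays.toString(r.sumPrice));
        System.out.println("max   = " + Arrays.toString(r.maxYear));
    }
}
```

The point of the sketch is only that the ordinal lookup per hit is shared, while each aggregation adds one cheap update inside the inner loop.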
The demo package shows something similar [1], where the aggregation is computed across a facet field using data from a `popularity` DocValue. In the demo, we compute `sum(_score * sqrt(popularity))`, but what if we want several other different aggregations with respect to the same facet field? Maybe we want `max(popularity)`. In that case, iterating twice duplicates most of the work, correct?

Stefan

[1] https://github.com/apache/lucene/blob/7f8b7ffbcad2265b047a5e2195f76cc924028063/lucene/demo/src/java/org/apache/lucene/demo/facet/ExpressionAggregationFacetsExample.java#L91

On Mon, 13 Feb 2023 at 22:46, Greg Miller <[email protected]> wrote:
>
> Hi Stefan-
>
> That helps, thanks. I'm a bit confused about where you're concerned with
> iterating over the match set multiple times. Is this a situation where the
> ordinals you want to facet over are stored in different index fields, so
> you have to create multiple Facets instances (one per field) to compute the
> aggregations? If that's the case, then yes, you have to iterate over the
> match set multiple times (once per field). I'm not sure that's such a big
> issue given that you're doing novel work during each iteration, so the only
> repetitive cost is actually iterating the hits. If the ordinals are
> "packed" into the same field though (which is the default in Lucene if
> you're using taxonomy faceting), then you should only need to do a single
> iteration over that field.
>
> Cheers,
> -Greg
>
> On Sat, Feb 11, 2023 at 2:27 AM Stefan Vodita <[email protected]> wrote:
>
> > Hi Greg,
> >
> > I’m assuming we have one match-set which was not constrained by any
> > of the categories we want to aggregate over, so it may have books by
> > Mark Twain, books by American authors, and sci-fi books.
> >
> > Maybe we can imagine we obtained it by searching for a keyword, say
> > “Washington”, which is present in Mark Twain’s writing, and those of other
> > American authors, and in sci-fi novels too.
> >
> > Does that make the example clearer?
> >
> > Stefan
> >
> > On Sat, 11 Feb 2023 at 00:16, Greg Miller <[email protected]> wrote:
> > >
> > > Hi Stefan-
> > >
> > > Can you clarify your example a little bit? It sounds like you want to
> > > facet over three different match sets (one constrained by "Mark Twain"
> > > as the author, one constrained by "American authors" and one
> > > constrained by the "sci-fi" genre). Is that correct?
> > >
> > > Cheers,
> > > -Greg
> > >
> > > On Fri, Feb 10, 2023 at 11:33 AM Stefan Vodita <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Let’s say I have an index of books, similar to the example in the
> > > > facet demo [1] with a hierarchical facet field encapsulating
> > > > `Genre / Author’s nationality / Author’s name`.
> > > >
> > > > I might like to find the latest publish date of a book written by
> > > > Mark Twain, the sum of the prices of books written by American
> > > > authors, and the number of sci-fi novels.
> > > >
> > > > As far as I understand, this would require faceting 3 times over the
> > > > match-set, one iteration for each aggregation of a different type
> > > > (max(date), sum(price), count). That seems inefficient if we could
> > > > instead compute all aggregations in one pass.
> > > >
> > > > Is there a way to do that?
> > > >
> > > > Stefan
> > > >
> > > > [1] https://javadoc.io/doc/org.apache.lucene/lucene-demo/latest/org/apache/lucene/demo/facet/package-summary.html
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [email protected]
> > > > For additional commands, e-mail: [email protected]
