Hi everyone, I work on the Lucene product search team at Amazon. We’ve been considering indexing per-ordinal scoring signals into the taxonomy index, which could reduce index size for some use cases.

Example

Let's consider a library of research papers, where each paper is represented by a Lucene document and the paper's author is a facet field in that document. For each author we store the total number of citations. We want to compute a measure of each author's impact, defined as the total number of citations divided by the number of articles published.

Implementation

Each author will be assigned an ordinal in the taxonomy. Lucene doesn't currently support storing data about an ordinal, but the taxonomy is itself a Lucene index, where each ordinal is represented by a document. Right now, the ordinal document has only a few fields allowing it to model the taxonomy structure, but we could conceivably add arbitrary fields to the ordinal documents. We would index the total number of citations an author has as a DocValue in the corresponding ordinal document.

Advantages

The alternative would be to denormalize data about the authors and have it on each doc that references that author. This leads to duplication. Since Lucene already has a document representation of the author (the ordinal doc), it makes sense conceptually that data about the author should be associated with the ordinal doc.

I'm curious if anyone else has tried something like this and if the approach seems reasonable. I’ve made an attempt to code it and I can open a PR if this sounds like a useful feature.

Stefan
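
P.S. To make the impact computation from the example concrete, here is a minimal sketch in plain Java. It deliberately uses no Lucene classes: the array `citationsPerOrdinal` only stands in for the value that would be stored once per ordinal document, and `docOrdinals` stands in for the author facet on each paper. The class name, method names, and toy data are all hypothetical and are not taken from the actual patch.

```java
import java.util.Arrays;

/** Plain-Java model of the idea; no Lucene API is used and all names are hypothetical. */
public class OrdinalSignalSketch {

    /**
     * Counts how many papers reference each author ordinal.
     * docOrdinals[i] is the author ordinal attached to paper i.
     */
    static int[] countPerOrdinal(int[] docOrdinals, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int ord : docOrdinals) {
            counts[ord]++;
        }
        return counts;
    }

    /**
     * Impact = total citations / number of papers, per ordinal.
     * citationsPerOrdinal is stored once per author (assumed to have the same
     * length as paperCounts), not repeated on every paper.
     */
    static double[] impactPerOrdinal(long[] citationsPerOrdinal, int[] paperCounts) {
        double[] impact = new double[paperCounts.length];
        for (int ord = 0; ord < paperCounts.length; ord++) {
            // Authors with no papers get an impact of 0 to avoid dividing by zero.
            impact[ord] = paperCounts[ord] == 0
                    ? 0.0
                    : (double) citationsPerOrdinal[ord] / paperCounts[ord];
        }
        return impact;
    }

    public static void main(String[] args) {
        // Seven papers by two authors (ordinals 1 and 2); ordinal 0 is unused in this toy data.
        int[] docOrdinals = {1, 1, 2, 1, 2, 1, 2};
        // One citation total per ordinal: 3 values instead of 7 denormalized copies.
        long[] citationsPerOrdinal = {0L, 120L, 30L};

        int[] counts = countPerOrdinal(docOrdinals, citationsPerOrdinal.length);
        double[] impact = impactPerOrdinal(citationsPerOrdinal, counts);

        System.out.println(Arrays.toString(counts)); // [0, 4, 3]
        System.out.println(Arrays.toString(impact)); // [0.0, 30.0, 10.0]
    }
}
```

The point of the sketch is only the storage shape: the citation totals take one slot per author, whereas the denormalized alternative would repeat each total on every paper by that author.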