Hi,

On Thu, Sep 5, 2013 at 9:28 AM, Kristofer Karlsson <k...@spotify.com> wrote:
> I have a use case where some of my documents have duplicate terms in
> various fields or within the same field.
>
> For an example, I may have a million documents with just the term "foo" in
> field A, and one particular document with the term "foo" in both field A
> and B, or have two terms "foo" in the same field.
>
> If I search for "foo foo" I would like to filter out all the documents with
> only one matching term - is this possible?

I don't think we have existing queries that can do this efficiently (if
someone reads this and knows otherwise, please correct me!). However, it
should be fairly easy to implement such a query: iterate over the postings
lists of the 'foo' term in all the fields you are interested in, sum up
the frequencies (the index must have been created with
IndexOptions.DOCS_AND_FREQS or higher), and keep only the documents whose
sum of frequencies is at least 2.

--
Adrien
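
To illustrate the summing step described above, here is a minimal sketch in plain Java. It is not a Lucene query: each field's postings list for the term is modelled as an in-memory map from document ID to term frequency, which is an assumption made purely for illustration. In a real index those frequencies would have to be read from the postings of 'foo' in each field. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
class DuplicateTermFilter {

    /**
     * Sums the term frequency of one term across several fields and returns
     * the IDs of the documents whose total reaches minTotalFreq.
     *
     * Each map stands in for one field's postings list of the term:
     * document ID -> frequency of the term in that field.
     */
    public static List<Integer> docsWithMinTotalFreq(
            List<Map<Integer, Integer>> postingsPerField, int minTotalFreq) {
        // TreeMap keeps document IDs sorted, as a postings list would be.
        Map<Integer, Integer> totalFreq = new TreeMap<>();
        for (Map<Integer, Integer> postings : postingsPerField) {
            for (Map.Entry<Integer, Integer> e : postings.entrySet()) {
                totalFreq.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        // Keep only the documents whose summed frequency is high enough.
        List<Integer> matches = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : totalFreq.entrySet()) {
            if (e.getValue() >= minTotalFreq) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Doc 0: "foo" once in field A only.
        // Doc 1: "foo" once in field A and once in field B.
        // Doc 2: "foo" twice in field A.
        Map<Integer, Integer> fieldA = Map.of(0, 1, 1, 1, 2, 2);
        Map<Integer, Integer> fieldB = Map.of(1, 1);
        System.out.println(docsWithMinTotalFreq(List.of(fieldA, fieldB), 2));
    }
}
```

With this sample data, documents 1 and 2 are kept and document 0 is dropped, which matches the "foo foo" case from the question. A real implementation would more likely advance the per-field postings in document order and sum as it goes, rather than collect everything in a map first.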