Re: SpanMultiTermQueryWrapper with PrefixQuery hitting num clause limit

Yixun Xu Thu, 28 Mar 2024 12:52:57 -0700

That makes sense. Thank you!

On Thu, Mar 28, 2024 at 12:58 PM Robert Muir <rcm...@gmail.com> wrote:


> Using spans and wildcards together is asking for trouble: you will hit
> limits, and it is not efficient by definition.
>
> I'd recommend changing your indexing so that your queries are fast
> and you aren't using wildcards that enumerate many terms at
> search time.
> Don't index words such as "bar_294e50e1-fc3c-450f-a04f-7b4ad79587d6"
> and then use wildcards to match just "bar".
> Instead, add a synonym "bar" (or similar, whatever you want) to
> "bar_294e50e1-fc3c-450f-a04f-7b4ad79587d6".
> This way you can match it with an ordinary TermQuery: "bar".
>
> e.g. for your simple example, this would look approximately like this:
> instead of: "abc foo bar_" + UUID.randomUUID()
> index something like: "abc foo bar bar_" + UUID.randomUUID()
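
To make the rewrite above concrete, here is a minimal plain-Java sketch of the index-time expansion. It is only an illustration: `PrefixSynonymSketch` and `expand` are hypothetical names, not Lucene API, and it assumes the prefix is everything before the first underscore of a whitespace-separated token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): rewrites field text before indexing.
final class PrefixSynonymSketch {

    // For every whitespace-separated token shaped like "<prefix>_<suffix>",
    // emit the bare prefix immediately before the full token.
    static String expand(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            int sep = token.indexOf('_');
            if (sep > 0) {
                out.add(token.substring(0, sep)); // e.g. "bar"
            }
            out.add(token); // e.g. "bar_294e50e1-..."
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // prints "abc foo bar bar_294e50e1-fc3c-450f-a04f-7b4ad79587d6"
        System.out.println(expand("abc foo bar_294e50e1-fc3c-450f-a04f-7b4ad79587d6"));
    }
}
```

The expanded string would then be passed to the `TextField` in place of the original text. Note that in this form "bar" is an extra token with a position of its own, so slop values in span or phrase queries see one more position than before.
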
>
> If you add the synonym through an analyzer instead, then
> bar_294e50e1-fc3c-450f-a04f-7b4ad79587d6 and its synonym "bar" will
> sit at the same position, so your spans/sloppy-phrases will work fine.
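
The "same position" point can be illustrated with a small plain-Java model of position increments. This is a sketch of the idea only, not Lucene's analysis API: `SamePositionSynonymSketch` and `positions` are hypothetical names, each term is assumed to occur once, and the prefix is assumed to end at the first underscore. In Lucene itself the equivalent effect comes from a token filter emitting the synonym with a position increment of 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model (not Lucene API) of how positions are assigned to tokens.
final class SamePositionSynonymSketch {

    // Maps each term to its position. Ordinary tokens advance the position by 1;
    // an injected synonym uses increment 0 and so shares its source token's position.
    // Assumes each term occurs at most once in the text.
    static Map<String, Integer> positions(String text) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = -1;
        for (String term : text.trim().split("\\s+")) {
            position += 1; // ordinary token: increment 1
            result.put(term, position);
            int sep = term.indexOf('_');
            if (sep > 0) {
                // synonym: increment 0, same position as the full token
                result.put(term.substring(0, sep), position);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {abc=0, foo=1, bar_294e50e1=2, bar=2}
        System.out.println(positions("abc foo bar_294e50e1"));
    }
}
```

Because "bar" and the full token share position 2, the distance from "abc" or "foo" to "bar" is the same as the distance to the original token, which is why slop behaves as before.
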
>
> On Thu, Mar 28, 2024 at 11:37 AM Yixun Xu <yix...@gmail.com> wrote:
> >
> > Hello,
> >
> > We are trying to search for phrases where the last term is a prefix match.
> > For example, find all documents that contain "foo bar.*", with a
> > configurable slop between "foo" and "bar". We were able to do this using
> > `SpanNearQuery` where the last clause is a `SpanMultiTermQueryWrapper` that
> > wraps a `PrefixQuery`. However, this seems to run into the limit of 1024
> > clauses very quickly if the last term appears as a common prefix in the
> > index.
> >
> > I have a branch that reproduces the query at
> > https://github.com/apache/lucene/compare/main...yixunx:yx/span-query-limit?expand=1
> > and have also pasted the code below.
> >
> > It seems that if slop = 0 then we can use `MultiPhraseQuery` instead, which
> > doesn't hit the clause limit. For the slop != 0 case, is it intended that
> > `SpanMultiTermQueryWrapper` can easily hit the clause limit, or am I using
> > the queries wrong? Is there a workaround other than increasing
> > `maxClauseCount`?
> >
> > Thank you for the help!
> >
> > ```java
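> > // Imports assume the Lucene 9+ package layout (spans live in the queries module).
> > import java.util.UUID;
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.document.Field;
> > import org.apache.lucene.document.TextField;
> > import org.apache.lucene.index.DirectoryReader;
> > import org.apache.lucene.index.IndexReader;
> > import org.apache.lucene.index.IndexWriter;
> > import org.apache.lucene.index.IndexWriterConfig;
> > import org.apache.lucene.index.Term;
> > import org.apache.lucene.queries.spans.SpanMultiTermQueryWrapper;
> > import org.apache.lucene.queries.spans.SpanNearQuery;
> > import org.apache.lucene.queries.spans.SpanTermQuery;
> > import org.apache.lucene.search.IndexSearcher;
> > import org.apache.lucene.search.PrefixQuery;
> > import org.apache.lucene.search.Query;
> > import org.apache.lucene.search.TopDocs;
> > import org.apache.lucene.store.Directory;
> > import org.apache.lucene.tests.util.LuceneTestCase;
> >
> > public class TestSpanNearQueryClauseLimit extends LuceneTestCase {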
> >
> >     private static final String FIELD_NAME = "field";
> >     private static final int NUM_DOCUMENTS = 1025;
> >
> >     /**
> >      * Creates an index with NUM_DOCUMENTS documents. Each document has a
> >      * text field in the form of "abc foo bar_[UUID]".
> >      */
> >     private Directory createIndex() throws Exception {
> >         Directory dir = newDirectory();
> >         try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
> >             for (int i = 0; i < NUM_DOCUMENTS; i++) {
> >                 Document doc = new Document();
> >                 doc.add(new TextField(FIELD_NAME, "abc foo bar_" + UUID.randomUUID(), Field.Store.YES));
> >                 writer.addDocument(doc);
> >             }
> >             writer.commit();
> >         }
> >         return dir;
> >     }
> >
> >     public void testSpanNearQueryClauseLimit() throws Exception {
> >         Directory dir = createIndex();
> >
> >         // Find documents that match "abc <some term> bar.*", which should
> >         // match all documents.
> >         try (IndexReader reader = DirectoryReader.open(dir)) {
> >             Query query = new SpanNearQuery.Builder(FIELD_NAME, true)
> >                     .setSlop(1)
> >                     .addClause(new SpanTermQuery(new Term(FIELD_NAME, "abc")))
> >                     .addClause(new SpanMultiTermQueryWrapper<>(new PrefixQuery(new Term(FIELD_NAME, "bar"))))
> >                     .build();
> >
> >             // This throws an exception if NUM_DOCUMENTS is > 1024:
> >             //   org.apache.lucene.search.IndexSearcher$TooManyNestedClauses:
> >             //   Query contains too many nested clauses;
> >             //   maxClauseCount is set to 1024
> >             TopDocs docs = new IndexSearcher(reader).search(query, 10);
> >             System.out.println(docs.totalHits);
> >         }
> >
> >         dir.close();
> >     }
> > }
> > ```
> >
> > Thank you,
> > Yixun Xu
