That makes sense. Thank you!

On Thu, Mar 28, 2024 at 12:58 PM Robert Muir <rcm...@gmail.com> wrote:
> Using spans and wildcards together is asking for trouble: you will hit
> limits, and it is not efficient by definition.
>
> I'd recommend changing your indexing so that your queries are fast
> and you aren't using wildcards that enumerate many terms at
> search time.
> Don't index words such as "bar_294e50e1-fc3c-450f-a04f-7b4ad79587d6"
> and then use wildcards to match just "bar".
> Instead, add a synonym "bar" (or similar, whatever you want) to
> "bar_294e50e1-fc3c-450f-a04f-7b4ad79587d6".
> This way you can match it with an ordinary TermQuery: "bar".
>
> E.g. for your simple example, this would look approximately like this:
> instead of: "abc foo bar_" + UUID.randomUUID()
> index something like: "abc foo bar bar_" + UUID.randomUUID()
>
> But if you add the synonym in an analyzer, then
> bar_294e50e1-fc3c-450f-a04f-7b4ad79587d6 and its synonym "bar" will
> sit at the same position, so your spans/sloppy phrases will work fine.
>
> On Thu, Mar 28, 2024 at 11:37 AM Yixun Xu <yix...@gmail.com> wrote:
>
> > Hello,
> >
> > We are trying to search for phrases where the last term is a prefix match.
> > For example, find all documents that contain "foo bar.*", with a
> > configurable slop between "foo" and "bar". We were able to do this using
> > `SpanNearQuery` where the last clause is a `SpanMultiTermQueryWrapper` that
> > wraps a `PrefixQuery`. However, this seems to run into the limit of 1024
> > clauses very quickly if the last term appears as a common prefix in the
> > index.
> >
> > I have a branch that reproduces the query at
> > https://github.com/apache/lucene/compare/main...yixunx:yx/span-query-limit?expand=1
> > and I have also pasted the code below.
> >
> > It seems that if slop = 0 then we can use `MultiPhraseQuery` instead, which
> > doesn't hit the clause limit. For the slop != 0 case, is it intended that
> > `SpanMultiTermQueryWrapper` can easily hit the clause limit, or am I using
> > the queries wrong? Is there a workaround other than increasing
> > `maxClauseCount`?
> >
> > Thank you for the help!
> >
> > ```java
> > import java.util.UUID;
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.document.Field;
> > import org.apache.lucene.document.TextField;
> > import org.apache.lucene.index.DirectoryReader;
> > import org.apache.lucene.index.IndexReader;
> > import org.apache.lucene.index.IndexWriter;
> > import org.apache.lucene.index.IndexWriterConfig;
> > import org.apache.lucene.index.Term;
> > import org.apache.lucene.queries.spans.SpanMultiTermQueryWrapper;
> > import org.apache.lucene.queries.spans.SpanNearQuery;
> > import org.apache.lucene.queries.spans.SpanTermQuery;
> > import org.apache.lucene.search.IndexSearcher;
> > import org.apache.lucene.search.PrefixQuery;
> > import org.apache.lucene.search.Query;
> > import org.apache.lucene.search.TopDocs;
> > import org.apache.lucene.store.Directory;
> > import org.apache.lucene.tests.util.LuceneTestCase;
> >
> > public class TestSpanNearQueryClauseLimit extends LuceneTestCase {
> >
> >     private static final String FIELD_NAME = "field";
> >     private static final int NUM_DOCUMENTS = 1025;
> >
> >     /**
> >      * Creates an index with NUM_DOCUMENTS documents. Each document has a
> >      * text field in the form of "abc foo bar_[UUID]".
> >      */
> >     private Directory createIndex() throws Exception {
> >         Directory dir = newDirectory();
> >         try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
> >             for (int i = 0; i < NUM_DOCUMENTS; i++) {
> >                 Document doc = new Document();
> >                 doc.add(new TextField(FIELD_NAME, "abc foo bar_" + UUID.randomUUID(), Field.Store.YES));
> >                 writer.addDocument(doc);
> >             }
> >             writer.commit();
> >         }
> >         return dir;
> >     }
> >
> >     public void testSpanNearQueryClauseLimit() throws Exception {
> >         Directory dir = createIndex();
> >
> >         // Find documents that match "abc <some term> bar.*", which should match all documents.
> >         try (IndexReader reader = DirectoryReader.open(dir)) {
> >             Query query = new SpanNearQuery.Builder(FIELD_NAME, true)
> >                     .setSlop(1)
> >                     .addClause(new SpanTermQuery(new Term(FIELD_NAME, "abc")))
> >                     .addClause(new SpanMultiTermQueryWrapper<>(new PrefixQuery(new Term(FIELD_NAME, "bar"))))
> >                     .build();
> >
> >             // This throws if NUM_DOCUMENTS > 1024:
> >             //   org.apache.lucene.search.IndexSearcher$TooManyNestedClauses:
> >             //   Query contains too many nested clauses; maxClauseCount is set to 1024
> >             TopDocs docs = new IndexSearcher(reader).search(query, 10);
> >             System.out.println(docs.totalHits);
> >         }
> >
> >         dir.close();
> >     }
> > }
> > ```
> >
> > Thank you,
> > Yixun Xu
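Robert's suggestion can be sketched as a small TokenFilter. This is a minimal, hypothetical sketch (`PrefixSynonymDemo` and `PrefixSynonymFilter` are illustrative names, not Lucene classes), assuming Lucene 9.x or later with only lucene-core on the classpath. It mirrors the original test's default analyzer (`StandardAnalyzer`, built on `StandardTokenizer`) and adds one filter: for any token containing an underscore, it emits the part before the first underscore as an extra token with a position increment of 0, so "bar" sits at the same position as "bar_<UUID>" and can be matched as an ordinary term. To stay within lucene-core it queries with a sloppy `PhraseQuery`; a `SpanNearQuery` whose last clause is a plain `SpanTermQuery` on "bar" should behave similarly, since no clause is expanded at search time.

```java
import java.io.IOException;
import java.util.UUID;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PrefixSynonymDemo {
  static final String FIELD = "field";

  /** Hypothetical filter: for a token such as "bar_294e50e1", also emits "bar" at the same position. */
  static final class PrefixSynonymFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
    private State savedState; // attributes of the token whose synonym is still pending
    private String pendingPrefix; // synonym to emit on the next call, or null

    PrefixSynonymFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (pendingPrefix != null) {
        // Re-emit the saved token, renamed to its prefix, with increment 0 (same position).
        restoreState(savedState);
        termAtt.setEmpty().append(pendingPrefix);
        posIncAtt.setPositionIncrement(0);
        pendingPrefix = null;
        savedState = null;
        return true;
      }
      if (!input.incrementToken()) {
        return false;
      }
      String term = termAtt.toString();
      int sep = term.indexOf('_');
      if (sep > 0) { // queue the part before the first '_' as a synonym
        savedState = captureState();
        pendingPrefix = term.substring(0, sep);
      }
      return true;
    }

    @Override
    public void reset() throws IOException {
      super.reset();
      savedState = null;
      pendingPrefix = null;
    }
  }

  static Analyzer prefixSynonymAnalyzer() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        TokenStream result = new PrefixSynonymFilter(new LowerCaseFilter(source));
        return new TokenStreamComponents(source, result);
      }
    };
  }

  /** Indexes numDocs docs "abc foo bar_<UUID>" and counts matches of the sloppy phrase "abc bar". */
  static int countMatches(int numDocs, int slop) throws IOException {
    try (Analyzer analyzer = prefixSynonymAnalyzer();
        Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        for (int i = 0; i < numDocs; i++) {
          Document doc = new Document();
          doc.add(new TextField(FIELD, "abc foo bar_" + UUID.randomUUID(), Field.Store.NO));
          writer.addDocument(doc);
        }
      } // closing the writer commits the documents
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        // "bar" is a plain term, not a PrefixQuery, so nothing is expanded at search time.
        return new IndexSearcher(reader).count(new PhraseQuery(slop, FIELD, "abc", "bar"));
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(countMatches(1025, 1) + " " + countMatches(1025, 0));
  }
}
```

With slop 1 every document should match, even with more than 1024 documents; with slop 0 none should, because "foo" sits between "abc" and "bar". One design difference to keep in mind: a sloppy `PhraseQuery` can also accept some out-of-order matches, whereas `SpanNearQuery.Builder(FIELD_NAME, true)` in the original test requires the clauses in order.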