WildcardQuery is a subclass of MultiTermQuery. If you are using the QueryParser,
you need to set the rewrite method on the parser.
Try the code below, and beware of hitting the max BooleanQuery clause limit. If
you do hit it, raise the limit with:
BooleanQuery.setMaxClauseCount(numberBigEnoughForYourNeeds);
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class RewriteTest {

    /** Indexes 100 one-term documents, then prints the terms a wildcard query expands to. */
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        String field = "contents";

        // Build a small in-memory index with the terms "aard00" + i for i = 0..99.
        Directory directory = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter indexWriter = new IndexWriter(directory, config);
        for (int i = 0; i < 100; i++) {
            Document d = new Document();
            d.add(new TextField(field, "aard00" + i, Field.Store.YES));
            indexWriter.addDocument(d);
        }
        indexWriter.flush();
        indexWriter.close();

        String queryString = "aard????";
        IndexReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);

        // Set the rewrite method on the parser, before parsing the query.
        QueryParser parser = new QueryParser(field, analyzer);
        parser.setMultiTermRewriteMethod(MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE);
        Query q = parser.parse(queryString);

        // Rewrite against the reader so the wildcard is expanded into the terms
        // it matches in this index, then collect those terms from the Weight.
        q = q.rewrite(reader);
        Set<Term> terms = new HashSet<>();
        Weight weight = q.createWeight(searcher, false);
        weight.extractTerms(terms);
        for (Term t : terms) {
            System.out.println(t);
        }
        reader.close();
    }
}
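
As a rough illustration of why the clause limit matters: a Boolean rewrite turns the wildcard into one clause per matching term in the index, so a broad pattern over a large term dictionary can exceed the limit (1024 by default). The sketch below mimics that expansion in plain Java, without Lucene. The class WildcardExpansionSketch and its methods matches and expand are hypothetical names made up for this illustration, and the logic is a simplification, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Conceptual sketch only: plain Java, NOT Lucene code. All names are hypothetical. */
class WildcardExpansionSketch {

    /** True if text matches pattern, where '?' is exactly one char and '*' is any run of chars. */
    static boolean matches(String pattern, String text) {
        int p = 0, t = 0, star = -1, mark = 0;
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;          // remember the star and first try matching zero chars
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star >= 0) {
                p = star + 1;        // backtrack: let the last star absorb one more char
                t = ++mark;
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;                     // trailing stars match the empty string
        }
        return p == pattern.length();
    }

    /** Expands a wildcard into one "clause" per matching term, failing once maxClauses is exceeded. */
    static List<String> expand(String pattern, SortedSet<String> termDictionary, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        for (String term : termDictionary) {
            if (matches(pattern, term)) {
                if (clauses.size() == maxClauses) {
                    throw new IllegalStateException("too many clauses: limit is " + maxClauses);
                }
                clauses.add(term);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Same terms as the RewriteTest index: "aard00" + i for i = 0..99.
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 100; i++) {
            terms.add("aard00" + i);
        }
        List<String> expanded = expand("aard????", terms, 1024);
        System.out.println(expanded.size() + " clauses: " + expanded);
        try {
            expand("aard*", terms, 10); // a deliberately small limit for the demo
        } catch (IllegalStateException e) {
            System.out.println("Limit hit: " + e.getMessage());
        }
    }
}
```

Only the eight-character terms match aard????, so the expansion stays small here; with a pattern like aa* over a 350,000-word list the number of clauses would be far larger, which is when raising the limit becomes necessary.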
From: Evert Wagenaar [mailto:[email protected]]
Sent: Tuesday, October 25, 2016 1:42 PM
To: [email protected]
Subject: Re: How to get the terms matching a WildCardQuery in Lucene 6.2?
Hi Allison,
Unfortunately I can't compile the code (see below). Can you tell me what's
wrong?
I tried both MultiTermQuery.SCORING_BOOLEAN_REWRITE and
CONSTANT_SCORE_BOOLEAN_REWRITE
What I don't actually understand is the relation between my Query (which is a
WildcardQuery) and a MultiTermQuery.
Can you explain?
Thanks,
Evert Wagenaar
Full code of Searcher:
package tk.evertwagenaar.lucene;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.FSDirectory;
/** Simple command-line based search demo. */
public class SearchFiles {
    private static IndexReader reader;
    private static Query q;

    private SearchFiles() {
    }

    /** Simple command-line based search demo. */
    public static void main(String[] args) throws Exception {
        String usage = "Usage:\tjava org.apache.lucene.demo.SearchFiles"
                + " [-index dir] [-field f] [-repeat n] [-queries file] [-query string] [-raw]"
                + " [-paging hitsPerPage]\n\nSee http://lucene.apache.org/core/4_1_0/demo/ for"
                + " details.";
        if (args.length > 0 && ("-h".equals(args[0]) || "-help".equals(args[0]))) {
            System.out.println(usage);
            System.exit(0);
        }
        String index = "index";
        String field = "contents";
        String queries = null;
        int repeat = 0;
        boolean raw = false;
        String queryString = "aard????";
        int hitsPerPage = 10;
        reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
        IndexSearcher searcher = new IndexSearcher(reader);
        Analyzer analyzer = new StandardAnalyzer();
        BufferedReader in = null;
        QueryParser parser = new QueryParser(field, analyzer);
        while (true) {
            if (queries == null && queryString == null) { // prompt the user
                System.out.println("Enter query: ");
            }
            Query q = parser.parse(queryString);
            System.out.println("Searching for: " + q.toString(field));
            if (repeat > 0) { // repeat & time as benchmark
                Date start = new Date();
                for (int i = 0; i < repeat; i++) {
                    searcher.search(q, 100);
                }
                Date end = new Date();
                System.out.println("Time: " + (end.getTime() - start.getTime()) + "ms");
                doPagingSearch(in, searcher, q, hitsPerPage, raw,
                        queries == null && queryString == null);
                MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE
                q = q.rewrite(reader);
                Set<Term> terms = new HashSet<>();
                Weight weight = q.createWeight(searcher, false);
                terms = weight.extractTerms(terms);
                System.out.println("Match: " + terms);
                reader.close();
            }
        }
    }

    /**
     * Search the Query against the Index
     */
    public static void doPagingSearch(BufferedReader in, IndexSearcher searcher, Query query,
            int hitsPerPage, boolean raw, boolean interactive) throws IOException {
        // Collect enough docs to show 5 pages
        TopDocs results = searcher.search(query, 5 * hitsPerPage);
        ScoreDoc[] hits = results.scoreDocs;
        int numTotalHits = results.totalHits;
        System.out.println(numTotalHits + " total matching documents");
        int start = 0;
        int end = Math.min(numTotalHits, hitsPerPage);
        hits = searcher.search(query, numTotalHits).scoreDocs;
        end = Math.min(hits.length, start + hitsPerPage);
        for (int i = start; i < end; i++) {
            Document doc = searcher.doc(hits[i].doc);
            String path = doc.get("path");
            System.out.println((i + 1) + ". " + path);
            query.rewrite(reader);
        }
    }
}
Evert Wagenaar
On Tue, Oct 25, 2016 at 1:58 AM, Evert Wagenaar wrote:
Thanks Allison. I will try it.
On Monday, October 24, 2016, Allison, Timothy B. wrote:
Make sure to setRewriteMethod on the MultiTermQuery to:
MultiTermQuery.SCORING_BOOLEAN_REWRITE or CONSTANT_SCORE_BOOLEAN_REWRITE
Then something like this should work:
q = q.rewrite(reader);
Set<Term> terms = new HashSet<>();
Weight weight = q.createWeight(searcher, false);
weight.extractTerms(terms);
-----Original Message-----
From: Evert Wagenaar [mailto:[email protected]]
Sent: Monday, October 24, 2016 12:41 PM
To: [email protected]
Subject: How to get the terms matching a WildCardQuery in Lucene 6.2?
I already asked this on StackOverflow. Unfortunately without any answer for
over a week now.
Therefore again to the real experts:
I downloaded a list of 350,000 English words in a .txt file and indexed it
using the latest Lucene (6.2). I want to apply wildcard queries like aard????
and then retrieve a list of matches.
I've done this before in an older version of Lucene, where it was pretty simple:
I just had to call Query.rewrite() and this returned what I needed.
Unfortunately, in 6.2 this doesn't work anymore. There is a
Query.rewrite(IndexReader reader), which should return a HashMap of Terms.
In my case there's only one matching Term (aardvark). The Searcher returns one
hit, containing the Document path to the wordlist. The HashMap, however, is empty.
When I change the Query to find more than one match (like aa*), the
HashMap remains empty.
I tried the MatchExtractor too. Unfortunately without result.
The Objective of this is to demonstrate the power of Lucene to easily find
words of a particular length, given one or more characters. I'm pretty sure I
can do this using regular expressions in Java but then it's outside my
objective.
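
For comparison, the plain-Java regular-expression route mentioned above could look roughly like the following. This is only a sketch under assumptions: the class WildcardRegexSketch, its methods toRegex and filter, and the tiny in-line word list are all hypothetical stand-ins, and a real run would read the words from the .txt file instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of the non-Lucene alternative: filter a word list with a wildcard turned into a regex. */
class WildcardRegexSketch {

    /** Translates '?' to "." and '*' to ".*"; every other character is quoted literally. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Returns the words that match the wildcard as a whole, in their original order. */
    static List<String> filter(String wildcard, List<String> words) {
        Pattern pattern = toRegex(wildcard);
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (pattern.matcher(word).matches()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample; the real input would be the 350,000-word list.
        List<String> words = Arrays.asList("aardvark", "aardwolf", "aardvarks", "abacus");
        System.out.println(filter("aard????", words));
    }
}
```

This scans every word on each query, whereas Lucene can walk its sorted term dictionary, which is part of what the wildcard demonstration is meant to show.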
Can anyone tell me why this isn't working? I use the StandardAnalyzer.
Should I use a different Application?
Any help is greatly appreciated.
Thanks.
--
Sent from Gmail IPad