Thanks, Allison. I will try it and let you know if it works.
Evert Wagenaar

On Tuesday, October 25, 2016, Allison, Timothy B. <talli...@mitre.org> wrote:
> A WildcardQuery is a subclass of MultiTermQuery. If you are using the
> QueryParser, you need to set the rewrite method on the parser.
>
> Try this, and beware of hitting the max BooleanQuery clause limit (and/or
> raise that limit):
>
> BooleanQuery.setMaxClauseCount(numberBigEnoughForYourNeeds);
>
> import java.util.HashSet;
> import java.util.Set;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.document.TextField;
> import org.apache.lucene.index.DirectoryReader;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.queryparser.classic.QueryParser;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.MultiTermQuery;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.Weight;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.RAMDirectory;
>
> public class RewriteTest {
>
>   /** Indexes 100 small documents, then prints the terms a wildcard query expands to. */
>   public static void main(String[] args) throws Exception {
>     Analyzer analyzer = new StandardAnalyzer();
>     String field = "contents";
>     Directory directory = new RAMDirectory();
>     IndexWriterConfig config = new IndexWriterConfig(analyzer);
>     IndexWriter indexWriter = new IndexWriter(directory, config);
>     for (int i = 0; i < 100; i++) {
>       Document d = new Document();
>       d.add(new TextField(field, "aard00" + i, Field.Store.YES));
>       indexWriter.addDocument(d);
>     }
>     indexWriter.flush();
>     indexWriter.close();
>
>     String queryString = "aard????";
>
>     IndexReader reader = DirectoryReader.open(directory);
>     IndexSearcher searcher = new IndexSearcher(reader);
>
>     QueryParser parser = new QueryParser(field, analyzer);
>     // Rewrite into one clause per matching term so the terms can be extracted
>     parser.setMultiTermRewriteMethod(MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE);
>     Query q = parser.parse(queryString);
>     q = q.rewrite(reader);
>     Set<Term> terms = new HashSet<>();
>     Weight weight = q.createWeight(searcher, false);
>     weight.extractTerms(terms);
>     for (Term t : terms) {
>       System.out.println(t);
>     }
>     reader.close();
>   }
> }
>
> From: Evert Wagenaar [mailto:evert.wagen...@gmail.com]
> Sent: Tuesday, October 25, 2016 1:42 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to get the terms matching a WildCardQuery in Lucene 6.2?
>
> Hi Allison,
>
> Unfortunately I can't compile the code (see below). Can you tell me what's
> wrong? I tried both MultiTermQuery.SCORING_BOOLEAN_REWRITE and
> CONSTANT_SCORE_BOOLEAN_REWRITE.
>
> What I actually don't understand is the relation between my Query (which
> is a WildcardQuery, not a MultiTermQuery) and MultiTermQuery.
>
> Can you explain?
>
> Thanks,
>
> Evert Wagenaar
>
> [Inline image 1]
>
> Full code of Searcher:
>
> package tk.evertwagenaar.lucene;
>
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.nio.file.Paths;
> import java.util.Date;
> import java.util.HashSet;
> import java.util.Set;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.index.DirectoryReader;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.queryparser.classic.QueryParser;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.MultiTermQuery;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.ScoreDoc;
> import org.apache.lucene.search.TopDocs;
> import org.apache.lucene.search.Weight;
> import org.apache.lucene.store.FSDirectory;
>
> /** Simple command-line based search demo. */
> public class SearchFiles {
>
>   private SearchFiles() {
>   }
>
>   /** Runs one wildcard query, prints the matching documents, then the matching terms. */
>   public static void main(String[] args) throws Exception {
>     String usage = "Usage:\tjava org.apache.lucene.demo.SearchFiles [-index dir] [-field f] [-repeat n]"
>         + " [-queries file] [-query string] [-raw] [-paging hitsPerPage]\n\n"
>         + "See http://lucene.apache.org/core/4_1_0/demo/ for details.";
>     if (args.length > 0 && ("-h".equals(args[0]) || "-help".equals(args[0]))) {
>       System.out.println(usage);
>       System.exit(0);
>     }
>
>     String index = "index";
>     String field = "contents";
>     String queries = null;
>     int repeat = 0;
>     boolean raw = false;
>     String queryString = "aard????";
>     int hitsPerPage = 10;
>
>     IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
>     IndexSearcher searcher = new IndexSearcher(reader);
>     Analyzer analyzer = new StandardAnalyzer();
>
>     BufferedReader in = null;
>
>     QueryParser parser = new QueryParser(field, analyzer);
>     // The rewrite method is set on the parser; a bare constant on its own line is not a statement
>     parser.setMultiTermRewriteMethod(MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE);
>
>     Query q = parser.parse(queryString);
>     System.out.println("Searching for: " + q.toString(field));
>
>     if (repeat > 0) { // repeat & time as benchmark
>       Date start = new Date();
>       for (int i = 0; i < repeat; i++) {
>         searcher.search(q, 100);
>       }
>       Date end = new Date();
>       System.out.println("Time: " + (end.getTime() - start.getTime()) + "ms");
>     }
>
>     doPagingSearch(in, searcher, q, hitsPerPage, raw, queries == null && queryString == null);
>
>     q = q.rewrite(reader);
>     Set<Term> terms = new HashSet<>();
>     Weight weight = q.createWeight(searcher, false);
>     weight.extractTerms(terms); // returns void and fills the set in place
>
>     System.out.println("Match: " + terms);
>     reader.close();
>   }
>
>   /**
>    * Search the Query against the Index and print the first page of hits.
>    */
>   public static void doPagingSearch(BufferedReader in, IndexSearcher searcher, Query query,
>       int hitsPerPage, boolean raw, boolean interactive) throws IOException {
>
>     // Collect enough docs to show 5 pages
>     TopDocs results = searcher.search(query, 5 * hitsPerPage);
>     ScoreDoc[] hits = results.scoreDocs;
>
>     int numTotalHits = results.totalHits;
>     System.out.println(numTotalHits + " total matching documents");
>
>     int start = 0;
>     int end = Math.min(hits.length, start + hitsPerPage);
>
>     for (int i = start; i < end; i++) {
>       Document doc = searcher.doc(hits[i].doc);
>       String path = doc.get("path");
>       System.out.println((i + 1) + ". " + path);
>     }
>   }
> }
>
> Evert Wagenaar
>
> On Tue, Oct 25, 2016 at 1:58 AM, Evert Wagenaar <evert.wagen...@gmail.com> wrote:
> Thanks Allison. I will try it.
>
> On Monday, October 24, 2016, Allison, Timothy B. <talli...@mitre.org> wrote:
> Make sure to call setRewriteMethod on the MultiTermQuery with
> MultiTermQuery.SCORING_BOOLEAN_REWRITE or CONSTANT_SCORE_BOOLEAN_REWRITE.
>
> Then something like this should work:
>
> q = q.rewrite(reader);
> Set<Term> terms = new HashSet<>();
> Weight weight = q.createWeight(searcher, false);
> weight.extractTerms(terms);
>
> -----Original Message-----
> From: Evert Wagenaar [mailto:evert.wagen...@gmail.com]
> Sent: Monday, October 24, 2016 12:41 PM
> To: java-user@lucene.apache.org
> Subject: How to get the terms matching a WildCardQuery in Lucene 6.2?
>
> I already asked this on StackOverflow, unfortunately without any answer
> for over a week now.
>
> Therefore, again, to the real experts:
>
> I downloaded a list of 350,000 English words in a .txt file and indexed it
> using the latest Lucene (6.2).
> I want to apply wildcard queries like aard???? and then retrieve a list
> of matches.
>
> I've done this before in an older version of Lucene. There it was pretty
> simple: I just had to call Query.rewrite() and it returned what I needed.
> Unfortunately, in 6.2 this doesn't work anymore. There is a
> Query.rewrite(IndexReader reader), from which I expected to get a HashMap
> of Terms. In my case there's only one matching Term (aardvark). The
> searcher returns one hit, containing the Document path to the word list.
> The HashMap, however, is empty.
>
> When I change the Query to find more than one match (like aa*), the
> HashMap remains empty.
>
> I tried the MatchExtractor too, unfortunately without result.
>
> The objective of this is to demonstrate the power of Lucene to easily find
> words of a particular length, given one or more characters. I'm pretty sure
> I can do this using regular expressions in Java, but then it's outside my
> objective.
>
> Can anyone tell me why this isn't working? I use the StandardAnalyzer.
> Should I use a different Analyzer?
>
> Any help is greatly appreciated.
>
> Thanks.
>
> --
> Sent from Gmail IPad
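On the question raised in this thread about how a WildcardQuery relates to MultiTermQuery: WildcardQuery extends MultiTermQuery (through AutomatonQuery), i.e. it stands for a set of index terms rather than a single term. Before it is executed it is rewritten against the terms dictionary. With the QueryParser's default rewrite method (CONSTANT_SCORE_REWRITE, as far as I can tell from the 6.x sources) the matching terms are never turned into separate clauses, which would explain the empty term set. With SCORING_BOOLEAN_REWRITE or CONSTANT_SCORE_BOOLEAN_REWRITE the result is a BooleanQuery with one clause per matching term, so the terms can be extracted, but a broad pattern can exceed BooleanQuery's clause limit (1024 by default). The plain-Java sketch below only models that expansion step; it is not Lucene code. Assumptions: a TreeSet stands in for the terms dictionary, a java.util.regex pattern stands in for Lucene's automaton, backslash escapes are ignored, and the class and method names (WildcardRewriteModel, rewrite, literalPrefix) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Conceptual model (not Lucene code) of expanding a wildcard into the terms it matches. */
public class WildcardRewriteModel {

    /** '?' matches exactly one character, '*' any run of characters; all else is literal. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?' || c == '*') {
                // Quote the pending literal run, then append the wildcard's regex form
                regex.append(Pattern.quote(literal.toString())).append(c == '?' ? "." : ".*");
                literal.setLength(0);
            } else {
                literal.append(c);
            }
        }
        regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Literal text before the first wildcard character, e.g. "aard" for "aard????". */
    static String literalPrefix(String wildcard) {
        int end = 0;
        while (end < wildcard.length() && "?*".indexOf(wildcard.charAt(end)) < 0) {
            end++;
        }
        return wildcard.substring(0, end);
    }

    /**
     * Collects every term the wildcard matches, in sorted order, and throws once more
     * than maxClauseCount terms match (mimicking BooleanQuery.TooManyClauses).
     */
    static List<String> rewrite(NavigableSet<String> terms, String wildcard, int maxClauseCount) {
        Pattern pattern = toRegex(wildcard);
        String prefix = literalPrefix(wildcard);
        List<String> matches = new ArrayList<>();
        // Terms sharing the prefix form one contiguous run starting at the prefix itself
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            if (pattern.matcher(term).matches()) {
                if (matches.size() == maxClauseCount) {
                    throw new IllegalStateException("more than " + maxClauseCount + " matching terms");
                }
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(
                Arrays.asList("aardvark", "aardvarks", "aardwolf", "aback", "abacus"));
        System.out.println(rewrite(terms, "aard????", 1024)); // prints [aardvark, aardwolf]
    }
}
```

Because the set is sorted, the scan can start at the literal prefix ("aard") and stop as soon as terms no longer share it. A pattern with a leading wildcard has no prefix and must test every term, which is why the classic QueryParser rejects leading wildcards unless setAllowLeadingWildcard(true) is called.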