Hi Ian,

I think I found the problem (from the tests here: http://www.devdaily.com/java/jwarehouse/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/shingle/ShingleAnalyzerWrapperTest.java.shtml).
If you generate the query as a BooleanQuery then it seems to work. The following works:

    BooleanQuery query = getShingleBooleanQuery(analyzer, title, fieldToSearch);
    TopDocs hits = searcher.search(query, 10);

where

    private static BooleanQuery getShingleBooleanQuery(Analyzer analyzer, String qs, String fieldToSearch) throws Exception {
        BooleanQuery q = new BooleanQuery();
        TokenStream ts = analyzer.tokenStream(fieldToSearch, new StringReader(qs));
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        // Add every token the analyzer emits (unigrams and shingles) as an optional clause.
        while (ts.incrementToken()) {
            String termText = termAtt.toString();
            q.add(new TermQuery(new Term(fieldToSearch, termText)), BooleanClause.Occur.SHOULD);
        }
        // Finish and release the stream once all tokens are consumed.
        ts.end();
        ts.close();
        System.out.println("... parsed query: " + q);
        return q;
    }

Thank you (again) for your help

Peyman

On Oct 11, 2011, at 3:51 PM, Ian Lea wrote:

> Something does appear dodgy here. Using 3.4.0 the following very
> simple code, with no custom classes
>
> ShingleAnalyzerWrapper saw = new ShingleAnalyzerWrapper(LUCENE_34);
> QueryParser qp = new QueryParser(LUCENE_34, "t", saw);
> String s = "simple sentences rule";
> Query q = qp.parse(s);
> System.out.printf("%s parsed to %s\n", s, q);
>
> produces
>
> simple sentences rule parsed to t:simple t:sentences t:rule
>
> Like you, I would have expected there to be some shingles in there.
> Are we both missing something?
>
>
> --
> Ian.
>
>
> On Tue, Oct 11, 2011 at 3:25 PM, Peyman Faratin <pey...@robustlinks.com> wrote:
>> Hi
>>
>> I have the following shingle filter (Lucene 3.2)
>>
>> public TokenStream tokenStream(String fieldName, Reader reader) {
>>     StandardTokenizer first = new StandardTokenizer(Version.LUCENE_32, reader);
>>     StandardFilter second = new StandardFilter(Version.LUCENE_32, first);
>>     LowerCaseFilter third = new LowerCaseFilter(Version.LUCENE_32, second);
>>     StopFilter fourth = new StopFilter(Version.LUCENE_32, third, Stopwords);
>>     PositionFilter fifth = new PositionFilter(fourth);
>>     ShingleFilter filter = new ShingleFilter(fifth, shingleSize);
>>     return filter;
>> }
>>
>> that produces the following token stream given the sentence
>>
>> "please parse this sentence into a shingle of size 2. I'll pay $2 for it"
>>
>> 1: [_ parse:7->12:shingle]
>> 2: [parse:7->12:<ALPHANUM>] [parse sentence:7->26:shingle]
>> 3: [sentence:18->26:<ALPHANUM>] [sentence shingle:18->41:shingle]
>> 4: [shingle:34->41:<ALPHANUM>] [shingle size:34->49:shingle]
>> 5: [size:45->49:<ALPHANUM>] [size 2:45->51:shingle]
>> 6: [2:50->51:<NUM>] [2 pay:50->61:shingle]
>> 7: [pay:58->61:<ALPHANUM>] [pay 2:58->64:shingle]
>> 8: [2:63->64:<NUM>]
>>
>> The query analyzer produces the following analyzed query for the field
>> "titleShingled" for the above sentence:
>>
>> ...... analyzed query:titleShingled:parse titleShingled:sentence
>> titleShingled:shingle titleShingled:size titleShingled:2 titleShingled:pay
>> titleShingled:2
>>
>> As you can see, there are no bigram shingles in the query. I tried removing
>> the unigrams from the token stream (using filter.setOutputUnigrams(false) in
>> the above shingle filter), but even though the shingles seem to be fine, the
>> query is empty
>>
>> 1: [_ parse:7->12:shingle]
>> 2: [parse sentence:7->26:shingle]
>> 3: [sentence shingle:18->41:shingle]
>> 4: [shingle size:34->49:shingle]
>> 5: [size 2:45->51:shingle]
>> 6: [2 pay:50->61:shingle]
>> 7: [pay 2:58->64:shingle]
>>
>> ...... analyzed query:
>>
>> My goal is to index both unigrams and bigrams but first try to search on
>> bigrams. I think it is the QueryParser that is parsing the shingles in a
>> manner that I am not understanding properly.
>>
>> QueryParser parser = new QueryParser(Version.LUCENE_32, "titleShingled", new ShinglesAnalyzer(2, Stopwords));
>>
>> Any help would be very much appreciated
>>
>> Peyman
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
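
A possible explanation for the behaviour in the quoted messages, offered as an assumption rather than something confirmed in this thread: the classic QueryParser appears to split the query text on whitespace before analysis, so the analyzer is handed one word at a time and the ShingleFilter never sees two adjacent tokens. The plain-Java sketch below does not use Lucene at all. The class ShingleSketch and its shingles method are hypothetical names that only imitate bigram shingling, to show the difference between analyzing the whole string and analyzing each word separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Lucene code.
class ShingleSketch {

    // Emits each token (if outputUnigrams is true) followed by the bigram
    // that starts at that token, imitating a shingle filter of size 2.
    static List<String> shingles(List<String> tokens, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (outputUnigrams) {
                out.add(tokens.get(i));
            }
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + " " + tokens.get(i + 1));
            }
        }
        return out;
    }

    // Imitates a parser that splits on whitespace first and then
    // analyzes every word on its own (the assumed QueryParser behaviour).
    static List<String> perWord(List<String> tokens, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (String word : tokens) {
            out.addAll(shingles(Arrays.asList(word), outputUnigrams));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("simple sentences rule".split("\\s+"));

        // Whole string analyzed at once: unigrams and bigrams.
        System.out.println(shingles(words, true));
        // prints [simple, simple sentences, sentences, sentences rule, rule]

        // Each word analyzed separately: no bigram can ever form.
        System.out.println(perWord(words, true));
        // prints [simple, sentences, rule]

        // Same, with unigrams switched off: nothing is left.
        System.out.println(perWord(words, false));
        // prints []
    }
}
```

If that assumption holds, it would account for both results reported above: unigrams only when unigrams are enabled, and an empty query with setOutputUnigrams(false). Running the analyzer over the whole string, as getShingleBooleanQuery does, avoids the per-word split.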