Re: How can i get collect stemmed query?
Thank you very much~! I'll try it :) -- View this message in context: http://lucene.472066.n3.nabble.com/How-can-i-get-collect-search-result-from-custom-filtered-query-tp1723055p1742898.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How can i get collect stemmed query?
Oh, so you are constructing the string 'fly +body:away' in your StemFilter? Just to make sure, does q=+body:(fly away) return your document? And does analysis.jsp (at query time) display 'fly +body:away' for the string 'flyaway'?

I don't know why you are doing this, but your stem filter should return only terms, without field names attached to them.

Maybe you will find this useful, so that you can do what you want without writing custom code: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
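To illustrate the advice that a stem filter should emit plain terms and leave field names to the query parser, here is a minimal sketch in plain Java with no Lucene classes. The class `CompoundSplitSketch`, its toy `DICTIONARY`, and the `toQuery` helper are hypothetical names that only model the idea; they are not part of the Solr or Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: models a filter that splits a compound word into
// plain terms. No Lucene classes are used; all names are illustrative.
final class CompoundSplitSketch {

    // Assumed toy dictionary of known word parts.
    private static final Set<String> DICTIONARY =
            new HashSet<>(Arrays.asList("fly", "away"));

    // Split a token into two dictionary words if possible; otherwise
    // return the token unchanged. The result holds terms only, never
    // query syntax such as "+body:".
    public static List<String> split(String token) {
        List<String> terms = new ArrayList<>();
        for (int i = 1; i < token.length(); i++) {
            String head = token.substring(0, i);
            String tail = token.substring(i);
            if (DICTIONARY.contains(head) && DICTIONARY.contains(tail)) {
                terms.add(head);
                terms.add(tail);
                return terms;
            }
        }
        terms.add(token);
        return terms;
    }

    // Stand-in for what a query parser does afterwards: attach the field
    // name and the required (+) operator to every term.
    public static String toQuery(String field, List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(field).append(':').append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> terms = split("flyaway");
        System.out.println(terms);
        System.out.println(toQuery("body", terms));
    }
}
```

The point of the split is that the analysis side only ever produces `fly` and `away`; the field prefix is added in one place, after analysis.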
Re: How can i get collect stemmed query?
Are you using KLTQueryAnalyzer outside of Solr (as a pre-processing step)? Or have you defined a fieldType in schema.xml that uses KLTQueryAnalyzer? Can you append debugQuery=on to your search URL and paste the output?

--- On Mon, 10/18/10, Jerad ag...@naver.com wrote:

From: Jerad ag...@naver.com
Subject: How can i get collect stemmed query?
To: solr-user@lucene.apache.org
Date: Monday, October 18, 2010, 9:15 AM

Hi~. I'm a beginner who wants to build a search system using Solr 1.4.1 and Lucene 2.9.2. My custom Analyzer and filter produce the correct Lucene query from the given query, but no results are displayed. Here is my Analyzer source.

--
    public class KLTQueryAnalyzer extends Analyzer {
        public static final Version LUCENE_VERSION = Version.LUCENE_29;
        public static int QUERY_MIN_LEN_WORD_FILTER = 1;
        public static int QUERY_MAX_LEN_WORD_FILTER = 40;
        public int elapsedTime = 0;

        @Override
        public TokenStream tokenStream(String paramString, Reader reader) {
            StandardTokenizer tokenizer = new StandardTokenizer(
                    du.utas.mcrdr.ir.lucene.WebDocIR.LUCENE_VERSION, reader);
            TokenStream tokenStream = new LengthFilter(
                    tokenizer, QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER);
            tokenStream = new LowerCaseFilter(tokenStream);

            // My custom stemmer method
            KLTSingleWordStemmer stemer = new KLTSingleWordStemmer(
                    QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER);

            // My custom analyzer filter. This filter returns the sub-merged query.
            // ex) INPUT        : flyaway
            //     RETURN VALUE : fly +body:away
            tokenStream = new KLTQueryStemFilter(tokenStream, stemer, this);
            return tokenStream;
        }
    }
--

Example query:
    User input query        : +body:flyaway
    Expected analyzed query : +body:fly +body:away
    Indexed data            : body = fly away

I'm expecting 1 doc to be returned from the index, but no result is returned.

To explain my custom flow:
1. User input query: +body:flyaway
2. The analyzer returns: fly +body:away
3. Solr attaches the search field tag to the query returned by the filter: +body, as I defined in schema.xml (default operator AND).
4. I have indexed 1 doc with a field named body that contains the phrase fly away.
5. I expect the query +body:fly +body:away to return 1 doc, but 0 docs are returned.

What's the problem? Can anybody help me, please?
Re: How can i get collect stemmed query?
Oops, I'm sorry! I found a mistake in the source I posted previously (the main class name was wrong :). This is the correct analyzer source.

---
    public class MyCustomQueryAnalyzer extends Analyzer {
        public static final Version LUCENE_VERSION = Version.LUCENE_29;
        public static int QUERY_MIN_LEN_WORD_FILTER = 1;
        public static int QUERY_MAX_LEN_WORD_FILTER = 40;
        public int elapsedTime = 0;

        @Override
        public TokenStream tokenStream(String paramString, Reader reader) {
            StandardTokenizer tokenizer = new StandardTokenizer(
                    du.utas.mcrdr.ir.lucene.WebDocIR.LUCENE_VERSION, reader);
            TokenStream tokenStream = new LengthFilter(
                    tokenizer, QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER);
            tokenStream = new LowerCaseFilter(tokenStream);

            // My custom stemmer method
            MyCustomSingleWordStemmer stemer = new MyCustomSingleWordStemmer(
                    QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER);

            // My custom analyzer filter. This filter returns the sub-merged query.
            // ex) INPUT        : flyaway
            //     RETURN VALUE : fly +body:away
            tokenStream = new KLTQueryStemFilter(tokenStream, stemer, this);
            return tokenStream;
        }
    }
---

[Additional info]

1. MyCustomQueryAnalyzer is built outside of Solr. I wrote this analyzer outside of the Solr package, packaged it as ~.jar, and placed it at ~/Solr/example/work/Jetty_0_0_0_0_8982_solr.war__solr__-2c5peu/webapp/WEB-INF/lib

2. I edited the field type and field name to be searched in schema.xml.

    <field name="body" type="textTp" indexed="true" stored="true" omitNorms="true"/>

    <fieldType name="textTp" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query" class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

This is my custom schema.xml and custom search field type.

3. I got this XML result when I appended debugQuery=on to my search URL.

    <?xml version="1.0" encoding="UTF-8" ?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
        <lst name="params">
          <str name="debugQuery">on</str>
          <str name="indent">on</str>
          <str name="start">0</str>
          <str name="q">+body:flyaway</str>
          <str name="version">2.2</str>
          <str name="rows">10</str>
        </lst>
      </lst>
      <result name="response" numFound="0" start="0"/>
      <lst name="debug">
        <str name="rawquerystring">+body:flyaway</str>
        <str name="querystring">+body:flyaway</str>
        <str name="parsedquery">+body:fly +body:away</str>
        <str name="parsedquery_toString">+body:fly +body:away</str>
        <lst name="explain"/>
        <str name="QParser">LuceneQParser</str>
        <lst name="timing">
          <double name="time">0.0</double>
          <lst name="prepare">
            <double name="time">0.0</double>
            <lst name="org.apache.solr.handler.component.QueryComponent">
              <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.FacetComponent">
              <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
              <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.HighlightComponent">
              <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.StatsComponent">
              <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.DebugComponent">
              <double name="time">0.0</double>
            </lst>
          </lst>
          <lst name="process">
            <double name="time">0.0</double>
            <lst name="org.apache.solr.handler.component.QueryComponent">
              <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.FacetComponent">
              <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
              <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.HighlightComponent">
              <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.StatsComponent">
              <double name="time">0.0</double>
            </lst>
            <lst name="org.apache.solr.handler.component.DebugComponent">
              <double name="time">0.0</double>
            </lst>
          </lst>
        </lst>
      </lst>
    </response>

I really appreciate your advice~ :)
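When a debug request like the one behind this output is built by hand, the leading `+` of the query has to be URL-encoded, otherwise it is decoded as a space on the server side. The sketch below is only an illustration: the class `DebugUrlSketch` is hypothetical, and the base URL `http://localhost:8983/solr/select` is an assumption based on the default example setup rather than something stated in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds a Solr search URL with debugQuery=on.
final class DebugUrlSketch {

    // Encode the raw query so that '+' and ':' survive the HTTP round trip.
    public static String debugUrl(String baseUrl, String query) {
        try {
            return baseUrl + "?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&debugQuery=on";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed base URL of the select handler in the example Jetty setup.
        System.out.println(debugUrl("http://localhost:8983/solr/select", "+body:flyaway"));
    }
}
```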
Re: How can i get collect stemmed query?
rawquerystring = +body:flyaway and parsedquery = +body:fly +body:away show that your custom filter is working as you expected. However, you are using different tokenizers at query time (StandardTokenizer, hard-coded) and at index time (WhitespaceTokenizer). That may cause numFound=0. For example, if your indexed document contains 'fly, away' in its body field, your query won't return it, because of the comma.

admin/analysis.jsp shows the indexed tokens. You can issue a *:* query to see whether that document really exists: q=*:*&fl=body

Your query analyzer definition should look like:

    <analyzer type="query" class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer"/>

You cannot have both an analyzer class and a tokenizer at the same time.

Once you get this working, in your case it is better to write a custom filter factory plug-in and define the query analyzer using it (for performance reasons). You can also load your plug-in more easily: http://wiki.apache.org/solr/SolrPlugins#How_to_Load_Plugins

    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LengthFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="com.testsolr.ir.KLTQueryStemFilter"/>
    </analyzer>
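The 'fly, away' case from the reply can be reproduced without Solr. The sketch below is a rough stand-in only: `whitespaceTokens` approximates a whitespace tokenizer and `punctuationStrippingTokens` approximates a tokenizer that drops punctuation. Neither is the real Lucene implementation, and the class name `TokenizerMismatchSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an index-time / query-time tokenizer mismatch.
final class TokenizerMismatchSketch {

    // Rough stand-in for a whitespace tokenizer: split on whitespace only,
    // so punctuation stays attached to the token.
    public static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Rough stand-in for a tokenizer that drops punctuation: split on any
    // run of characters that are neither letters nor digits.
    public static List<String> punctuationStrippingTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String body = "fly, away";
        List<String> indexed = whitespaceTokens(body);
        List<String> queried = punctuationStrippingTokens(body);
        System.out.println("index-time tokens: " + indexed);
        System.out.println("query-time tokens: " + queried);
        // With AND semantics every query term must exist in the index.
        System.out.println("all query terms indexed? " + indexed.containsAll(queried));
    }
}
```

Under these assumptions the index holds the token `fly,` (with the comma) while the query asks for `fly`, so a query that requires both terms cannot match.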
Re: How can i get collect stemmed query?
Thanks for your reply :)

1. I tested q=*:*&fl=body, and 1 doc was returned, as I expected.

2. I edited my schema.xml as you instructed:

    <analyzer type="query" class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer">
      <!-- No filter description. -->
    </analyzer>

but no result was returned.

3. I have a question. Typically, the tokenizer and filter flow is:

1) The input stream provides a text stream to the tokenizer or filter.
2) The tokenizer or filter gets a token, and returns the processed token together with its offset attribute info.
3) The offset attribute holds the position information of the token.

This is part of what I think a typical filter source looks like.

    public class CustomStemFilter extends TokenFilter {
        private MyCustomStemmer stemmer;
        private TermAttribute termAttr;
        private OffsetAttribute offsetAttr;
        private TypeAttribute typeAttr;
        private Hashtable<String, String> reserved = new Hashtable<String, String>();
        private int offSet = 0;

        public CustomStemFilter(TokenStream tokenStream, boolean isQuery, MyCustomStemmer stemmer) {
            super(tokenStream);
            this.stemmer = stemmer;
            termAttr = (TermAttribute) addAttribute(TermAttribute.class);
            offsetAttr = (OffsetAttribute) addAttribute(OffsetAttribute.class);
            typeAttr = (TypeAttribute) addAttribute(TypeAttribute.class);
            addAttribute(PositionIncrementAttribute.class);
            // Some of my custom logic here.
            // do something.
        }

        public boolean incrementToken() throws IOException {
            clearAttributes();
            if (!input.incrementToken()) return false;
            StringBuffer queryBuffer = new StringBuffer();
            // stemming logic here.
            // the generated query string is appended to queryBuffer.
            termAttr.setTermBuffer(queryBuffer.toString(), 0, queryBuffer.length());
            offsetAttr.setOffset(0, queryBuffer.length());
            offSet += queryBuffer.length();
            typeAttr.setType("word");
            return true;
        }
    }

※ MyCustomStemmer analyzes the input string flyaway into the query string fly +body:away and returns it.

At index time, the content to be searched is normally analyzed and indexed as below.

a) Content to be indexed: fly away
b) The token fly is returned by the filter or analyzer, with the length of fly = 3 (set through the offset attribute).
c) The next token away is returned, with the length of away = 4.

I think that is the general index flow. But I customized MyCustomFilter so that the filter generates a query string, not a token. In the process, the offset value has changed: it is the query's length, not a single token's length.

My question is: does the value set by offsetAttr.setOffset() influence search results in Solr?

(I tested this with the query input box on the main page at http://localhost:8983/solr/admin/ )
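On the offset question, a small sketch may help separate two ideas: the term text is what gets compared when matching, while offsets record where a token sat in the original input and, as far as I understand Lucene, are mainly consumed by features such as highlighting. The code below is plain Java with no Lucene classes; `OffsetSketch`, `Token`, and `tokenize` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: tokens carry term text plus character offsets.
final class OffsetSketch {

    // Term text with start/end offsets into the original input
    // (start inclusive, end exclusive).
    public static final class Token {
        public final String term;
        public final int start;
        public final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    // Whitespace tokenizer that records offsets for every token.
    public static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip leading whitespace.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one run of non-whitespace characters.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(text.substring(start, i), start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("fly away")) {
            System.out.println(t);
        }
    }
}
```

In this model each token has its own start and end, so `away` spans characters 4 to 8 of the input rather than starting again at 0. A filter that packs a whole query string into one term changes the term text itself, which is the part that matching depends on.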