Re: How can i get collect stemmed query?

2010-10-20 Thread Jerad

Thank you very much~! I'll try it :)


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-can-i-get-collect-search-result-from-custom-filtered-query-tp1723055p1742898.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How can i get collect stemmed query?

2010-10-19 Thread Ahmet Arslan
Oh, you are constructing the string 'fly +body:away' in your StemFilter?
Just to make sure: does q=+body:(fly away) return your document?
And does analysis.jsp (at query time) display 'fly +body:away' for the input string 
'flyaway'?

I don't know why you are doing this, but your stem filter should emit only 
terms, without field names attached to them.
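A small plain-Java sketch of why the debug output can look right while nothing matches. It assumes the custom filter emits the whole generated string as ONE token, and that a required term clause is printed as `+field:text`; `requiredClause` and `render` are hypothetical helpers, not Lucene API. Under those assumptions, one term whose text is `fly +body:away` prints exactly like two separate terms `fly` and `away`:

```java
import java.util.Arrays;
import java.util.List;

class ClauseRendering {
    // Hypothetical helper: prints one required term clause as "+field:text".
    static String requiredClause(String field, String termText) {
        return "+" + field + ":" + termText;
    }

    // Prints every term as its own required clause, separated by single spaces.
    static String render(String field, List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(requiredClause(field, term));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What the custom filter produces: ONE term containing a space and a field name.
        String oneTerm = render("body", Arrays.asList("fly +body:away"));
        // What the filter should produce: two plain terms.
        String twoTerms = render("body", Arrays.asList("fly", "away"));
        System.out.println(oneTerm);                  // +body:fly +body:away
        System.out.println(twoTerms);                 // +body:fly +body:away
        System.out.println(oneTerm.equals(twoTerms)); // true
    }
}
```

The printed text is identical, but the first query looks for the literal term `fly +body:away`, which an index built from "fly away" does not contain.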

Maybe you will find this useful; it may let you do what you want without writing 
custom code.
 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
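The idea behind a synonym-style mapping is that the split is expressed as data and the output stays plain terms. A plain-Java sketch of that idea, with a hypothetical in-memory dictionary (`flyaway` to `fly`, `away`); it does not reproduce SynonymFilter's implementation (positions, multi-word rules, file format), only the shape of its output:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CompoundExpansion {
    // Hypothetical dictionary: compound word -> its parts.
    static final Map<String, List<String>> DICTIONARY = new HashMap<String, List<String>>();
    static {
        DICTIONARY.put("flyaway", Arrays.asList("fly", "away"));
    }

    // Replaces each known compound with its parts; unknown tokens pass through unchanged.
    // The output is a list of bare terms: no "+", no "body:", no spaces inside a term.
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            List<String> parts = DICTIONARY.get(token);
            if (parts != null) out.addAll(parts);
            else out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("flyaway")));         // [fly, away]
        System.out.println(expand(Arrays.asList("solr", "flyaway"))); // [solr, fly, away]
    }
}
```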



 

--- On Tue, 10/19/10, Jerad ag...@naver.com wrote:

 From: Jerad ag...@naver.com
 Subject: Re: How can i get collect stemmed query?
 To: solr-user@lucene.apache.org
 Date: Tuesday, October 19, 2010, 5:10 AM
 
 





Re: How can i get collect stemmed query?

2010-10-18 Thread Ahmet Arslan
Are you using KLTQueryAnalyzer outside of Solr (as a pre-processing step)?
Or have you defined a fieldType in schema.xml that uses KLTQueryAnalyzer?

Can you append debugQuery=on to your search URL and paste the output?
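When typing such a URL by hand, note that a literal `+` in a query string is decoded as a space, so `+body:flyaway` needs percent-encoding. A small sketch using `java.net.URLEncoder`; the host, port, and `/select` path are assumptions based on the Solr 1.4 example server at http://localhost:8983/solr/:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class DebugQueryUrl {
    // Builds "<base>/select?k1=v1&k2=v2", percent-encoding each value.
    // The base URL (example server on localhost:8983) is an assumption.
    static String selectUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base).append("/select");
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    static String encode(String value) {
        try {
            // URLEncoder turns '+' into %2B and ':' into %3A; '*' is left as is.
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", "+body:flyaway");
        params.put("debugQuery", "on");
        // http://localhost:8983/solr/select?q=%2Bbody%3Aflyaway&debugQuery=on
        System.out.println(selectUrl("http://localhost:8983/solr", params));
    }
}
```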

--- On Mon, 10/18/10, Jerad ag...@naver.com wrote:

 From: Jerad ag...@naver.com
 Subject: How can i get collect stemmed query?
 To: solr-user@lucene.apache.org
 Date: Monday, October 18, 2010, 9:15 AM
 
 Hi~. I'm a beginner who wants to build a search system using Solr 1.4.1 and
 Lucene 2.9.2.
 
 I got a correct Lucene query from my custom Analyzer and filter for the given
 query, but no results are displayed.
 
 Here is my Analyzer source.
 
 --
 public class KLTQueryAnalyzer extends Analyzer{
     public static final Version LUCENE_VERSION = Version.LUCENE_29;
     public static int QUERY_MIN_LEN_WORD_FILTER = 1;
     public static int QUERY_MAX_LEN_WORD_FILTER = 40;
 
     public int elapsedTime = 0;
 
     @Override
     public TokenStream tokenStream(String paramString, Reader reader) {
         StandardTokenizer tokenizer = new StandardTokenizer(
             du.utas.mcrdr.ir.lucene.WebDocIR.LUCENE_VERSION, reader );
 
         TokenStream tokenStream = new LengthFilter( tokenizer,
             QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER );
         tokenStream = new LowerCaseFilter( tokenStream );
 
         //My custom stemmer method
         KLTSingleWordStemmer stemer = new KLTSingleWordStemmer(
             QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER);
 
         //My custom analyzer filter. This filter returns a sub-merged query.
         //ex) INPUT : flyaway
         //    RETURN VALUE : fly +body:away
         tokenStream = new KLTQueryStemFilter( tokenStream, stemer, this );
 
         return tokenStream;
     }
 }
 --
 
 
 Example query)
     Input user query:        +body:flyaway
     Expected analyzed query: +body:fly +body:away
     Indexed data:            body: fly away
 
 
 I'm expecting 1 doc to be returned from the index, but no results are
 returned.
 
 Explanation of my custom flow:
 
 1. User input query: +body:flyaway
 2. The analyzer returns: fly +body:away
 3. Solr prepends the search field prefix (+body, as I defined in schema.xml;
 default operator AND) to the query returned by the filter.
 4. I indexed 1 doc that has a field named body containing the phrase
 "fly away".
 5. I expect the query +body:fly +body:away to return 1 doc, but 0 docs are
 returned.
 
 What's the problem?? Can anybody help me, please~
 
 
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-can-i-get-collect-stemmed-query-tp1723055p1723055.html
 Sent from the Solr - User mailing list archive at
 Nabble.com.
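Steps 4 and 5 of that flow can be checked with a toy inverted index. This plain-Java sketch (a hypothetical class, not Lucene) indexes the whitespace-split, lowercased tokens of "fly away" and evaluates a query in which every term is required (AND). Two separate terms `fly` and `away` find the document; a single term whose text is `fly +body:away` does not, because no such term was ever indexed:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ToyIndex {
    // Hypothetical inverted index: term -> ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

    // Index-time analysis as in the textTp field type: whitespace split, then lowercase.
    void add(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue;
            String term = token.toLowerCase(Locale.ROOT);
            Set<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new TreeSet<Integer>();
                postings.put(term, docs);
            }
            docs.add(docId);
        }
    }

    // Every term is required (AND): intersect the posting lists of all terms.
    Set<Integer> searchAll(List<String> requiredTerms) {
        Set<Integer> result = null;
        for (String term : requiredTerms) {
            Set<Integer> docs = postings.get(term);
            if (docs == null) return new TreeSet<Integer>(); // unknown term matches nothing
            if (result == null) result = new TreeSet<Integer>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(1, "fly away");
        System.out.println(index.searchAll(Arrays.asList("fly", "away")));    // [1]
        System.out.println(index.searchAll(Arrays.asList("fly +body:away"))); // []
    }
}
```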
 





Re: How can i get collect stemmed query?

2010-10-18 Thread Jerad

Oops, I'm sorry! I found some mistakes in the source I posted previously (the
main class name was wrong :)

This is the correct analyzer source.
---
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LengthFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class MyCustomQueryAnalyzer extends Analyzer {
    public static final Version LUCENE_VERSION = Version.LUCENE_29;
    public static int QUERY_MIN_LEN_WORD_FILTER = 1;
    public static int QUERY_MAX_LEN_WORD_FILTER = 40;

    public int elapsedTime = 0;

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Split the raw query text into tokens (uses the version constant above).
        StandardTokenizer tokenizer = new StandardTokenizer(LUCENE_VERSION, reader);

        // Keep tokens of 1..40 characters, then lowercase them.
        TokenStream tokenStream = new LengthFilter(tokenizer,
            QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER);
        tokenStream = new LowerCaseFilter(tokenStream);

        // My custom stemmer method
        MyCustomSingleWordStemmer stemer = new MyCustomSingleWordStemmer(
            QUERY_MIN_LEN_WORD_FILTER, QUERY_MAX_LEN_WORD_FILTER);

        // My custom analyzer filter. This filter returns a sub-merged query.
        // ex) INPUT : flyaway
        //     RETURN VALUE : fly +body:away
        tokenStream = new KLTQueryStemFilter(tokenStream, stemer, this);

        return tokenStream;
    }
}

---
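The analyzer above is a chain: tokenizer, then LengthFilter, then LowerCaseFilter, then the custom stem filter, each consuming the previous stage's tokens. A plain-Java sketch of the first three stages, assuming a simplified tokenizer that splits on any run of non-letter/digit characters (StandardTokenizer's real grammar is more elaborate) and the same 1..40 length bounds:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class AnalyzerChainSketch {
    static final int MIN_LEN = 1;   // QUERY_MIN_LEN_WORD_FILTER
    static final int MAX_LEN = 40;  // QUERY_MAX_LEN_WORD_FILTER

    // Simplified tokenizer (assumption): split on runs of non-letter/digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Like LengthFilter: keep tokens whose length lies within [min, max].
    static List<String> lengthFilter(List<String> in, int min, int max) {
        List<String> out = new ArrayList<String>();
        for (String t : in) {
            if (t.length() >= min && t.length() <= max) out.add(t);
        }
        return out;
    }

    // Like LowerCaseFilter: lowercase every token.
    static List<String> lowerCase(List<String> in) {
        List<String> out = new ArrayList<String>();
        for (String t : in) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    static List<String> analyze(String text) {
        return lowerCase(lengthFilter(tokenize(text), MIN_LEN, MAX_LEN));
    }

    public static void main(String[] args) {
        System.out.println(analyze("FlyAway"));    // [flyaway]
        System.out.println(analyze("Fly, Away!")); // [fly, away]
    }
}
```

The custom stem filter would then receive `flyaway` as a single token, which is the point where it should emit plain terms.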

[Additional info]

1. MyCustomQueryAnalyzer is built outside of Solr.
I built this analyzer outside of the Solr package, packaged it as a ~.jar,
and placed it in:

    ~/Solr/example/work/Jetty_0_0_0_0_8982_solr.war__solr__-2c5peu/webapp/WEB-INF/lib
 

2. I edited the field type and field name in schema.xml for the field to be
searched:

<field name="body" type="textTp" indexed="true" stored="true" omitNorms="true"/>

<fieldType name="textTp" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query" class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

This is my custom schema.xml and custom search field type.

3. I got this XML result when I appended debugQuery=on to my search URL:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="debugQuery">on</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">+body:flyaway</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0" />
  <lst name="debug">
    <str name="rawquerystring">+body:flyaway</str>
    <str name="querystring">+body:flyaway</str>
    <str name="parsedquery">+body:fly +body:away</str>
    <str name="parsedquery_toString">+body:fly +body:away</str>
    <lst name="explain" />
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">0.0</double>
      <lst name="prepare">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
    </lst>
  </lst>
</response>


I really appreciate your advice~ :)

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-can-i-get-collect-search-result-from-custom-filtered-query-tp1723055p1723815.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How can i get collect stemmed query?

2010-10-18 Thread Ahmet Arslan
rawquerystring = +body:flyaway
parsedquery = +body:fly +body:away

shows that your custom filter is working as you expected.

However, you are using different tokenizers at query time (StandardTokenizer, 
hard-coded) and at index time (WhitespaceTokenizer). That may cause numFound=0.  

For example, if your indexed document contains 'fly, away' in its body field, 
your query won't return it, because of the comma. 
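The comma case can be reproduced with two simplified tokenizers in plain Java: a whitespace split (roughly what WhitespaceTokenizer does at index time) and a split on non-letter/digit characters, used here only as a rough stand-in (an assumption for illustration) for StandardTokenizer at query time. The index ends up with the term `fly,` while the query asks for `fly`:

```java
import java.util.ArrayList;
import java.util.List;

class TokenizerMismatch {
    // Roughly WhitespaceTokenizer: punctuation stays attached to the word.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String t : text.trim().split("\\s+")) if (!t.isEmpty()) out.add(t);
        return out;
    }

    // Rough stand-in for StandardTokenizer (assumption): punctuation is dropped.
    static List<String> letterDigitTokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) if (!t.isEmpty()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        String body = "fly, away";
        List<String> indexed = whitespaceTokens(body);  // [fly,, away]
        List<String> queried = letterDigitTokens(body); // [fly, away]
        System.out.println(indexed);
        System.out.println(queried);
        // "fly" was never indexed, so a query requiring it cannot match this document.
        System.out.println(indexed.containsAll(queried)); // false
    }
}
```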

admin/analysis.jsp shows the indexed tokens. 

You can issue a *:* query to see whether that document really exists:
q=*:*&fl=body

Your query analyzer definition should look like this:
<analyzer type="query" class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer"/>

You cannot have both an analyzer class and a tokenizer at the same time.

Once you get this working, in your case it is better (for performance reasons) 
to write a custom filter factory plug-in and define the query analyzer using it.
You can also load your plug-in more easily: 
http://wiki.apache.org/solr/SolrPlugins#How_to_Load_Plugins

<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LengthFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="com.testsolr.ir.KLTQueryStemFilter"/>
</analyzer>


--- On Mon, 10/18/10, Jerad ag...@naver.com wrote:

 From: Jerad ag...@naver.com
 Subject: Re: How can i get collect stemmed query?
 To: solr-user@lucene.apache.org
 Date: Monday, October 18, 2010, 12:14 PM
 

Re: How can i get collect stemmed query?

2010-10-18 Thread Jerad

Thanks for your reply :)

1. I tested q=*:*&fl=body, and 1 doc was returned, as I expected.

2. I edited my schema.xml as you instructed:

<analyzer type="query" class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer">
  <!-- No filter description. -->
</analyzer>

but no results were returned.

3. I wonder about the following...

Typically, the tokenizer and filter flow is:

1) The input stream provides a text stream to the tokenizer or filter.
2) The tokenizer or filter gets a token, processes it, and returns the processed
token along with its offset attribute info.
3) The offset attribute holds the token's offset information.
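The flow described above is a pull model: each filter wraps the stage before it and, on every incrementToken() call, pulls one token from its input and rewrites it. A plain-Java sketch of that pattern; `Stage`, `WordSource`, and `LowerCaseStage` are hypothetical stand-ins, not Lucene's TokenStream/TokenFilter API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

class PullPipeline {
    // One stage of the chain. In Lucene the stages share a single set of attributes;
    // here each stage keeps its own copy of the current term for simplicity.
    static abstract class Stage {
        String term;
        abstract boolean incrementToken();
    }

    // First stage (the "tokenizer"): hands out one pre-split word per call.
    static class WordSource extends Stage {
        private final Iterator<String> words;
        WordSource(List<String> words) { this.words = words.iterator(); }
        boolean incrementToken() {
            if (!words.hasNext()) return false;
            term = words.next();
            return true;
        }
    }

    // A "filter": pulls one token from its input, then rewrites it.
    static class LowerCaseStage extends Stage {
        private final Stage input;
        LowerCaseStage(Stage input) { this.input = input; }
        boolean incrementToken() {
            if (!input.incrementToken()) return false;
            term = input.term.toLowerCase(Locale.ROOT);
            return true;
        }
    }

    // The consumer (indexer or query parser) keeps pulling until the chain is exhausted.
    static List<String> drain(Stage last) {
        List<String> out = new ArrayList<String>();
        while (last.incrementToken()) out.add(last.term);
        return out;
    }

    public static void main(String[] args) {
        Stage chain = new LowerCaseStage(new WordSource(Arrays.asList("Fly", "Away")));
        System.out.println(drain(chain)); // [fly, away]
    }
}
```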

This is part of a typical filter source, as I understand it:
   

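import java.io.IOException;
import java.util.Hashtable;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class CustomStemFilter extends TokenFilter {

    private final MyCustomStemmer stemmer;
    private final TermAttribute termAttr;
    private final OffsetAttribute offsetAttr;
    private final TypeAttribute typeAttr;
    private final Hashtable<String, String> reserved = new Hashtable<String, String>();

    public CustomStemFilter(TokenStream tokenStream, boolean isQuery,
                            MyCustomStemmer stemmer) {
        super(tokenStream);

        this.stemmer = stemmer;
        // Lucene 2.9: addAttribute() returns Attribute, so a cast is needed.
        termAttr   = (TermAttribute) addAttribute(TermAttribute.class);
        offsetAttr = (OffsetAttribute) addAttribute(OffsetAttribute.class);
        typeAttr   = (TypeAttribute) addAttribute(TypeAttribute.class);
        addAttribute(PositionIncrementAttribute.class);

        // Some of my custom logic here.
        // do something.
    }

    public boolean incrementToken() throws IOException {
        // Pull the next token from the wrapped stream (its attributes are shared with us).
        if (!input.incrementToken())
            return false;

        StringBuilder queryBuffer = new StringBuilder();

        // stemming logic here.
        // The generated query string is appended to queryBuffer.

        // Replace the current term text with the whole generated string (one token).
        termAttr.setTermBuffer(queryBuffer.toString(), 0, queryBuffer.length());
        // Offsets now span the generated string, not the token's place in the input text.
        offsetAttr.setOffset(0, queryBuffer.length());
        typeAttr.setType("word");

        return true;
    }
}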
   


※ MyCustomStemmer analyzes the input string flyaway into the query string
fly +body:away and returns it.

At index time, the content to be searched is analyzed normally and
indexed as below.

a) Content to be indexed: fly away
b) The token fly, with length 3 (set via the offset attribute method),
   is returned by the filter or analyzer.
c) The next token, away, with length 4, is returned.

I think this is the general indexing flow.
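For reference, the offsets in steps a) to c) are character positions in the original text, and the "length" is end minus start. A plain-Java sketch that computes them for a whitespace split (the `Token` class is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

class OffsetSketch {
    // Hypothetical token: text plus where it came from in the original string.
    static final class Token {
        final String text;
        final int start; // index of the first character in the original text
        final int end;   // index one past the last character
        Token(String text, int start, int end) { this.text = text; this.start = start; this.end = end; }
        public String toString() { return text + "[" + start + "," + end + ")"; }
    }

    // Splits on whitespace and records each token's start/end offsets.
    static List<Token> tokensWithOffsets(String text) {
        List<Token> out = new ArrayList<Token>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) out.add(new Token(text.substring(start, i), start, i));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : tokensWithOffsets("fly away")) {
            System.out.println(t + " length=" + (t.end - t.start));
        }
        // fly[0,3) length=3
        // away[4,8) length=4
    }
}
```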

But I customized MyCustomFilter so that the filter generates a query string,
not a token.
In the process, the offset value changed: it is now the query's length, not a
single token's length.

Does the value set by the offsetAttr.setOffset() method 
influence search results in Solr? 
(I tested this in the query input box on the main page at
http://localhost:8983/solr/admin/ )
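One way to reason about the offset question: in a term lookup the key is the term text, and the offsets carried with a token are not part of that key. To my understanding Lucene also uses offsets for features such as highlighting rather than for deciding whether a document matches; the sketch below only models that assumption. It uses a hypothetical `Token` class and a set standing in for the indexed terms of "fly away":

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class OffsetsDoNotMatch {
    // Hypothetical query token: text plus offsets.
    static final class Token {
        final String text;
        final int start, end;
        Token(String text, int start, int end) { this.text = text; this.start = start; this.end = end; }
    }

    // Terms the index analyzer produced for "fly away".
    static final Set<String> INDEXED = new HashSet<String>(Arrays.asList("fly", "away"));

    // The lookup only consults the term text; start/end are never read here.
    static boolean matches(Token queryToken) {
        return INDEXED.contains(queryToken.text);
    }

    public static void main(String[] args) {
        // Same offsets (0, 14) in both cases; only the text differs.
        System.out.println(matches(new Token("fly +body:away", 0, 14))); // false
        System.out.println(matches(new Token("fly", 0, 14)));            // true
    }
}
```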


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-can-i-get-collect-search-result-from-custom-filtered-query-tp1723055p1729717.html
Sent from the Solr - User mailing list archive at Nabble.com.