Hi,

I have a question about how the Boolean query is currently built in Nutch.
While indexing documents, I add fields called "fname" and "lname" for the
author's first name and last name (as Field.Text). Now I want to search for
'adam smith'. I expect to get documents where 'adam' is the first or last
name, documents where 'smith' is the first or last name, and documents that
have these words somewhere in the content. I have boosted the first name and
last name so that they carry more weight than the content. I modified the
query-basic plugin accordingly, and the query representation now looks like
this:
Query str = +((+url:adam^4.0 +url:smith^4.0 +url:"adam smith"~2147483647^4.0) 
(+fname:adam^4.0 +fname:smith^4.0 +fname:"adam smith"~2147483647^4.0) 
(+lname:adam^4.0 +lname:smith^4.0 +lname:"adam smith"~2147483647^4.0) 
(+summary:adam^4.0 +summary:smith^4.0 +summary:"adam smith"~2147483647^4.0)  
(+anchor:adam^2.0 +anchor:smith^2.0 + anchor:"adam smith"~4^2.0) 
(+content:adam +content:smith +content:"adam smith"~2147483647))

However, I am getting only one hit: the document that has 'adam smith' in
the content.
Can someone please explain how I can build a query that is essentially
(fname:adam OR fname:smith OR lname:adam OR lname:smith OR
content:adam OR content:smith OR content:"adam smith"~SLOP_FACTOR)?
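
To make the difference concrete, here is a minimal sketch in plain Java of
what I think is happening. It is a toy model, not the Lucene or Nutch API:
ClauseSketch and its methods are hypothetical names, and it assumes that a
phrase clause with a very large slop behaves like requiring both terms in the
same field (the smaller slop on the anchor field is ignored here).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of required (+) versus optional clauses.
// Hypothetical illustration only; this is not the Lucene or Nutch API.
class ClauseSketch {

    // Split a field value into lowercase terms.
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    // Build a small in-memory "document" with the three fields used below.
    static Map<String, String> doc(String fname, String lname, String content) {
        Map<String, String> d = new HashMap<>();
        d.put("fname", fname);
        d.put("lname", lname);
        d.put("content", content);
        return d;
    }

    // Shape of the current query: the field groups are alternatives, but
    // inside each group every term is required, so a single field has to
    // contain ALL of the query terms.
    static boolean matchesRequiredPerField(Map<String, String> doc,
                                           List<String> fields,
                                           List<String> queryTerms) {
        for (String field : fields) {
            String value = doc.get(field);
            if (value != null && terms(value).containsAll(queryTerms)) {
                return true;
            }
        }
        return false;
    }

    // Shape of the query I am after: every clause is optional, so any
    // query term in any field is enough for a match.
    static boolean matchesAnyTerm(Map<String, String> doc,
                                  List<String> fields,
                                  List<String> queryTerms) {
        for (String field : fields) {
            String value = doc.get(field);
            if (value == null) {
                continue;
            }
            Set<String> fieldTerms = terms(value);
            for (String term : queryTerms) {
                if (fieldTerms.contains(term)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("fname", "lname", "content");
        List<String> query = Arrays.asList("adam", "smith");

        // Author document: the two terms live in different fields.
        Map<String, String> author = doc("adam", "smith", "a book on economics");
        // Content document: both terms appear in the content field.
        Map<String, String> mention = doc("john", "doe", "adam smith wrote this");

        System.out.println("required per field, author:  "
                + matchesRequiredPerField(author, fields, query));
        System.out.println("required per field, mention: "
                + matchesRequiredPerField(mention, fields, query));
        System.out.println("any term, author:  " + matchesAnyTerm(author, fields, query));
        System.out.println("any term, mention: " + matchesAnyTerm(mention, fields, query));
    }
}
```

If this model is right, a document with fname "adam" and lname "smith" can
never match the current query, because no single field holds both terms.
That would explain why only the content hit comes back.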

Also, while on this topic, can we directly execute Lucene partial and
wildcard queries from Nutch? I currently see that NutchDocumentAnalyzer strips
out any special characters that I put in the query.

Thanks, and have a great long weekend,
Praveen.






_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
