Re: developing a parse-/index-/query- plugin set
Chris Mattmann wrote:
> So, one thing it seems is that fields to be indexed, and used in a field query must be fully lowercase to work? Additionally, it seems that they can't have symbols in them, such as _, is that correct? Would you guys consider this to be a bug?

Yes, this sounds like a bug.

> Performing Lucene Query: using filter QueryFilter(+contactemail:[EMAIL PROTECTED]) and numHits = 20
> 051016 190347 11 total hits: 0

A query whose only clause has a boost of 0.0 will return no results. Nutch uses the convention that clauses whose boost is 0.0 may be converted to filters, for efficiency. A filter affects the set of hits, but not their ranking. So a boost of 0.0 is used to declare that a clause does not affect ranking and may not be used in isolation. This makes it akin to searching for filetype:pdf on Google: filetype is only used to filter other queries and may not be a standalone query.

Doug
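The zero-boost convention can be illustrated with a small, self-contained Java model. Everything in it (the ZeroBoostDemo, Clause and Doc names, the "content" field, the example.org addresses) is hypothetical, ranking is left out, and it only mimics the behaviour described above rather than reproducing Nutch's or Lucene's actual query code. A clause with boost 0.0 can narrow hits but cannot produce them, so a filter-only query returns nothing, while adding a scoring term such as "specimen" brings the matching document back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of the boost-0.0 convention; all names are hypothetical,
// ranking is omitted, and this is not Nutch or Lucene code.
public class ZeroBoostDemo {

    // A required clause: term must occur in field; boost 0.0 means filter only.
    public record Clause(String field, String term, float boost) {}

    // A toy document: field name -> set of indexed terms.
    public record Doc(String id, Map<String, Set<String>> fields) {
        boolean matches(Clause c) {
            return fields.getOrDefault(c.field(), Set.of()).contains(c.term());
        }
    }

    public static List<String> search(List<Clause> query, List<Doc> docs) {
        boolean hasScoringClause = false;
        for (Clause c : query) {
            if (c.boost() != 0.0f) {
                hasScoringClause = true;
            }
        }
        List<String> hits = new ArrayList<>();
        // Zero-boost clauses only narrow the hits of other clauses, so a query
        // made up entirely of them has nothing to narrow and returns no results.
        if (!hasScoringClause) {
            return hits;
        }
        for (Doc d : docs) {
            if (query.stream().allMatch(d::matches)) {
                hits.add(d.id());
            }
        }
        return hits;
    }

    // Three toy documents; only d1 has the contact email being searched for.
    public static List<Doc> sampleDocs() {
        return List.of(
            new Doc("d1", Map.of("contactemail", Set.of("a@example.org"), "content", Set.of("specimen"))),
            new Doc("d2", Map.of("contactemail", Set.of("b@example.org"), "content", Set.of("specimen"))),
            new Doc("d3", Map.of("content", Set.of("specimen"))));
    }

    public static void main(String[] args) {
        List<Doc> docs = sampleDocs();
        Clause email = new Clause("contactemail", "a@example.org", 0.0f);
        Clause word = new Clause("content", "specimen", 1.0f);
        System.out.println(search(List.of(email), docs));        // prints []
        System.out.println(search(List.of(email, word), docs));  // prints [d1]
    }
}
```

In this model, giving the email clause a non-zero boost (for example 1.0f) is enough to let it stand alone as a query.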
Re: developing a parse-/index-/query- plugin set
Hi Doug,

On 10/17/05 11:38 AM, Doug Cutting [EMAIL PROTECTED] wrote:
> Chris Mattmann wrote:
>> So, one thing it seems is that fields to be indexed, and used in a field query must be fully lowercase to work? Additionally, it seems that they can't have symbols in them, such as _, is that correct? Would you guys consider this to be a bug?
>
> Yes, this sounds like a bug.

Okay, I will look and see if I can figure out why this is happening and, if I can, I will try to submit a patch.

>> Performing Lucene Query: using filter QueryFilter(+contactemail:[EMAIL PROTECTED]) and numHits = 20
>> 051016 190347 11 total hits: 0
>
> A query whose only clause has a boost of 0.0 will return no results.

Okay, this makes sense. In fact, when I now query for:

    contactemail:[EMAIL PROTECTED] specimen

the query actually works. Of the 3 documents I indexed, only one has the contactemail [EMAIL PROTECTED], and so I only got one result back. So your answer there makes total sense.

So, my question to you then is: what type of QueryFilter should I develop in order to get my query for contactemail:email address to work as a standalone query? For instance, right now I'm sub-classing the RawFieldQueryFilter, which doesn't seem to be the right way to do it now. Is there a class in Nutch that I can sub-class to get most of the functionality for doing a type:value query as a standalone query?

Thanks for the help.

Cheers,
Chris

Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
Jet Propulsion Laboratory, Pasadena, CA
Office: 171-266B
Mailstop: 171-246

Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology.
Re: developing a parse-/index-/query- plugin set
Chris Mattmann wrote:
> So, my question to you then is, what type of QueryFilter should I develop in order to get my query for contactemail:email address to work as a standalone query? For instance, right now I'm sub-classing the RawFieldQueryFilter, which doesn't seem to be the right way to do it now. Is there a class in Nutch that I can sub-class to get most of the functionality for doing a type:value query as a standalone query?

You can simply pass a non-zero boost to the RawFieldQueryFilter constructor, e.g.:

    import org.apache.nutch.searcher.RawFieldQueryFilter;

    public class MyQueryFilter extends RawFieldQueryFilter {
      public MyQueryFilter() {
        super("myfield", 1.0f);  // non-zero boost: clause affects ranking
      }
    }

Or you can implement QueryFilter directly. There's not that much to it.

Doug
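The idea of letting a subclass pick the boost through the superclass constructor can be sketched without Nutch on the classpath. In this hypothetical model, FieldFilter stands in for RawFieldQueryFilter, its single-argument constructor assumes a default boost of 0.0 (inferred from the advice to "pass a non-zero boost"), and translate() is a simplified whitespace tokenizer. None of this reproduces Nutch's real API or query parsing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the "pass a boost to the superclass constructor"
// pattern; not Nutch's RawFieldQueryFilter or its actual behavior.
public class BoostedFieldFilterDemo {

    public record Clause(String field, String term, float boost) {}

    // Base class: turns "field:value" tokens for one field into clauses.
    public abstract static class FieldFilter {
        private final String field;
        private final float boost;

        // Assumed default of 0.0: clauses for this field are filter-only.
        protected FieldFilter(String field) {
            this(field, 0.0f);
        }

        protected FieldFilter(String field, float boost) {
            this.field = field;
            this.boost = boost;
        }

        // Emits one clause per "field:value" token whose prefix is this filter's field.
        public List<Clause> translate(String query) {
            List<Clause> out = new ArrayList<>();
            for (String token : query.trim().split("\\s+")) {
                int colon = token.indexOf(':');
                if (colon > 0 && token.substring(0, colon).equals(field)) {
                    out.add(new Clause(field, token.substring(colon + 1), boost));
                }
            }
            return out;
        }
    }

    // Keeps the default boost: contactemail clauses can only filter.
    public static class FilterOnlyEmailFilter extends FieldFilter {
        public FilterOnlyEmailFilter() {
            super("contactemail");
        }
    }

    // Passes a non-zero boost, so contactemail clauses also score.
    public static class StandaloneEmailFilter extends FieldFilter {
        public StandaloneEmailFilter() {
            super("contactemail", 1.0f);
        }
    }

    public static void main(String[] args) {
        String q = "contactemail:a@example.org";
        System.out.println(new FilterOnlyEmailFilter().translate(q).get(0).boost());  // prints 0.0
        System.out.println(new StandaloneEmailFilter().translate(q).get(0).boost());  // prints 1.0
    }
}
```

Keeping the boost in the base class means each field plugin only declares its field name and whether it may score, which is the one-line change suggested above.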
Re: developing a parse-/index-/query- plugin set
Hi Doug,

Thanks, that worked.

Cheers,
Chris

On 10/17/05 11:56 AM, Doug Cutting [EMAIL PROTECTED] wrote:
> You can simply pass a non-zero boost to the RawFieldQueryFilter constructor [...]