Hi Doug,

On 10/17/05 11:38 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Chris Mattmann wrote:
>> So, one thing it seems is that fields to be indexed, and used in a field
>> query must be fully lowercase to work? Additionally, it seems that they
>> can't have symbols in them, such as "_", is that correct? Would you guys
>> consider this to be a bug?
>
> Yes, this sounds like a bug.

Okay, I will look and see if I can figure out why this is happening and
if I can, I will try and submit a patch.

>
>> Performing Lucene Query:
>>
>> using filter QueryFilter(+contactemail:[EMAIL PROTECTED]) and
>> numHits = 20
>>
>> 051016 190347 11 total hits: 0
>
> A query whose only clause has a boost of 0.0 will return no results.
> Nutch uses the convention that clauses whose boost is 0.0 may be
> converted to filters, for efficiency. A filter affects the set of hits,
> but not their ranking. So a boost of 0.0 is used to declare that a
> clause does not affect ranking and may not be used in isolation. This
> makes it akin to searching for "filetype:pdf" on Google--filetype is
> only used to filter other queries and may not be a standalone query.

Okay, this makes sense. In fact, when I do a query now for:

"contactemail:[EMAIL PROTECTED] specimen"

The query actually works. Of the 3 documents I indexed only one of them
has the contactemail [EMAIL PROTECTED], and so I only got one result
back. So your answer there makes total sense.

So, my question to you then is, what type of QueryFilter should I
develop in order to get my query for contactemail:<email address> to
work as a standalone query? For instance, right now I'm sub-classing the
RawFieldQueryFilter, which doesn't seem to be the right way to do it
now. Is there a class in Nutch that I can sub-class to get most of the
functionality for doing a type:<value> query as a standalone query?

Thanks for the help.

Cheers,
  Chris

>
> Doug

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
_________________________________________________
Jet Propulsion Laboratory
Pasadena, CA
Office: 171-266B
Mailstop: 171-246
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.