Hi Doug,

On 10/17/05 11:38 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Chris Mattmann wrote:
>>  So, one thing it seems is that fields to be indexed, and used in a field
>> query must be fully lowercase to work? Additionally, it seems that they
>> can't have symbols in them, such as "_", is that correct? Would you guys
>> consider this to be a bug?
> 
> Yes, this sounds like a bug.

Okay, I will look into why this is happening and, if I can figure it out,
try to submit a patch.


> 
>> Performing Lucene Query:
>> 
>> using filter QueryFilter(+contactemail:[EMAIL PROTECTED]) and
>> numHits = 20
>> 
>> 051016 190347 11 total hits: 0
> 
> A query whose only clause has a boost of 0.0 will return no results.
> Nutch uses the convention that clauses whose boost is 0.0 may be
> converted to filters, for efficiency.  A filter affects the set of hits,
> but not their ranking.  So a boost of 0.0 is used to declare that a
> clause does not affect ranking and may not be used in isolation.  This
> makes it akin to searching for "filetype:pdf" on Google--filetype is
> only used to filter other queries and may not be a standalone query.
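
The 0.0-boost convention can be modeled in a few lines. The sketch below is a toy, self-contained Java model and not Nutch's actual searcher or QueryFilter API. The Clause record, the search method, and the sample field values are all hypothetical. Clauses with a boost of 0.0 only narrow the candidate set, so a query made up solely of such clauses returns nothing. Adding a scoring term (like "specimen") produces hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not Nutch's API) of the convention that a clause
// with boost 0.0 acts only as a filter and never produces hits on its own.
public class FilterConventionDemo {

    /** A single field:value clause with a ranking boost. */
    record Clause(String field, String value, float boost) {}

    /** Returns ids of matching documents, applying 0.0-boost clauses as filters. */
    static List<Integer> search(List<Map<String, String>> docs, List<Clause> clauses) {
        List<Clause> scoring = new ArrayList<>();
        List<Clause> filters = new ArrayList<>();
        for (Clause c : clauses) {
            if (c.boost() == 0.0f) filters.add(c); else scoring.add(c);
        }
        List<Integer> hits = new ArrayList<>();
        // No scoring clause means no candidate set to filter, so nothing is returned.
        if (scoring.isEmpty()) return hits;
        for (int id = 0; id < docs.size(); id++) {
            Map<String, String> doc = docs.get(id);
            boolean matches = true;
            // Scoring clauses: simple substring match on the field value.
            for (Clause c : scoring) matches &= doc.getOrDefault(c.field(), "").contains(c.value());
            // Filter clauses: exact match; they restrict hits but add no score.
            for (Clause c : filters) matches &= c.value().equals(doc.get(c.field()));
            if (matches) hits.add(id);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical documents; field names and addresses are illustrative only.
        List<Map<String, String>> docs = List.of(
            Map.of("contactemail", "alice@example.org", "content", "specimen data"),
            Map.of("contactemail", "bob@example.org", "content", "specimen data"),
            Map.of("contactemail", "carol@example.org", "content", "other data"));
        Clause email = new Clause("contactemail", "alice@example.org", 0.0f);
        Clause term = new Clause("content", "specimen", 1.0f);
        System.out.println(search(docs, List.of(email)));       // prints []
        System.out.println(search(docs, List.of(email, term))); // prints [0]
    }
}
```

Running main prints [] for the filter-only query and [0] once the scoring term is added, which mirrors the single hit returned for the "contactemail:... specimen" query.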

Okay, this makes sense. In fact, when I do a query now for:

"contactemail:[EMAIL PROTECTED] specimen"

The query actually works. Of the 3 documents I indexed, only one has the
contactemail [EMAIL PROTECTED], so I got exactly one result back, which is
consistent with your explanation. My question, then, is what type of
QueryFilter should I develop so that a query for contactemail:<email address>
works as a standalone query? Right now I'm subclassing RawFieldQueryFilter,
which doesn't seem to be the right approach. Is there a class in Nutch that I
can subclass to get most of the functionality for a type:<value> query that
works as a standalone query?

Thanks for the help.

Cheers,
  Chris

> 
> Doug

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
 
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
_______________________________________________________
 
Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.