Re: developing a parse-/index-/query- plugin set

2005-10-17 Thread Doug Cutting

Chris Mattmann wrote:

> So, one thing it seems is that fields to be indexed, and used in a field
> query must be fully lowercase to work? Additionally, it seems that they
> can't have symbols in them, such as _, is that correct? Would you guys
> consider this to be a bug?


Yes, this sounds like a bug.

> Performing Lucene Query:
>
> using filter QueryFilter(+contactemail:[EMAIL PROTECTED]) and
> numHits = 20
>
> 051016 190347 11 total hits: 0


A query whose only clause has a boost of 0.0 will return no results. 
Nutch uses the convention that clauses whose boost is 0.0 may be 
converted to filters, for efficiency.  A filter affects the set of hits, 
but not their ranking.  So a boost of 0.0 is used to declare that a 
clause does not affect ranking and may not be used in isolation.  This 
makes it akin to searching for filetype:pdf on Google--filetype is 
only used to filter other queries and may not be a standalone query.
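
As a rough illustration of the convention described above, here is a small self-contained toy model. It does not use the Nutch or Lucene classes: the Clause type, the doc() helper, the search() method, and the example.org addresses are all made up for this sketch, and the rule "only zero-boost clauses means no hits" is simply modeled directly rather than derived from real scoring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model for illustration; none of these types are Nutch or Lucene classes.
final class BoostConventionSketch {

  /** A field:term clause plus the boost a (hypothetical) query filter gave it. */
  static final class Clause {
    final String field;
    final String term;
    final float boost;

    Clause(String field, String term, float boost) {
      this.field = field;
      this.term = term;
      this.boost = boost;
    }
  }

  /** Builds a document from alternating field/value arguments. */
  static Map<String, String> doc(String... fieldsAndValues) {
    Map<String, String> d = new HashMap<>();
    for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
      d.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
    }
    return d;
  }

  /** True if the document has the clause's field and its value contains the term. */
  static boolean matches(Map<String, String> doc, Clause c) {
    String value = doc.get(c.field);
    return value != null && value.contains(c.term);
  }

  /** Returns the positions of the documents that satisfy every clause. */
  static List<Integer> search(List<Map<String, String>> docs, List<Clause> clauses) {
    boolean hasScoringClause = false;
    for (Clause c : clauses) {
      if (c.boost != 0.0f) {
        hasScoringClause = true;
      }
    }
    List<Integer> hits = new ArrayList<>();
    // Assumption of this model: when every clause has boost 0.0 (filter-only),
    // nothing is left to rank, so the query yields no hits.
    if (!hasScoringClause) {
      return hits;
    }
    for (int i = 0; i < docs.size(); i++) {
      boolean all = true;
      for (Clause c : clauses) {
        if (!matches(docs.get(i), c)) {
          all = false;
        }
      }
      if (all) {
        hits.add(i);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<Map<String, String>> docs = Arrays.asList(
        doc("contactemail", "a@example.org", "content", "specimen record"),
        doc("contactemail", "b@example.org", "content", "specimen record"),
        doc("content", "specimen record"));
    Clause filterOnly = new Clause("contactemail", "a@example.org", 0.0f);
    Clause scoring = new Clause("content", "specimen", 1.0f);
    Clause boosted = new Clause("contactemail", "a@example.org", 1.0f);

    System.out.println(search(docs, Arrays.asList(filterOnly)).size());
    System.out.println(search(docs, Arrays.asList(filterOnly, scoring)).size());
    System.out.println(search(docs, Arrays.asList(boosted)).size());
  }
}
```

In this model the zero-boost clause on its own finds nothing, the same clause combined with an ordinary term narrows the result to one document, and giving the clause a non-zero boost lets it stand alone.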


Doug


Re: developing a parse-/index-/query- plugin set

2005-10-17 Thread Chris Mattmann
Hi Doug,


On 10/17/05 11:38 AM, Doug Cutting [EMAIL PROTECTED] wrote:

> Chris Mattmann wrote:
>> So, one thing it seems is that fields to be indexed, and used in a field
>> query must be fully lowercase to work? Additionally, it seems that they
>> can't have symbols in them, such as _, is that correct? Would you guys
>> consider this to be a bug?
>
> Yes, this sounds like a bug.

Okay, I will look into why this is happening and, if I can figure it out,
I will try to submit a patch.


 
>> Performing Lucene Query:
>>
>> using filter QueryFilter(+contactemail:[EMAIL PROTECTED]) and
>> numHits = 20
>>
>> 051016 190347 11 total hits: 0
>
> A query whose only clause has a boost of 0.0 will return no results.
> Nutch uses the convention that clauses whose boost is 0.0 may be
> converted to filters, for efficiency.  A filter affects the set of hits,
> but not their ranking.  So a boost of 0.0 is used to declare that a
> clause does not affect ranking and may not be used in isolation.  This
> makes it akin to searching for filetype:pdf on Google--filetype is
> only used to filter other queries and may not be a standalone query.

Okay, this makes sense. In fact, when I now run the query

contactemail:[EMAIL PROTECTED] specimen

it actually works. Of the 3 documents I indexed, only one has the
contactemail [EMAIL PROTECTED], and I got exactly one result back, so your
answer makes total sense. My question to you, then, is: what type of
QueryFilter should I develop to get my query for
contactemail:<email address> to work as a standalone query? Right now I'm
sub-classing RawFieldQueryFilter, which doesn't seem to be the right way to
do it. Is there a class in Nutch that I can sub-class to get most of the
functionality for doing a type:value query as a standalone query?

Thanks for the help.

Cheers,
  Chris

 
> Doug

__
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
 
_
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
___
 
Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.
 
 





Re: developing a parse-/index-/query- plugin set

2005-10-17 Thread Doug Cutting

Chris Mattmann wrote:

> So, my question to you then
> is, what type of QueryFilter should I develop in order to get my query for
> contactemail:<email address> to work as a standalone query? For instance,
> right now I'm sub-classing the RawFieldQueryFilter, which doesn't seem to be
> the right way to do it now. Is there a class in Nutch that I can sub-class
> to get most of the functionality for doing a type:value query as a
> standalone query?


You can simply pass a non-zero boost to the RawFieldQueryFilter 
constructor, e.g.:


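import org.apache.nutch.searcher.RawFieldQueryFilter;

public class MyQueryFilter extends RawFieldQueryFilter {
  public MyQueryFilter() {
    // "myfield" is a placeholder for the indexed field name (e.g. contactemail);
    // the non-zero boost lets the clause stand alone as a query.
    super("myfield", 1.0f);
  }
}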

Or you can implement QueryFilter directly.  There's not that much to it.
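
For readers wondering what "implementing it directly" roughly involves, here is a minimal self-contained sketch of the idea: a filter walks the clauses of the parsed query, picks out the ones for its own field, and appends them to the translated query with a chosen boost. The Clause, Translated, Filter, and FieldFilter types below are stand-ins invented for this sketch, as is the example.org address; the real Nutch QueryFilter interface works on Nutch Query and Lucene BooleanQuery objects, and its exact signature should be checked against the Nutch source.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration; these are not the Nutch or Lucene classes.
final class DirectFilterSketch {

  /** A parsed field:term clause from the user's query. */
  static final class Clause {
    final String field;
    final String term;
    final boolean required;

    Clause(String field, String term, boolean required) {
      this.field = field;
      this.term = term;
      this.required = required;
    }
  }

  /** A translated clause, carrying the boost that the filter assigned. */
  static final class Translated {
    final String field;
    final String term;
    final float boost;
    final boolean required;

    Translated(String field, String term, float boost, boolean required) {
      this.field = field;
      this.term = term;
      this.boost = boost;
      this.required = required;
    }
  }

  /** Hypothetical counterpart of a query filter: appends to the translation. */
  interface Filter {
    List<Translated> filter(List<Clause> input, List<Translated> translation);
  }

  /** Handles exactly one field and copies its clauses over with a fixed boost. */
  static final class FieldFilter implements Filter {
    private final String field;
    private final float boost;

    FieldFilter(String field, float boost) {
      this.field = field;
      this.boost = boost;
    }

    @Override
    public List<Translated> filter(List<Clause> input, List<Translated> translation) {
      for (Clause c : input) {
        if (!c.field.equals(field)) {
          continue; // clauses for other fields are left to other filters
        }
        translation.add(new Translated(c.field, c.term, boost, c.required));
      }
      return translation;
    }
  }

  public static void main(String[] args) {
    List<Clause> input = new ArrayList<>();
    input.add(new Clause("contactemail", "a@example.org", true));
    input.add(new Clause("content", "specimen", false));

    Filter filter = new FieldFilter("contactemail", 1.0f);
    for (Translated t : filter.filter(input, new ArrayList<Translated>())) {
      System.out.println(t.field + ":" + t.term + " boost=" + t.boost);
    }
  }
}
```

With a boost of 1.0f the translated clause contributes to ranking, which is what allows it to be used on its own; passing 0.0f instead would mark it as filter-only under the convention discussed earlier in the thread.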

Doug


Re: developing a parse-/index-/query- plugin set

2005-10-17 Thread Chris Mattmann
Hi Doug,

 Thanks, that worked.

Cheers,
  Chris



On 10/17/05 11:56 AM, Doug Cutting [EMAIL PROTECTED] wrote:

> Chris Mattmann wrote:
>> So, my question to you then
>> is, what type of QueryFilter should I develop in order to get my query for
>> contactemail:<email address> to work as a standalone query? For instance,
>> right now I'm sub-classing the RawFieldQueryFilter, which doesn't seem to be
>> the right way to do it now. Is there a class in Nutch that I can sub-class
>> to get most of the functionality for doing a type:value query as a
>> standalone query?
>
> You can simply pass a non-zero boost to the RawFieldQueryFilter
> constructor, e.g.:
>
> public class MyQueryFilter extends RawFieldQueryFilter {
>   public MyQueryFilter() {
>     super("myfield", 1.0f);
>   }
> }
>
> Or you can implement QueryFilter directly.  There's not that much to it.
>
> Doug
