On Mon, 2005-04-04 at 09:17 +0000, mad Cow wrote:
> Could some more experienced users suggest a solution to my problem. I have
> documents which contain multiple terms and phrases, and I wish to collect
> documents which match only the term I query for.
>
> For example:
> Doc1 contains,
> species:"homo sapien" Mammalia
>
> Doc2 contains,
> species:"homo sapien"
>
> I wish to collect documents ONLY with "homo sapien" but a search for
> species:"homo sapien" returns both documents as they both contain the
> phrase.
> I have written code to cache every term for every field an I hoped that I
> could do the search - species:"homo sapien" -species:Mammalia. Unfortunately
> the terms homo and sapien seem to be separate. So when I collect every term
> to use with the "-" operator I end up with a query thus
> species:"homo sapien" -species:(homo Mammalia sapien)
>
> which isn't the same.
>
> Can anybody suggest another approach?

If the set of species is fixed, I recommend using the Keyword field type:

http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Keyword(java.lang.String, java.lang.String)

and adding each species as a separate field (Lucene can handle multiple fields with the same name). A Keyword field is indexed as a single untokenized term, so "homo sapien" is no longer split into "homo" and "sapien". Then the query

species:"homo sapien" -species:Mammalia

should work.

But I think the real problem is that you have a category hierarchy that you want to filter by, which is awkward to do with Lucene alone. When I come across these situations I normally pair Lucene with a database that holds the categorization information and take one of two approaches:

1. Do the search in Lucene, then do the category filtering against the database (which holds the document/category information). Lucene holds no category information in this case.

2. Take the query, look up the relevant category information in the database, and expand the query so it only picks up the categories you want (you'd store each category a document is in as a separate Lucene keyword field).

--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
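
P.S. A rough sketch of approach 1, in case it helps. It assumes the Lucene search has already produced a list of hit document ids, and it uses an in-memory map as a stand-in for the database table of document/category rows. The class name ExactCategoryFilter, the method filterExact and the ids Doc1/Doc2 are made up for illustration; none of this is Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: post-filters search hits against category data
// held outside Lucene (here a Map stands in for the database).
final class ExactCategoryFilter {

    // Keep only the hits whose stored category set equals the wanted set,
    // preserving the order in which the hits were returned.
    static List<String> filterExact(List<String> hitIds,
                                    Map<String, Set<String>> categoriesByDoc,
                                    Set<String> wanted) {
        List<String> kept = new ArrayList<>();
        for (String id : hitIds) {
            // A document with no category row is treated as having no categories.
            Set<String> categories = categoriesByDoc.getOrDefault(id, Set.of());
            if (categories.equals(wanted)) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the document/category table.
        Map<String, Set<String>> categoriesByDoc = Map.of(
            "Doc1", Set.of("homo sapien", "Mammalia"),
            "Doc2", Set.of("homo sapien"));

        // Stand-in for the ids returned by the search on species:"homo sapien".
        List<String> hits = List.of("Doc1", "Doc2");

        // Doc1 is dropped because it also carries the Mammalia category.
        System.out.println(filterExact(hits, categoriesByDoc, Set.of("homo sapien")));
    }
}
```

With the two example documents from the question, only Doc2 survives the filter. In a real setup the map lookup would be a query against the category table, ideally one batched query for all hit ids rather than one per document.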