AW: Delete Indexed from Merged Document

2004-06-23 Thread Wolf-Dietrich . Materna
Hello, 
 Karthik N S [mailto:[EMAIL PROTECTED]] wrote:
 Hi
 Mr Wolf  
Wolf-Dietrich is my first name, so please leave out the "Mr." or use
my family name instead (which is uncommon here).

   What is this
 
 // remove the document from index
   int docID = hits.id(0);
 
  and can I increment the 0 factor  in the bracket ...for deletion
Yes, but there is no reason to do so in this case.
You search for a document by its file name (including its full path!).
You get a result (a kind of list). Please read the Javadoc for the Hits
class.
hits.id(0) returns the (internal) ID of the first hit in your result.
This is the document that you want to remove (using
indexReader.delete(...)).
Your result contains no further documents unless your key is not
unique, so hits.length() returns 0 or 1.
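
As a rough illustration of the point above, here is a minimal sketch in plain Java. It is not Lucene code: the class ToyPathIndex, its methods and the file paths are hypothetical names made up for this example, and list positions merely stand in for Lucene's internal document IDs. It shows that a lookup by a unique key gives 0 or 1 hits, and that walking over all hits (the counterpart of raising the index passed to hits.id(...)) only matters when the key is not unique.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of delete-by-key; plain Java, not Lucene code. */
class ToyPathIndex {
    // The position in these lists plays the role of the internal document ID.
    private final List<String> paths = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    /** Stores a document under its path and returns its internal ID. */
    int add(String path) {
        paths.add(path);
        deleted.add(false);
        return paths.size() - 1;
    }

    /** Returns the internal IDs of all live documents stored under this path. */
    List<Integer> search(String path) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < paths.size(); id++) {
            if (!deleted.get(id) && paths.get(id).equals(path)) {
                hits.add(id);
            }
        }
        return hits;
    }

    /** Marks every hit for the path as deleted and returns how many there were. */
    int deleteByPath(String path) {
        List<Integer> hits = search(path);
        for (int i = 0; i < hits.size(); i++) {
            int docId = hits.get(i);   // with a unique key this loop runs at most once
            deleted.set(docId, true);
        }
        return hits.size();
    }

    public static void main(String[] args) {
        ToyPathIndex index = new ToyPathIndex();
        index.add("/docs/a.txt");
        index.add("/docs/b.txt");
        System.out.println(index.search("/docs/a.txt").size());  // unique key: one hit
        System.out.println(index.deleteByPath("/docs/a.txt"));   // removes that hit
        System.out.println(index.search("/docs/a.txt").size());  // nothing left to find
    }
}
```

In Lucene itself the corresponding loop would presumably pass hits.id(i) to indexReader.delete(...) for each i below hits.length(); please check the Javadoc of your version before relying on that.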
Regards,
Wolf-Dietrich Materna

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



AW: QueryParser handling a NOT query on its own

2004-06-21 Thread Wolf-Dietrich . Materna
Hello,
 Allen Atamer [mailto:[EMAIL PROTECTED]] wrote:
 
 The Javadoc spec calls for one or more clauses in a query, 
 but I had trouble with a NOT query just on its own. For example
Most search engines, including Lucene, don't support this type of query.
That's why the query parser treats such queries as invalid (if
it recognizes them).

 QueryParser.parse("my_field:-exclude") throws a parsing exception
 
 Same with
 
 QueryParser.parse("my_field:-(exclude)")
 QueryParser.parse("my_field:(* AND -exclude)")
These queries are defined as invalid.

 The query QueryParser.parse("my_field:(-(exclude))") gives a 
 legitimate query that brings no results.
This is the result returned by many search engines. They first select a
set of documents by looking up the query terms in the index and
collecting the documents which contain them. Because your query doesn't
contain any term which must be part of a document, your result is empty.
Note that they check the excluded words only against the documents
selected in the first step, to save time and memory.

 What I would expect is the following: If I have an index with 
 100 total entries, and 20 records with the word exclude in 
 them, then the above queries should give 80 hits. There is no 
 test case for this scenario in TestQueryParser. Please 
 confirm whether this is a bug or not,
This is not a bug. You might think that it should be easy to
get all documents and discard those which contain the
excluded word.
But imagine your index contains about 1 million documents: the search
would take a very long time and the result would be huge (and so
mostly useless). Many engines avoid this trouble and return an empty
result or report an error.

You can work around this limitation by adding a dummy field and
term to each document while creating your index, e.g.
  document.add(Field.Text("dummy_field", "dummy_value"));
You then have to add dummy_field:dummy_value to your query string, e.g.:
dummy_field:dummy_value -my_field:exclude will return 80 hits
in your scenario (after you rebuild your index).
Warning: performance will be very poor on larger indexes, and
users can mount denial-of-service attacks by sending such queries.
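
To make the two-step evaluation and the dummy-term workaround more concrete, here is a small sketch in plain Java. It is a toy model of the idea only, not Lucene's implementation: the class ToyIndex and its methods are hypothetical, fields are ignored, and each document is just a list of terms. It rebuilds the scenario from this thread (100 documents, 20 of them containing "exclude").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index; a model of the idea only, not Lucene code. */
class ToyIndex {
    // term -> IDs of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int docCount = 0;

    /** Adds a document given as a list of terms and returns its ID. */
    int add(List<String> terms) {
        int id = docCount++;
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /**
     * Step 1: collect the documents that contain any of the required terms.
     * Step 2: drop those candidates that contain an excluded term.
     */
    Set<Integer> search(List<String> required, List<String> excluded) {
        Set<Integer> result = new TreeSet<>();
        for (String term : required) {
            result.addAll(postings.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        for (String term : excluded) {
            result.removeAll(postings.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return result;
    }

    /** Builds the scenario from the thread: 100 documents, 20 contain "exclude". */
    static ToyIndex sample() {
        ToyIndex index = new ToyIndex();
        for (int i = 0; i < 100; i++) {
            List<String> terms = new ArrayList<>();
            terms.add("dummy_value");                 // dummy term present in every document
            terms.add(i < 20 ? "exclude" : "other");
            index.add(terms);
        }
        return index;
    }

    public static void main(String[] args) {
        ToyIndex index = sample();
        // NOT on its own: step 1 selects nothing, so step 2 has nothing to filter.
        System.out.println(index.search(Collections.<String>emptyList(),
                Arrays.asList("exclude")).size());    // prints 0
        // Dummy term plus NOT: step 1 selects all 100, step 2 removes 20.
        System.out.println(index.search(Arrays.asList("dummy_value"),
                Arrays.asList("exclude")).size());    // prints 80
    }
}
```

The sketch also hints at the cost mentioned in the warning: with the dummy term, step 1 has to touch every document in the index before anything is filtered out.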

Note that a single * is not allowed either, because performance is
very poor when an expression starts with a wildcard. In this case Lucene
could also run out of memory, because the internal result would contain
all words/terms stored in your index.
Regards,
Wolf-Dietrich Materna

 
