Take a look at the source in IndexHTML.java
(C:\lucene-2.1.0\src\demo\org\apache\lucene\demo on my machine). The code goes
to quite a bit of effort to remove old documents identified by uid. My comment
was really that the underlying engine doesn't recognize duplicates; any such
requirements must be implemented on top of the base engine.
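
To make the "on top of the base engine" point concrete, here is a minimal sketch in plain Java. It is a toy model, not the Lucene API and not the actual IndexHTML code: ToyIndex, Doc, deleteByUid and addOrReplace are all hypothetical names invented for illustration. The "engine" appends whatever it is handed, and the delete-then-add rule lives entirely in the calling code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT the Lucene API. It mimics an engine that appends
// every document it is given and has no notion of document identity.
class ToyIndex {
    // A "document" here is just a uid plus some text (hypothetical shape).
    static final class Doc {
        final String uid;
        final String text;
        Doc(String uid, String text) { this.uid = uid; this.text = text; }
    }

    private final List<Doc> docs = new ArrayList<>();

    // Engine behaviour: blindly append, duplicates and all.
    void add(Doc doc) { docs.add(doc); }

    // Engine behaviour: delete every document whose uid matches;
    // returns how many were removed.
    int deleteByUid(String uid) {
        int before = docs.size();
        docs.removeIf(d -> d.uid.equals(uid));
        return before - docs.size();
    }

    int size() { return docs.size(); }
}

class UidIndexSketch {
    // Application-level rule layered on top of the engine:
    // remove any old copy first, then add the new one.
    static void addOrReplace(ToyIndex index, ToyIndex.Doc doc) {
        index.deleteByUid(doc.uid);
        index.add(doc);
    }

    public static void main(String[] args) {
        // Index the same document 15 times with plain add().
        ToyIndex raw = new ToyIndex();
        for (int i = 0; i < 15; i++) {
            raw.add(new ToyIndex.Doc("a.html", "hello"));
        }
        System.out.println("plain add: " + raw.size());        // prints "plain add: 15"

        // Same 15 calls, but routed through the delete-then-add wrapper.
        ToyIndex deduped = new ToyIndex();
        for (int i = 0; i < 15; i++) {
            addOrReplace(deduped, new ToyIndex.Doc("a.html", "hello"));
        }
        System.out.println("delete-then-add: " + deduped.size()); // prints "delete-then-add: 1"
    }
}
```

The engine class never changes between the two runs; only the caller's policy does, which is the distinction I was trying to draw.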

But as an exercise, how would you imagine the *engine* could implement
anything like this? The only thing I can imagine is that a field would be
identified as unique (similar to a database UNIQUE constraint on a column).
But now we're mixing databases and text searching, and I don't want to go
there....

Of course, this would all work if we could just create the DWIM algorithm...
Do What I Mean......

Erick

On 4/21/07, jim shirreffs <[EMAIL PROTECTED]> wrote:

"Lucene has no concept of "document identity" in that you can index
the same document 15 times in a row and Lucene will have 15 entries. "

Is this true? Whenever I run the demo indexing logic, documents already
indexed are skipped. What am I missing?

jim s


start java org.apache.lucene.demo.IndexHTML -index /opt/lucene/index  ..



