identifying a document

Samuel LEMOINE Fri, 20 Jul 2007 06:01:31 -0700

Hi everybody!

I've been wondering about the way Lucene handles deleting documents.

As far as I know, a document is identified by a document number, but this number is not reliable in the long term, as it may change when segments are merged.

The way Lucene deletes a document's data from the index puzzles me, because it relies on terms (or on the document number, which, as said above, is not reliable and must first be retrieved through a query). The methods I've found for deleting documents from the index are those of the IndexWriter and IndexReader classes: deleteDocuments(Term) and deleteDocuments(Term[]). These methods delete the index entries containing the given term. According to the API javadoc, deleteDocuments(Term[]) deletes every document that contains at least one of the given terms. If it really works this way, I don't understand why. Wouldn't it be more useful if it deleted every document containing *all of* the given terms? (Or maybe that is how it actually works?)

These reflections lead me to conclude that, in order to remove the entries of one specific document from a Lucene index, we must store an untokenized field that uniquely identifies each document. I find it strange to need such an "artifice" to keep track of documents individually. It's not much of a hindrance, it's just... strange.
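To make the question concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not Lucene's API: ToyIndex, deleteAny and deleteAll are hypothetical names invented for illustration, and each document is assumed to be just a set of "field:value" terms. It only contrasts the "at least one term" behaviour described in the javadoc with the "all terms" behaviour asked about, and shows why a unique id term deletes exactly one document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: NOT Lucene. All names here are hypothetical.
public class ToyIndex {
    // Each document is a set of "field:value" terms.
    private final List<Set<String>> docs = new ArrayList<>();

    public void add(String... terms) {
        docs.add(new HashSet<>(Arrays.asList(terms)));
    }

    // "Any of" semantics: remove every document containing at least one term.
    public int deleteAny(String... terms) {
        int before = docs.size();
        docs.removeIf(doc -> {
            for (String t : terms) {
                if (doc.contains(t)) return true;
            }
            return false;
        });
        return before - docs.size();
    }

    // "All of" semantics: remove only documents containing every given term.
    public int deleteAll(String... terms) {
        if (terms.length == 0) return 0; // nothing requested, nothing deleted
        List<String> wanted = Arrays.asList(terms);
        int before = docs.size();
        docs.removeIf(doc -> doc.containsAll(wanted));
        return before - docs.size();
    }

    public int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("id:1", "body:lucene", "body:index");
        index.add("id:2", "body:lucene", "body:merge");
        index.add("id:3", "body:segment", "body:merge");

        // A term that only one document carries removes exactly that document.
        System.out.println("deleted by id: " + index.deleteAny("id:2"));
        // An ordinary content term may match several documents at once.
        System.out.println("deleted by content: "
                + index.deleteAny("body:lucene", "body:merge"));
        System.out.println("remaining: " + index.size());
    }
}
```

In this model, deleting by a content term removes every document that happens to contain it, while deleting by the id term touches a single document, which is the role the untokenized identifier field would play.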

Any thoughts on this matter are welcome :)


Thanks for reading,

Samuel

