Hi Team,

the question of how to delete with IndexWriter using doc ids is currently being discussed on java-user (http://www.gossamer-threads.com/lists/lucene/java-user/57228), so I thought this would be a good time to mention an idea that I recently had. I'm planning to work on column-stored fields soon (I used to call them per-document payloads). Then we'll have the ability to store metadata for each document very efficiently in the index.
This new data structure could be used to store a unique ID (UID) for each doc in the index. The IndexReader would then get an API that provides a mapping from the dynamic doc ids to the new unique ones. We would also have to store a reverse mapping (UID -> ID) in the index; we could use a VInt list + skip list for that.

Then we should be able to make IndexReaders "read-only" (LUCENE-1030) and provide a new "delete by UID" API in IndexWriter. This would make "delete by query" possible as well.

The disadvantage is that the index would become bigger, but that should still be ok: 8 bytes per doc for the ID->UID map (assuming we use a long for the UID, which I'd suggest). The UID->ID map might even be a bit smaller initially (using VInts and VLongs), but might become bigger when the index has lots of deleted docs, because the delta encoding of the UIDs wouldn't be as efficient anymore. If RAM permits, the maps could also be cached in memory (optional, configurable). The FieldCache overhaul (LUCENE-831) with column fields as source can help here.

After all this is implemented (column fields, UIDs, "read-only" IndexReaders, FieldCache overhaul) I'd like to make the column fields (and norms) updateable via IndexWriter.

OK, lots of food for thought.

-Michael
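
P.S. To make the delta-encoding point above a bit more concrete, here is a rough sketch. It is not Lucene code: the class UidDeltaSketch and its methods are made up for illustration, and it only covers the sorted UID list, not the doc ids or the skip list. It assumes the UIDs are stored in ascending order as gaps, each gap written with 7 data bits per byte and the high bit as a continuation flag (the same idea as VInt/VLong).

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not part of Lucene: delta + variable-length
// encoding of an ascending list of UIDs.
final class UidDeltaSketch {

    // Write a non-negative long with 7 data bits per byte; a set high bit
    // means "more bytes follow" (the VInt/VLong idea).
    static void writeVLong(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encode ascending UIDs as the gaps between neighbours.
    static byte[] encode(long[] sortedUids) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long previous = 0;
        for (long uid : sortedUids) {
            writeVLong(out, uid - previous);
            previous = uid;
        }
        return out.toByteArray();
    }

    // Decode 'count' UIDs by summing the gaps back up.
    static long[] decode(byte[] data, int count) {
        long[] uids = new long[count];
        int pos = 0;
        long previous = 0;
        for (int i = 0; i < count; i++) {
            long delta = 0;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            uids[i] = previous;
        }
        return uids;
    }

    public static void main(String[] args) {
        // Freshly assigned UIDs are consecutive, so every gap is 1.
        long[] dense = {1, 2, 3, 4, 5};
        // After many deletes the surviving UIDs are far apart.
        long[] sparse = {1, 200, 40_000, 6_000_000, 900_000_000L};
        System.out.println("dense:  " + encode(dense).length + " bytes");
        System.out.println("sparse: " + encode(sparse).length + " bytes");
    }
}
```

With these inputs the dense list takes 5 bytes and the sparse one 15 bytes for the same number of docs, compared with 5 * 8 = 40 bytes for fixed-width longs. So the encoding stays compact, but the per-doc cost grows as deletions widen the gaps between surviving UIDs.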