Hello Pulkit,

thank you for your answer and apologies for my late reply. I am currently working on the payload part and have implemented my own Analyzer and TokenFilter for adding custom payloads. As far as I understand, I can add a payload for every term occurrence, which is then written into the posting list. My posting list now looks like this:
car -> DocID1 [Payload 1], DocID2 [Payload 2], ..., DocID N [Payload N]

where each payload is a BitSet reflecting the versions of the document in which the term occurs. I must admit that the index is getting really big at the moment, because every payload adds around 8 to 16 bytes. I still have to find a good compression for the bitvectors (see the sketches at the end of this mail).

Furthermore, whenever I use my own Analyzer I get the error

org.apache.lucene.index.CorruptIndexException: checksum mismatch in segments file

After I comment out the checksum test, everything works fine; even Luke does not report an error. Any ideas?

Another problem is the BitVector creation during tokenization. I run through all versions during the tokenizing step to create my bitvectors (stored in a HashMap), so the bitvectors are only complete after the last field has been analyzed (I added every Wikipedia version as a separate field). Therefore I would need to add the payloads after the tokenizing step (see the last sketch below). Is this possible? And what happens if I add a payload for a term and later add another payload for the same term: is the first one overwritten, or are they appended?

Greetings,
Alex
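
P.S. In case a bit of code helps: below is a stripped-down sketch of what my TokenFilter does. The class and map names are made up for the example, I am on the 3.0.x attribute/payload API, and the real filter does a bit more:

import java.io.IOException;
import java.util.BitSet;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.index.Payload;

// Attaches the version bitvector of the current term as a payload.
public final class VersionPayloadFilter extends TokenFilter {
  private final Map<String, BitSet> termVersions; // term -> versions it occurs in
  private final TermAttribute termAtt = addAttribute(TermAttribute.class);
  private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

  public VersionPayloadFilter(TokenStream input, Map<String, BitSet> termVersions) {
    super(input);
    this.termVersions = termVersions;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    BitSet versions = termVersions.get(termAtt.term());
    if (versions != null) {
      payloadAtt.setPayload(new Payload(toBytes(versions)));
    }
    return true;
  }

  // Naive fixed-width encoding, one bit per version; this is where the
  // 8 to 16 bytes per posting currently come from.
  private static byte[] toBytes(BitSet bits) {
    byte[] bytes = new byte[(bits.length() + 7) / 8];
    for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
      bytes[i >> 3] |= 1 << (i & 7);
    }
    return bytes;
  }
}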
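
For the compression I am thinking about delta-coding only the set bits as variable-length integers (VInt-style, similar to what Lucene does internally) instead of storing the full fixed-width bitmap, which should help a lot for sparse bitvectors. This is just a sketch of the idea, not what is in my index yet:

import java.io.ByteArrayOutputStream;
import java.util.BitSet;

// Encodes a sparse BitSet as gaps between set bits, each gap as a VInt.
public final class BitSetCodec {

  public static byte[] encode(BitSet bits) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = -1;
    for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
      writeVInt(out, i - prev); // gap to the previous set bit, always >= 1
      prev = i;
    }
    return out.toByteArray();
  }

  public static BitSet decode(byte[] bytes) {
    BitSet bits = new BitSet();
    int[] pos = { 0 };
    int cur = -1;
    while (pos[0] < bytes.length) {
      cur += readVInt(bytes, pos);
      bits.set(cur);
    }
    return bits;
  }

  private static void writeVInt(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  private static int readVInt(byte[] bytes, int[] pos) {
    int v = 0, shift = 0;
    byte b;
    do {
      b = bytes[pos[0]++];
      v |= (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return v;
  }
}

For dense bitvectors (a term occurring in almost every version) this is worse than the plain bitmap, so I would probably pick per payload whichever encoding is smaller.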
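
And regarding adding payloads after the tokenizing step: the only workaround I see so far is a two-pass setup, i.e. run the analysis once just to fill the HashMap with the complete bitvectors, and only then add the documents to the index, so that the filter from the first sketch always sees finished bitvectors. Roughly like this, where buildVersionBitVectors, VersionAnalyzer and buildDocuments are made-up names for my own helpers:

// Pass 1: tokenize all versions only to build term -> version bitvectors;
// nothing is written to the index here.
Map<String, BitSet> termVersions = buildVersionBitVectors(versions);

// Pass 2: index for real; the payload filter reads from the finished map.
Analyzer analyzer = new VersionAnalyzer(termVersions);
IndexWriter writer = new IndexWriter(directory, analyzer,
    true, IndexWriter.MaxFieldLength.UNLIMITED);
for (Document doc : buildDocuments(versions)) {
  writer.addDocument(doc);
}
writer.close();

Of course this tokenizes everything twice, so if there is a way to set or replace a term's payload after tokenization, I would much prefer that.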