Hello everybody,

I am currently testing several of the new Lucene 4.0 codec implementations to compare them with my own solution. The difference is that my solution indexes only term frequencies, not positions. I would like to get the same behaviour from the other codecs. I know there was already a post on this topic:
http://lucene.472066.n3.nabble.com/Omit-positions-but-not-TF-td599710.html
I just wanted to ask whether anything has changed in this respect, especially for the new codecs. I had a look at FixedPostingWriterImpl and PostingsConsumer. Are those the right places for adapting the Pos/Freq handling? What would happen if I simply skipped writing positions/payloads? Would it mess up the index?

The written files have different endings such as pyl, skp, pos, doc, etc. Would leaving the pos file out of the count give me a correct index size estimate for W Freqs W/O Pos? Or where exactly are term positions written?

Regards
Alex

PS: Some results with the current codecs, in case someone is interested. I indexed 10% of the English Wikipedia. Each version is indexed as a document.

Docs                          240179
Versions                     8467927
Distinct Terms               3501214
Total Terms               1520008204
Avg. Versions                  35.25
Avg. Terms per Version        179.50
Avg. Terms per Doc           6328.65

Codec           W Freq W Pos    W/O Freq W/O Pos
PforDelta       20.6 GB         1.6 GB
Standard 4.0    28.1 GB         6.2 GB
Pfor            22 GB           3.1 GB

Performance follows ;)

--
View this message in context: http://lucene.472066.n3.nabble.com/New-codecs-keep-Freq-skip-omit-Pos-tp2849776p2849776.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
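
For the size question above, a rough check is to sum the index files by extension and leave out the position-related ones. The following is only a sketch using the plain JDK, not a Lucene API. The class and method names are made up for illustration. It assumes that positions and payloads live only in the pos and pyl files, which is exactly the open question, since the skip data (skp) may also hold pointers into the position file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: groups index file sizes by extension.
public class IndexSizeByExtension {

    /** Sums the sizes (in bytes) of the regular files in dir, grouped by file extension. */
    static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files without a dot are grouped under the empty extension.
                String ext = dot < 0 ? "" : name.substring(dot + 1);
                sizes.merge(ext, Files.size(file), Long::sum);
            }
        }
        return sizes;
    }

    /** Adds up all sizes whose extension is not in the excluded set. */
    static long totalExcluding(Map<String, Long> sizes, Set<String> excluded) {
        long total = 0;
        for (Map.Entry<String, Long> entry : sizes.entrySet()) {
            if (!excluded.contains(entry.getKey())) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // The index directory is passed as the first argument; defaults to the current directory.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        Map<String, Long> sizes = sizeByExtension(indexDir);
        for (Map.Entry<String, Long> entry : sizes.entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
        // Assumption: only pos and pyl carry position/payload data.
        Set<String> positionFiles = new HashSet<>(Arrays.asList("pos", "pyl"));
        System.out.println("total without pos/pyl\t" + totalExcluding(sizes, positionFiles));
    }
}
```

The per-extension breakdown is printed as well, so one can see how much of the remaining size is skip data before trusting the estimate.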