New codecs keep Freq skip/omit Pos

Alex vB Thu, 21 Apr 2011 18:52:34 -0700

Hello everybody,

I am currently testing several of the new Lucene 4.0 codec implementations
to compare them with my own solution.
The difference is that my solution indexes only term frequencies and no
positions. I would like to have the same for the other codecs. I know there
was already a post on this topic:
http://lucene.472066.n3.nabble.com/Omit-positions-but-not-TF-td599710.html

I just wanted to ask whether anything has changed, especially for the new
codecs.
I had a look at FixedPostingWriterImpl and PostingsConsumer. Are those the
right places to adapt the Pos/Freq handling? What would happen if I simply
skipped writing positions/payloads? Would that corrupt the index?

The written files have different extensions such as pyl, skp, pos, doc, etc.
Does leaving out the pos file give me a correct estimate of the index size
with frequencies but without positions (W Freq, W/O Pos)? Or where exactly
are the term positions written?

Regards
Alex

PS: Some results with the current codecs, in case someone is interested. I
indexed 10% of the English Wikipedia.
Each version of an article is indexed as a separate document.

Docs                      240179
Versions                  8467927
Distinct Terms            3501214
Total Terms               1520008204
Avg. Versions per Doc     35.25
Avg. Terms per Version    179.50
Avg. Terms per Doc        6328.65
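
For anyone who wants to re-derive the three averages from the raw counts,
here is a minimal sketch. The class and method names are made up for
illustration and are not part of Lucene. Note that the first ratio comes out
at roughly 35.26 when rounded, so the 35.25 above was presumably truncated.

```java
import java.util.Locale;

// Hypothetical helper, not Lucene code: recomputes the averages from the counts.
class CorpusAverages {

    // Plain ratio of two counts as a double.
    static double average(long total, long count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        return (double) total / count;
    }

    public static void main(String[] args) {
        long docs = 240_179L;            // Docs
        long versions = 8_467_927L;      // Versions (each one indexed as a document)
        long totalTerms = 1_520_008_204L; // Total Terms

        System.out.printf(Locale.ROOT, "Avg. Versions per Doc   %.2f%n",
                average(versions, docs));
        System.out.printf(Locale.ROOT, "Avg. Terms per Version  %.2f%n",
                average(totalTerms, versions));
        System.out.printf(Locale.ROOT, "Avg. Terms per Doc      %.2f%n",
                average(totalTerms, docs));
    }
}
```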

PforDelta       W Freq    W Pos      20.6 GB
PforDelta       W/O Freq  W/O Pos     1.6 GB
Standard 4.0    W Freq    W Pos      28.1 GB
Standard 4.0    W/O Freq  W/O Pos     6.2 GB
Pfor            W Freq    W Pos      22 GB
Pfor            W/O Freq  W/O Pos     3.1 GB
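
To put these sizes in relation to the corpus, the following sketch computes
the bytes per term occurrence and the share of the index that disappears when
both frequencies and positions are dropped. It assumes 1 GB = 10^9 bytes
(the unit is not specified above), and the class and method names are
hypothetical. It does not answer the W Freq, W/O Pos case; for a given codec
that size should lie somewhere between the two measured values.

```java
import java.util.Locale;

// Hypothetical helper, not Lucene code: relates index sizes to the term count.
class IndexSizeStats {

    // Assumption: sizes are given in decimal gigabytes.
    static final double BYTES_PER_GB = 1e9;

    // Average index bytes spent per term occurrence.
    static double bytesPerOccurrence(double sizeGb, long totalTerms) {
        return sizeGb * BYTES_PER_GB / totalTerms;
    }

    // Fraction of the full index saved by the reduced variant.
    static double savedFraction(double fullGb, double reducedGb) {
        return 1.0 - reducedGb / fullGb;
    }

    public static void main(String[] args) {
        long totalTerms = 1_520_008_204L;
        String[] codecs = {"PforDelta", "Standard 4.0", "Pfor"};
        double[] fullGb = {20.6, 28.1, 22.0};   // W Freq, W Pos
        double[] reducedGb = {1.6, 6.2, 3.1};   // W/O Freq, W/O Pos

        for (int i = 0; i < codecs.length; i++) {
            System.out.printf(Locale.ROOT,
                    "%-13s %.2f bytes/occurrence full, %.1f%% saved without freq/pos%n",
                    codecs[i],
                    bytesPerOccurrence(fullGb[i], totalTerms),
                    100.0 * savedFraction(fullGb[i], reducedGb[i]));
        }
    }
}
```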

Performance follows ;)
