Would it be possible somehow to make skip list (postings) RLE compressed
without affecting performance in cases where RLE cannot identify longer runs?
we have an unusual (?) case where we have an opportunity to sort documents on
category field before indexing. this order gets slightly disturbed during
updates on index, but they normally they stay mostly sorted. Also, we noticed
that there are longer runs of doc IDs even in standard case for hi frequency
tokens due to some sort of locality (eg. web pages from one web site tend to
have a lot of tokens in common)...
having RLE compressed skip lists for our category fields would bring huge
savings (less readVInts), but on the other side it requires at least one if()
in tight loops in next() and skipTo() that can slow down sparse case.
so the questions would be, can one see benefits in standard case? is it doable
at all without turning everything upside down? maybe already there, me being
plain stupid here....
___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]