Hi Paul, thanks for feedback
>I suppose you mean run length encoding? (I missed the first posting about
this.)
You are right, that is what I meant by RLE. This was the first posting. I am
just trying get some feedback to see if there are some knock-out conditions
disqualifying this idea.
a bit of background,
It came up as a side effect after playing with Filter , better said, with your
Matcher idea. We found some *relatively* clean way to get around BitSet -
Filter marriage by avoiding Filter completely and making our own
"SlicingIndexReader extends FilterIndexReader" that has ability to receive
Matcher(filter) from outside. Also, FilteredTermDocs implements TermDocs
inside of it to use this "Matcher" to do skipping. We could not wait for
your nice code that solves Filter problems to get on svn. I did not bother
anybody with this code as I think it solves problem with Filter on conceptually
soft ground. It provides me with capacity to make different view on index
subset defined by Matcher, but does not have flexibility your approach with
Matcher has (eg BooleanQuery.add(Matcher, Occur)...). Also, it is fast, as it
filters out at "the source", with Scorers totally unaware of it. ....
irrelevant, just mentioning it as maybe someone finds this idea
useful for something else.
Back to the RLE, by doing all this, we came up with one
SortedIntIntervalListMatcher as we have some fields in index that compress
perfectly using this trick! IntervalList is practically the same thing as RLE,
solves the same problem. So the idea, "can RLE save some ticks/ index size
without affecting performance in typical non-sorted case", I would say yes, but
it is good idea to ask for feedback from people more familiar with multi
interval skip lists and bit level gurus like you and Yonik, honestly, I have
no idea what would be relative cost of an extra if(0xFF==b) in tight next() and
skipTo() methods, as this determines how big slow down in the worst case will
be (totally sparse, no RLE "firing").
>You could try and use a VInt flag value (how about 0xFF ?) to start an encoded
>run length encoded series of bits. For example 0xFF would be followed by the
>next delta as a VInt, and by the run length as the next VInt.
>You might also try and generalize the bytes of VInt to nibbles (half bytes).
I will see, I need to figure out some experiments that are not as involved as
implementing it on Lucene index (change index format and whatnot..)
Anyhow, it is encouraging not to have immediate "sorry, cannot work because
...." from you or someone else on this list :)
thanks again for feedback,
e.
___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]