On Thu, Feb 11, 2010 at 08:30:14AM -0500, Michael McCandless wrote:
> Oh you're saying we don't know if the underlying enum actually skipped vs
> just scanned?
Yep.
> Isn't the skip data also based on deltas?
Yes, but that's internal to the skip reader, in both Lucene and Lucy/KS. When
it comes time to skip, both libraries assign the skip reader's doc id
directly. From StandardPostingsReaderImpl.java:
    doc = skipper.getDoc();
Trying to apply the skip reader's doc id information as a delta would get
quite complicated. (A delta against... what?) I'm not sure that's even
possible.
> So even if real skipping happened, Lucy/KS would not "lose" the offset that
> the aggregator had previously added? Or maybe I'm lost on what the issue is
> here...
It would indeed "lose" the offset, because the skip reader's doc id
information gets assigned directly rather than applied as a delta.
And since the aggregator layer is not aware of when this occurs, it cannot
intervene to re-apply the offset.
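To make the failure concrete, here's a toy sketch in Java. None of these
classes exist in Lucene or Lucy/KS: SegmentDocsEnum, the single skip point,
and the idea of an aggregator seeding the delta accumulator with its doc base
are all hypothetical. Scanning preserves the seeded offset, but taking a skip
overwrites the accumulator the same way `doc = skipper.getDoc();` does, so the
offset disappears.

```java
/**
 * Hypothetical sketch, not Lucene or Lucy/KS code: a segment-level enum that
 * delta-decodes doc ids and, when it skips, assigns the doc id directly from
 * skip data (the same pattern as `doc = skipper.getDoc();`).
 */
public class SkipOffsetDemo {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class SegmentDocsEnum {
        private final int[] deltas;      // gaps between successive segment doc ids
        private final int[] skipDocs;    // absolute segment doc id at each skip point
        private final int[] skipUptos;   // index of the next delta after each skip point
        private final boolean useSkips;  // false forces a plain linear scan
        private int upto = 0;            // index of the next delta to decode
        int doc;                         // accumulator that deltas are added to

        SegmentDocsEnum(int[] deltas, int[] skipDocs, int[] skipUptos,
                        boolean useSkips, int seed) {
            this.deltas = deltas;
            this.skipDocs = skipDocs;
            this.skipUptos = skipUptos;
            this.useSkips = useSkips;
            this.doc = seed; // hypothetical: an aggregator smuggles its offset in here
        }

        int nextDoc() {
            if (upto == deltas.length) {
                return doc = NO_MORE_DOCS;
            }
            doc += deltas[upto++]; // delta applied to whatever the accumulator holds
            return doc;
        }

        int advance(int target) {
            if (useSkips) {
                // Take the furthest skip point that lies before target and ahead of us.
                for (int i = skipDocs.length - 1; i >= 0; i--) {
                    if (skipDocs[i] < target && skipUptos[i] > upto) {
                        doc = skipDocs[i]; // assigned directly: any seeded offset is gone
                        upto = skipUptos[i];
                        break;
                    }
                }
            }
            while (doc < target) {
                nextDoc();
            }
            return doc;
        }
    }

    /** Segment docs 3, 7, 12, 20, 25 with one skip point at doc 12. */
    static SegmentDocsEnum segment(boolean useSkips, int seed) {
        int[] deltas = {3, 4, 5, 8, 5};
        int[] skipDocs = {12};
        int[] skipUptos = {3};
        return new SegmentDocsEnum(deltas, skipDocs, skipUptos, useSkips, seed);
    }

    public static void main(String[] args) {
        int docBase = 100; // hypothetical offset of this segment in the aggregate
        SegmentDocsEnum scanning = segment(false, docBase);
        scanning.nextDoc();                        // 103
        System.out.println(scanning.advance(120)); // prints 120: offset survives a scan
        SegmentDocsEnum skipping = segment(true, docBase);
        skipping.nextDoc();                        // 103
        System.out.println(skipping.advance(120)); // prints 2147483647 (NO_MORE_DOCS)
    }
}
```

With a doc base of 100 the scanning enum lands on 120 (segment doc 20), while
the skipping enum runs off the end: after the skip it is comparing
segment-relative doc ids against a global target, and nothing outside the
segment enum ever found out that a skip happened.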
Having driven down this dead-end, turned around and come back, I've become
persuaded that requiring the segment-level postings iterator to be aware of
its consumer is not a good idea.
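For contrast, a rough sketch of the consumer-unaware arrangement: the
segment-level iterator returns segment-relative doc ids and skips however it
likes internally, and the aggregator adds each segment's doc base on the way
out and subtracts it from advance() targets on the way in. The names
(SegmentDocs, ArraySegmentDocs, AggregateDocs, docBases) are hypothetical; this
is not the actual flex API.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not the flex API: segment-level iterators return
 * segment-relative doc ids and never learn who is consuming them; only the
 * aggregator knows about doc bases.
 */
public class AggregatorDemo {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal segment-level contract (hypothetical). */
    interface SegmentDocs {
        int nextDoc();
        int advance(int target); // first segment-relative doc id >= target
    }

    /** Array-backed stand-in; a real codec could skip internally however it likes. */
    static final class ArraySegmentDocs implements SegmentDocs {
        private final int[] docs;
        private int i = -1;

        ArraySegmentDocs(int... docs) {
            this.docs = docs;
        }

        public int nextDoc() {
            i++;
            return i < docs.length ? docs[i] : NO_MORE_DOCS;
        }

        public int advance(int target) {
            int d;
            do {
                d = nextDoc();
            } while (d < target);
            return d;
        }
    }

    /** Walks the segments in order, translating between global and segment ids. */
    static final class AggregateDocs {
        private final List<SegmentDocs> subs;
        private final int[] docBases; // global id of each segment's doc 0
        private int current = 0;

        AggregateDocs(List<SegmentDocs> subs, int[] docBases) {
            this.subs = subs;
            this.docBases = docBases;
        }

        int nextDoc() {
            while (current < subs.size()) {
                int d = subs.get(current).nextDoc();
                if (d != NO_MORE_DOCS) {
                    return docBases[current] + d; // offset added on the way out
                }
                current++;
            }
            return NO_MORE_DOCS;
        }

        int advance(int target) {
            while (current < subs.size()) {
                int base = docBases[current];
                // Offset removed on the way in; the segment may skip freely.
                int d = subs.get(current).advance(Math.max(0, target - base));
                if (d != NO_MORE_DOCS) {
                    return base + d;
                }
                current++;
            }
            return NO_MORE_DOCS;
        }
    }

    /** Two segments: docs {3, 7, 12} at base 0 and {0, 5, 9} at base 100. */
    static AggregateDocs example() {
        List<SegmentDocs> subs = Arrays.asList(
            new ArraySegmentDocs(3, 7, 12), new ArraySegmentDocs(0, 5, 9));
        return new AggregateDocs(subs, new int[] {0, 100});
    }

    public static void main(String[] args) {
        AggregateDocs agg = example();
        System.out.println(agg.nextDoc());    // 3
        System.out.println(agg.advance(104)); // 105 (segment 1, doc 5)
        System.out.println(agg.nextDoc());    // 109
    }
}
```

Because the offset never enters the segment enum, it doesn't matter whether
advance() scans or jumps via skip data: the aggregator re-applies the base to
whatever comes back.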
> > A generic aggregator wouldn't know that it needed to do that. The postings
> > codec developer would be forced to write aggregation code in addition to
> > segment-level code.
>
> Right, if position were not primitive but contained within an opaque
> (to the aggregator) object. And, you were doing the flat positions
> space.
>
> I guess... this restriction still seems academic... ie, not a real
> issue in Lucene.
Not for the standard posting formats that Lucene offers. But the point of
flex is to provide an extension framework, I thought.
Well, whatever. It's just another place where Lucy and Lucene will part ways.
Marvin Humphrey