On Fri, Jun 24, 2011 at 11:00 AM, Emmanuel Lécharny <elecha...@apache.org> wrote:
> On 6/24/11 9:51 AM, Alex Karasulu wrote:
>>
>>>> The reverse index has no duplicate keys. The only way to get a
>>>> duplicate key in the reverse index is if the same entry (e.g. 37)
>>>> contained the same value ('foo') for the same (sn) attribute. And this
>>>> we know is not possible. So the lookups against the reverse table will
>>>> be faster.
>>>
>>> I was thinking about something a bit different: as soon as you have
>>> grabbed the list of entry IDs from the first index, looking into the
>>> other indexes will also return a list of entry IDs. Checking whether
>>> those IDs are valid candidates can then be done in one shot: do the
>>> intersection of the two sets (they are ordered, so it's an O(n)
>>> operation) and just get the matching entries.
>>>
>>> Compared to the current processing (i.e., accessing the reverse index
>>> for *each* candidate), this will be way faster, IMO.
>>
>> This is a VERY interesting idea. Maybe we should create a separate
>> thread for this and dive deeper into it. I think you've got something
>> here.
>>
> Have a look at
> https://cwiki.apache.org/confluence/display/DIRxSRVx11/Index+and+IndexEntry,
> where I added some paragraphs explaining this idea. We can comment on this
> page.
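
A minimal sketch of the intersection step described in the quoted text, assuming each index lookup yields its candidate entry IDs as an ascending, duplicate-free array. The class name, method name, and sample IDs are hypothetical and are not taken from the ApacheDS code base; this only illustrates the single-pass merge, not how the server's cursors actually work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ApacheDS: intersects two sorted ID lists.
class SortedIdIntersection {

    // Walks both ascending, duplicate-free arrays once, keeping the IDs
    // present in both. Cost is O(a.length + b.length).
    static List<Long> intersect(long[] a, long[] b) {
        List<Long> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;            // a[i] cannot appear later in b, skip it
            } else if (a[i] > b[j]) {
                j++;            // b[j] cannot appear later in a, skip it
            } else {
                result.add(a[i]); // same ID in both lists: a matching entry
                i++;
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up entry IDs returned by two different index lookups.
        long[] firstIndexIds = {12, 37, 41, 58};
        long[] secondIndexIds = {5, 37, 58, 90};
        System.out.println(intersect(firstIndexIds, secondIndexIds)); // prints [37, 58]
    }
}
```

The point of the sketch is that each candidate list is read once, rather than probing the reverse index separately for every candidate.
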

Nice pictures. What did you use for them? Reading further ...

Also, since you're doing this in a branch and we're not yet committed to the
approach, can you please put this on a separate page so you don't alter the
existing documentation?

Thanks,
Alex