Hi guys,
as I was reviewing the Table interface, and the Index hierarchy, I had a
bit of time to try to understand this part of the server I was not
comfortable with. And I saw that each index is using two tables : a
forward and a reverse table.
The forward table is obviously mandatory. It is used to retrieve the
list of entry's ID from an AT value. Fine. But what about the reverse
table ?
Alex told me that it's useful when dealing with such filters :
(&(cn=A)(sn=B)).
The optimization engine will get the index that provides the smallest
set of candidate (say, CN here). We then grab from the CN forward table
all the entry's ID associated with the 'A' value. Let's call it the
CN-IDs set.
Now, if we were to grab all the entries from the MasterTable to compare
their SN attribute with the 'B' value, t would be quite costly. We use
the SN's reverse table. For each entry's ID, we check if the associated
value in the SN's reverse table exists or not. If it exists, we have a
valid candidate, otherwise, we simply discard the candidate.
As we can see, we never fetch the entry unless we have a valid candidate
(of course, if the SN and CN AT are indexed).
However, I do think this reverse table is spurious. We can get the same
result by using the SN's forward table : for each entry's ID, check in
the SVn's forward table if the set of values (SN-IDs) contains the ID we
pulled from the CN-IDs set.
We can even do an SN-IDs ^ CN-IDs ( '^' <=> intersection) to get the
list of valid candidates. No need to do N fetches in the reverse table
(where N is the number of ID in CN-IDs).
I think we can remove those reverse table, and I expect that it can
speed up the search a bit, but mainly speed up greatly any modification
operation (add, delete, modify) and even decrease the disk consumption.
May be I'm missing something though. I have created a branch
(http://svn.apache.org/viewvc/directory/apacheds/branches/apacheds-no-reverse-index/)
to play with this idea.
--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com