B-Tree's are best for random, incremental updates. They require log_b(N) disk accesses for inserts, deletes and accesses, where b is the number of entries per page, and N is the total number of entries in the tree. But that's too slow for text indexing. Rather Lucene uses a combination of file sorting and merging to update indexes, which is much faster than a B-tree would be. For access, Lucene is equivalent to a B-Tree with all but the leaves cached in memory, so that accesses require only a single disk access.

Slides 5 and 6 of the following presentation discuss this a bit:

http://www.research.ibm.com/haifa/Workshops/ir2005/papers/DougCutting-Haifa05.pdf

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to