Sleepycat Software writes:

 > The question is if it's creating a new level in the Btree.  If it

 Tests show that 11 million keys create a level-4 btree. In the
tests I'm inserting a sorted list of keys, in ascending order. I understand
that 11 million keys should fit in a level-3 btree. Since there is
no tree balancing, I don't know how to get the optimal case. In addition
to balancing we would also need a re-packer to get rid of the 40% free
space in leaf pages. I understand that this free space helps when inserting
new entries, but if I want to publish a read-only database, I want to
get rid of it.
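To see why packing matters for the tree height, here is a back-of-the-envelope sketch. The per-entry overhead and fan-out figures are assumptions for illustration, not Berkeley DB internals: with an 8-byte average key/data pair, a plausible per-entry overhead, and a 4 KB page, fully packed pages hold 11 million keys in 3 levels, while pages that are only 60% full need 4.

```python
# Rough estimate of the minimum btree height for 11 million keys.
# AVG_ENTRY is an assumption: 8-byte average key/data pair plus an
# assumed 8 bytes of per-entry page overhead. Internal-page fan-out
# is assumed comparable to leaf capacity.
import math

PAGE_SIZE = 4096
AVG_ENTRY = 8 + 8  # assumed: avg key/data pair plus per-entry overhead

def min_height(n_keys, fill=1.0):
    """Levels needed to hold n_keys at the given page fill factor."""
    per_page = int(PAGE_SIZE * fill // AVG_ENTRY)
    pages = math.ceil(n_keys / per_page)  # leaf pages
    height = 1
    while pages > 1:
        pages = math.ceil(pages / per_page)  # next internal level
        height += 1
    return height

print(min_height(11_000_000, fill=1.0))  # → 3 (fully packed pages)
print(min_height(11_000_000, fill=0.6))  # → 4 (40% free space per page)
```

Under these assumptions the extra level comes entirely from the unused 40% of each page, which is why a re-packer would help a read-only database.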

 > Well, yeah, but we're still talking a problem that can be solved
 > for $10.

 A friend of mine defends exactly the same point :-) After many
discussions with him, my final argument is the following:
we want htdig indexing to be near optimal for
               . space usage
               . dynamic updating
               . scalability
               . performance
               . functionality
because we don't want to rewrite it from scratch next year. If we
neglect one point or another, that's what is going to happen. Indexing
can be considered middleware, despite the fact that it's currently
seen as an application. You want indexes for everything and everywhere
(in your mail box, for all the text files on your disk, etc.). The main
reason it's not used as middleware is that at present all
indexing libraries either use too much disk space or cannot be updated
dynamically. People will *not* buy a new hard drive to index their stuff.
They will index their stuff as long as it fits on their disk, or not index
it at all.

 > I don't mean to be a jerk, honest, but the obvious answer here is
 > to switch to a different operating system.  There are lots of free
 > OS releases that support large filesystems.  Memory is cheap, disk
 > is cheap, development is very, very expensive.  I expect to support

 How much money is a CD publisher ready to pay to fit something on one CD
instead of two? How much does it cost an internet search engine to
have 300 GB of indexes instead of 150 GB? Space is time :-) I agree
with you about the 2 GB limit: change OS or apply a patch. But so many
people will be so happy if they can index twice as much without being
forced to upgrade/change/patch.

 > My guess is that we'll do it sometime in September/October.
 > I've got customers that I've promised this to, so it's very
 > high on our list.

 I'll try to implement this starting now. The fact that I've
been able to build a 60-million-entry tree in a 1.2 GB file gives me
hope. I would really need a hint, though: why do we have 40% free
space when using variable-length keys/data (varying from 6 bytes to 35
bytes, average 8 bytes, page size 4 KB)? Is there a way to reduce this?
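One mechanism that produces this kind of waste can be shown with a toy simulation. This models a generic btree leaf with a naive split-in-the-middle policy, not Berkeley DB's actual split code, and the page capacity is an assumed round number. When keys arrive in ascending order, every full page splits into two half-full pages, and the left half never receives another key, so steady-state fill converges to about 50%:

```python
# Toy model: ascending-order inserts into btree leaves that split
# in the middle when full. Each split strands a half-empty page on
# the left that no later key ever lands in (keys only grow).
# PAGE_CAPACITY is an assumption, not a Berkeley DB constant.
PAGE_CAPACITY = 256  # assumed entries per 4 KB page

def fill_after_sorted_inserts(n_keys):
    pages = [0]  # entry count per leaf page; keys arrive in order
    for _ in range(n_keys):
        if pages[-1] == PAGE_CAPACITY:
            # middle split: half the entries stay, half move right
            pages[-1] = PAGE_CAPACITY // 2
            pages.append(PAGE_CAPACITY // 2)
        pages[-1] += 1  # new key always goes in the rightmost page
    return sum(pages) / (len(pages) * PAGE_CAPACITY)

print(round(fill_after_sorted_inserts(1_000_000), 2))  # → 0.5
```

An implementation that detects append-only insertion and splits unevenly would do better than this, which may be why the observed free space is 40% rather than 50%; either way, sorted input plus middle splits is a plausible source of the waste.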

       Thanks,

-- 
                Loic Dachary

                ECILA
                100 av. du Gal Leclerc
                93500 Pantin - France
                Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
                e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
