Sleepycat Software writes:

 > Yes, I guess that works... might as well use 2K pages on disk
 > so that you get a finer granularity from the compression, I
 > don't think it makes the problem harder.

 Yes. The bigger the pages are, the better the compression rate is.
But I ran the tests compressing 4k pages into 2k pages and got less than
0.1% exceptions (pages that would not fit).
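
 Roughly, the test boils down to something like this (only a sketch,
assuming zlib's compress2(); PAGE_SIZE, DISK_PAGE_SIZE and RESERVED are
made-up names, not the actual code):

#include <zlib.h>

#define PAGE_SIZE       4096    /* in-memory page size */
#define DISK_PAGE_SIZE  2048    /* on-disk page size */
#define RESERVED           8    /* kept free for the exception page number */

/*
 * Return 1 if the compressed 4k page fits in a single 2k disk page
 * (minus the reserved bytes), 0 if it is an exception.
 */
static int
page_fits(const unsigned char *page)
{
        unsigned char out[PAGE_SIZE + 64]; /* zlib may expand incompressible data a little */
        uLongf out_len = sizeof(out);

        if (compress2(out, &out_len, page, PAGE_SIZE, Z_BEST_COMPRESSION) != Z_OK)
                return (0);             /* be pessimistic: treat errors as exceptions */
        return (out_len <= DISK_PAGE_SIZE - RESERVED);
}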

 > You will have to maintain a list of free pages somewhere (when
 > an in-memory page in the middle of the database changes and moves
 > because it needs more on-disk pages to be allocated, you'll have
 > "free" pages in the middle of the on-disk file.)

 That's the problem I'm facing right now. Storing the page numbers is
not a problem: I can just say the 4k page must compress to 2k - 8 bytes,
which leaves room to store the possible exception page number without
polluting anything.
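
 To make that concrete, the 2k slot would look roughly like this (a
sketch only, the field names are made up and the actual format is still
open):

#include <stdint.h>

#define DISK_PAGE_SIZE  2048

/*
 * One on-disk 2k slot.  The compressed 4k page must fit in the data
 * area; the trailing 8 bytes are always present, so an exception page
 * number can be recorded without disturbing the compressed data.
 */
struct disk_page {
        unsigned char   data[DISK_PAGE_SIZE - 8]; /* compressed in-memory page */
        uint32_t        overflow_pgno;  /* extra disk page for exceptions, 0 if none */
        uint32_t        reserved;       /* padding / future use */
};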
 When a page is freed, however, it must be stored in a free list.
I'm trying to figure out how to do this while keeping in mind that these
pages live in a shared memory pool that can be used by many processes and
many threads within a process. The fact that getting a page from the list
will happen in less than 0.1% of cases may be an excuse to aggressively
lock the whole free list... I keep searching/reading code.
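
 The kind of thing I'm considering looks like this (only a sketch; a
process-shared POSIX mutex stands in for whatever lock the memory pool
code really uses, and the fixed-size array is just for illustration):

#include <pthread.h>
#include <stdint.h>

#define FREE_LIST_MAX   1024    /* arbitrary, for illustration */

/*
 * Free-list header living in the shared memory region.  The mutex must
 * be initialized with PTHREAD_PROCESS_SHARED so every process mapping
 * the region can use it.  Since an extra disk page is needed for less
 * than 0.1% of the pages, one coarse lock around the whole list should
 * not hurt.
 */
struct free_list {
        pthread_mutex_t lock;
        uint32_t        count;                  /* number of free disk pages */
        uint32_t        pgno[FREE_LIST_MAX];    /* free on-disk page numbers */
};

/* Pop a free disk page number, or 0 if the list is empty. */
static uint32_t
free_list_get(struct free_list *fl)
{
        uint32_t pgno = 0;

        pthread_mutex_lock(&fl->lock);
        if (fl->count > 0)
                pgno = fl->pgno[--fl->count];
        pthread_mutex_unlock(&fl->lock);
        return (pgno);
}

/* Push a disk page number back when an in-memory page shrinks. */
static void
free_list_put(struct free_list *fl, uint32_t pgno)
{
        pthread_mutex_lock(&fl->lock);
        if (fl->count < FREE_LIST_MAX)
                fl->pgno[fl->count++] = pgno;
        pthread_mutex_unlock(&fl->lock);
}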

 > It's unclear to me that's the worst case -- the unused space on
 > the pages is probably all 0's, which means it will compress
 > well.  If I had to think of a "worst case", I'd try storing a
 > set of already zlib compressed binaries in a Btree, with their
 > paths as the key.

 I meant the worst case for my usage pattern. I agree that it's not
intuitive that the data compresses better when it's sorted; I'll do some
more testing. Obviously, if the data within the pages is already
compressed, and all different, that is an absolute worst case. I agree
that transparent compression will only succeed for specific usage
patterns, not all.

 > Regardless, those numbers sound good.  What was the range of
 > compression?  How many compressed down to 1K?

 I didn't check that... wait... here: 16106 out of 230073 are <= 1024,
and 153076 out of 230073 are <= 1500 (2 are < 256 bytes :-).

 > I'd also like to see numbers on 8K pages -- 8K is the "standard"
 > out there, and you'll get better compression from the large size.

 I can't do that right now, sorry (I deleted my 900Mb test file).

 > We need to start thinking about recovery, too.  If the system
 > crashes when you've only written one on-disk page of a two
 > on-disk page pair, how do you do recovery to guarantee that no
 > data is ever lost?

 Argh. Thanks for pointing this one out :-}

-- 
                Loic Dachary

                ECILA
                100 av. du Gal Leclerc
                93500 Pantin - France
                Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
                e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/

