Hi,
Here are the results of the latest tests. I will not run any
more tests since I'm now 100% sure that the following rule applies:
When using compression, if your database is big enough you
also gain in real (wall-clock) time.
Reminder: the transparent Berkeley DB compression code used
in htdig3 compresses the file to 50% of its original size (+ 0.1% in
the worst case).
The value of 'big enough' depends on your configuration.
It's around 1Gb on a PII350, 128Mb RAM, 6Mb/s disk, Linux-2.2, 32Mb cache.
It's around 5Gb on a PIII450, 512Mb RAM, 12Mb/s disk, FreeBSD-3.2, 64Mb cache.
Why do we gain real time? There are two candidate factors:
. A file that is 50% smaller reduces the seek latency needed to fetch
blocks.
. The kernel buffer cache contains twice as much usable data.
Keith Bostic argues that seek latency has no visible effect. I
tend to agree with him, given that the elevator algorithm of
the kernel sorts I/O operations. The gain is, IMHO, mainly due to the cache
being effectively twice as big because it contains compressed data.
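
As a rough sketch of the cache argument, the model below assumes the kernel buffer cache holds file blocks exactly as they are stored on disk, so each cached byte of a compressed file stands for more database data. The function name, the 400 MB of cache and the 5 GB database are hypothetical values chosen for illustration, not measurements.

```python
# Back-of-the-envelope model of the buffer-cache argument.
# Assumption: the kernel caches blocks as stored on disk, so with
# compression each cached byte covers 1/ratio bytes of database pages.

def cached_fraction(cache_bytes: int, db_bytes: int, ratio: float = 1.0) -> float:
    """Fraction of the logical database that fits in the kernel cache.

    ratio is compressed size / uncompressed size (1.0 means no compression).
    """
    on_disk = db_bytes * ratio
    return min(1.0, cache_bytes / on_disk)


MB = 1024 * 1024
GB = 1024 * MB

cache = 400 * MB   # hypothetical RAM left over for the buffer cache
db = 5 * GB        # hypothetical logical (uncompressed) database size

plain = cached_fraction(cache, db)
compressed = cached_fraction(cache, db, ratio=0.5)
print(f"uncompressed: {plain:.1%} of the database cached")
print(f"compressed:   {compressed:.1%} of the database cached")
```

With a 50% ratio the cached fraction doubles until the whole file fits in memory, which is the effect described above. The model ignores the CPU cost of compressing and decompressing pages.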
The following test results (make bench in htdig3/test) were obtained
on a PIII450, 512Mb RAM, 12Mb/s disk, FreeBSD-3.2, 64Mb cache. The real
times of the two runs differ by only about 5% for a 5.8Gb file, which is
why I think the threshold is around 5Gb.
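
To make the comparison concrete, the timings reported by /usr/bin/time in the two runs below can be compared directly. This is only arithmetic on the figures in the logs. The dictionary layout and the helper name are my own.

```python
# Figures copied from the /usr/bin/time output of the two runs.
runs = {
    "compressed":   {"real": 37825.68, "user": 15860.08, "sys": 6371.04},
    "uncompressed": {"real": 36121.84, "user": 5456.87,  "sys": 674.53},
}


def relative_difference(a: float, b: float) -> float:
    """Return (a - b) / b, i.e. how much larger a is than b."""
    return (a - b) / b


for key in ("real", "user", "sys"):
    diff = relative_difference(runs["compressed"][key],
                               runs["uncompressed"][key])
    print(f"{key:>4}: {diff:+.1%} compressed vs. uncompressed")
```

The real times of the two runs are within 5% of each other, while user and system CPU time are several times higher with compression. That suggests this machine is close to the break-even point at this database size.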
From now on I will therefore assume that compression is a very
good thing because it offers a win-win situation for large databases :-)
make bench
make[1]: Entering directory `/usr/home/loic/htdig3/test'
rm -f /spare/test /spare/test_weakcmpr
/usr/bin/time -l ../test/dbbench -C `expr 64 \* 1024 \* 1024` -S 8192 -z -w words.all -l 180 -B /spare/test
Reading from words.all ... pushed 133495560 words
37825.68 real 15860.08 user 6371.04 sys
83492 maximum resident set size
270 average shared memory size
1333 average unshared data size
154 average unshared stack size
172566485 page reclaims
1 page faults
0 swaps
16946 block input operations
2584557 block output operations
0 messages sent
0 messages received
0 signals received
28274 voluntary context switches
317480 involuntary context switches
ls -l /spare/test
-rw-r--r-- 1 loic wheel 2913034240 Aug 12 04:12 /spare/test
if [ -f /spare/test_weakcmpr ] ; then ../db/dist/db_dump -p /spare/test_weakcmpr ; fi
format=print
type=btree
recnum=1
bt_minkey=2
db_pagesize=8192
HEADER=END
./db/dist/db_stat -z -d /spare/test
0x53162 Btree magic number.
6 Btree version number.
Flags:
2 Minimum keys per-page.
8192 Underlying tree page size.
4 Number of levels in the tree.
133M Number of keys in the tree.
3692 Number of tree internal pages.
707475 Number of tree leaf pages.
0 Number of tree duplicate pages.
0 Number of tree overflow pages.
0 Number of pages on the free list.
12M Number of bytes free in tree internal pages (59% ff).
2858M Number of bytes free in tree leaf pages (196% ff).
0 Number of bytes free in tree duplicate pages (0% ff).
0 Number of bytes free in tree overflow pages (0% ff).
make[1]: Leaving directory `/usr/home/loic/htdig3/test'
make CMPR='' bench
make[1]: Entering directory `/usr/home/loic/htdig3/test'
rm -f /spare/test /spare/test_weakcmpr
/usr/bin/time -l ../test/dbbench -C `expr 64 \* 1024 \* 1024` -S 8192 -w words.all -l 180 -B /spare/test
Reading from words.all ... pushed 133495560 words
36121.84 real 5456.87 user 674.53 sys
82988 maximum resident set size
327 average shared memory size
4776 average unshared data size
186 average unshared stack size
21252 page reclaims
7 page faults
0 swaps
24638 block input operations
2979151 block output operations
0 messages sent
0 messages received
0 signals received
47394 voluntary context switches
108297 involuntary context switches
ls -l /spare/test
-rw-r--r-- 1 loic wheel 5825888256 Aug 12 14:56 /spare/test
if [ -f /spare/test_weakcmpr ] ; then ../db/dist/db_dump -p /spare/test_weakcmpr ; fi
./db/dist/db_stat -d /spare/test
0x53162 Btree magic number.
6 Btree version number.
Flags:
2 Minimum keys per-page.
8192 Underlying tree page size.
4 Number of levels in the tree.
133M Number of keys in the tree.
3692 Number of tree internal pages.
707475 Number of tree leaf pages.
0 Number of tree duplicate pages.
0 Number of tree overflow pages.
0 Number of pages on the free list.
12M Number of bytes free in tree internal pages (59% ff).
2858M Number of bytes free in tree leaf pages (196% ff).
0 Number of bytes free in tree duplicate pages (0% ff).
0 Number of bytes free in tree overflow pages (0% ff).
make[1]: Leaving directory `/usr/home/loic/htdig3/test'
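
As a sanity check on the 50% figure, the file sizes from the two ls -l lines and the page counts from db_stat can be cross-checked with a few lines of arithmetic. The variable names are mine. All the numbers are copied from the logs above.

```python
PAGE_SIZE = 8192                # -S 8192, "Underlying tree page size"
compressed_bytes = 2913034240   # ls -l after the run with -z
plain_bytes = 5825888256        # ls -l after the run with CMPR=''
internal_pages = 3692           # "Number of tree internal pages"
leaf_pages = 707475             # "Number of tree leaf pages"

ratio = compressed_bytes / plain_bytes
pages_on_disk = plain_bytes // PAGE_SIZE
pages_in_tree = internal_pages + leaf_pages

print(f"compressed / uncompressed size: {ratio:.4%}")
print(f"pages in uncompressed file:     {pages_on_disk}")
print(f"internal + leaf pages:          {pages_in_tree}")
```

The ratio comes out just above 50%, well inside the 0.1% worst-case margin. The uncompressed file holds one page more than the internal and leaf pages combined. I assume that extra page is the database metadata page, but the logs do not confirm it.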