Jim, On 2/26/06 8:00 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
> Any idea on how decompression time compares to IO bandwidth? In other > words, how long does it take to decompress 1MB vs read that 1MB vs read > whatever the uncompressed size is? On DBT-3 data, I've just run some tests meant to simulate the speed differences of compression versus native I/O. My thought is that an external use of gzip on a binary dump file should be close to the speed of LZW on toasted fields, so I just dumped the "supplier" table (see below) of size 202MB in data pages to disk, then ran gzip/gunzip on the the binary file. Second test - an 8k block dd from that same file, meant to simulate a seq scan (it's faster by 25% than doing it in PG though): ==================== gzip/gunzip ===================== [EMAIL PROTECTED] ls -l supplier.bin -rw-r--r-- 1 mppdemo1 mppdemo1 177494266 Feb 26 09:17 supplier.bin [EMAIL PROTECTED] time gzip supplier.bin real 0m12.979s user 0m12.558s sys 0m0.400s [EMAIL PROTECTED] time gunzip supplier.bin real 0m2.286s user 0m1.713s sys 0m0.573s [EMAIL PROTECTED] time dd if=supplier.bin of=/dev/null bs=8k 21666+1 records in 21666+1 records out real 0m0.138s user 0m0.003s sys 0m0.135s ==================== Interpretation ===================== Zip speed: 177.5 MB in 13 seconds, or 13.6 MB/s Unzip speed: 177.5 MB in 2.29 seconds, or 77.5 MB/s Direct access speed: 177.5 MB in 0.138 seconds or 1,286 MB/s Note that this filesystem can do about 400MB/s, and we routinely see scan rates of 300MB/s within PG, so the real comparision is: Direct seqscan at 300MB/s versus gunzip at 77.5MB/s ==================== Background data ===================== demo=# \d supplier Table "public.supplier" Column | Type | Modifiers -------------+------------------------+----------- s_suppkey | integer | not null s_name | character(25) | not null s_address | character varying(40) | not null s_nationkey | integer | not null s_phone | character(15) | not null s_acctbal | numeric(15,2) | not null s_comment | character varying(101) | not null demo=# select relname,8*relpages/128 as MB from pg_class order by relpages desc limit 6; relname | mb ----------+-------- lineitem | 123434 orders | 24907 partsupp | 14785 part | 3997 customer | 3293 supplier | 202 (6 rows) Time: 2.024 ms demo=# copy supplier to '/tmp/supplier.bin' with binary; COPY Time: 7328.186 ms demo=# copy supplier to '/tmp/supplier.txt'; COPY Time: 5503.168 ms ******** Note how the text file dumps faster than binary, and it's smaller ******** at 148MB. demo=# select version(); version ---------------------------------------------------------------------------- ---------------------------------------------------------------------------- ----- PostgreSQL 8.1.3 (Bizgres MPP 2.1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) compiled on Feb 23 2006 11:34:06 (1 row) ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match