Re: [HACKERS] TOAST compression
"Josh Berkus" wrote > > While I'm waiting to figure out how to get the size of the toast table, at > > least I can provide the speed of query with/without assumed compression on > > the 6K text columns. > > Check out the table_size view in the newsysviews project. Andrew computed the > regular, toast, and index sizes as a query. > Will pg_total_relation_size_oid()/pg_total_relation_size_name() do the job? Regards, Qingqing ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] TOAST compression
Luke,

> While I'm waiting to figure out how to get the size of the toast table, at
> least I can provide the speed of query with/without assumed compression on
> the 6K text columns.

Check out the table_size view in the newsysviews project. Andrew computed
the regular, toast, and index sizes as a query.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [HACKERS] TOAST compression
While I'm waiting to figure out how to get the size of the toast table, at
least I can provide the speed of query with/without assumed compression on
the 6K text columns.

To ensure that we're actually accessing the data in the rows, I do a regexp
query on the TOASTed rows:

mpptestdb=# select count(*) from bigtable1 where b ~ 'a';
 count
-------
 10
(1 row)

Time: 700.574 ms

That's 770MB of database data regexp'ed in 0.7 seconds, or about
1.1GB/second. Not bad, but slower by a good measure than the I/O subsystem.

Now we try the same table with PLAIN storage type:

mpptestdb=# select count(*) from bigtable1 where b ~ 'a';
 count
-------
 10
(1 row)

Time: 284.146 ms

That's the same 770MB in 0.28 seconds, or 2.8GB/s - much better.

- Luke
Re: [HACKERS] TOAST compression
Josh,

On 2/26/06 8:04 PM, "Josh Berkus" wrote:

> Check out SET STORAGE.

I just altered the MIVP data generator in Bizgres MPP to produce the usual
15-column table, but with a 6K row size. You'd only expect a few tens of
bytes variance around the 6K, and the data is randomly chosen words from a
seed file of about 14.5K.

I generated a file with 100,000 rows; total size is 600MB. My tests are
with Bizgres MPP 2.1 GA (based on Postgres 8.1.3) on 4 machines with 8
primary segments.

So, with a column storage type of PLAIN, here's the filesystem resident
size:

mpptestdb=# select relname,8*relpages/128 as MB from pg_class where
relname='bigtable1';
  relname  | mb
-----------+-----
 bigtable1 | 789

And here are a couple of selects (obviously coming from the 64GB of RAM):

mpptestdb=# select count(*) from bigtable1;
 count
-------
 10
(1 row)

Time: 45.685 ms

mpptestdb=# select count(*) from bigtable1;
 count
-------
 10
(1 row)

Time: 25.024 ms

Here it is with the default EXTENDED:

mpptestdb=# select relname,8*relpages/128 as MB from pg_class where
relname='bigtable1';
  relname  | mb
-----------+----
 bigtable1 | 20

Now I'm stuck - I can get the toast table OID, but my attempts to find out
its size are futile. Suggestions?

- Luke
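For a reader hitting the same wall: the TOAST table's size can be pulled
straight out of pg_class (a sketch; it joins pg_class to itself through
reltoastrelid, with 8K pages so relpages/128 gives MB - note that relpages
is only as fresh as the last VACUUM or ANALYZE, and that the TOAST table's
index is counted separately):

  -- heap and TOAST sizes side by side, in MB
  select c.relname,
         c.relpages / 128              as heap_mb,
         coalesce(t.relpages, 0) / 128 as toast_mb
  from pg_class c
       left join pg_class t on t.oid = c.reltoastrelid
  where c.relname = 'bigtable1';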
Re: [HACKERS] TOAST compression
Luke,

> As Jim pointed out, we would need a real test to confirm the behavior;
> I'm not yet acquainted with the toast compression, so it's harder for me
> to compose a real test.

Check out SET STORAGE.

--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
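For the archives, a minimal sketch of what that looks like in practice
(per column; PLAIN disables both compression and out-of-line TOAST storage,
while EXTENDED - the default for text - allows both; the setting only
applies to tuples stored after the change, so existing rows keep their old
representation):

  -- force uncompressed, in-line storage for column b
  alter table bigtable1 alter column b set storage plain;

  -- back to the default for text: compressible and movable out of line
  alter table bigtable1 alter column b set storage extended;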
Re: [HACKERS] TOAST compression
Hannu,

On 2/26/06 12:19 PM, "Hannu Krosing" <[EMAIL PROTECTED]> wrote:

>> On DBT-3 data, I've just run some tests meant to simulate the speed
>> differences of compression versus native I/O. My thought is that an
>> external use of gzip on a binary dump file should be close to the speed
>> of LZW on toasted fields,
>
> Your basic assumption is probably wrong :(
>
> gzip what? The "compression level" setting of gzip has a big effect on
> both compression speed and compression rate. And I suspect that even the
> fastest level (gzip -1) compresses slower and better than PostgreSQL's
> lzcompress.

I thought it might form a lower bound on speed. 75MB/s on an Opteron is
really super fast compared to what I expected from LZ. And since gzip uses
LZ compression, maybe they'll be in the same family of results,
behavior-wise.

I'd be more concerned about the implementation within the executor and how
the "get tuple, decompress tuple, use tuple" cycling might destroy the
efficiency of the scanning compared to decompressing large blocks of data.

As Jim pointed out, we would need a real test to confirm the behavior; I'm
not yet acquainted with the toast compression, so it's harder for me to
compose a real test.

> These are also somewhat bogus tests; if you want them to be comparable
> with the dd below, you should have used 'time gzip -c supplier.bin >
> /dev/null'.

Hmm - the file writing seems to matter on decompress, making decompression
even faster:

[EMAIL PROTECTED] tmp]# time gunzip tmp.bin.gz

real    0m2.254s
user    0m1.718s
sys     0m0.536s

[EMAIL PROTECTED] tmp]# time gzip tmp.bin

real    0m12.912s
user    0m12.555s
sys     0m0.355s

[EMAIL PROTECTED] tmp]# time gzip -c supplier.bin > /dev/null

real    0m12.582s
user    0m12.464s
sys     0m0.115s

[EMAIL PROTECTED] tmp]# time gunzip -c tmp.bin.gz > /dev/null

real    0m1.734s
user    0m1.690s
sys     0m0.043s

- Luke
Re: [HACKERS] TOAST compression
On Sunday, 2006-02-26 at 09:31, Luke Lonergan wrote:
> Jim,
>
> On 2/26/06 8:00 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
>
> > Any idea on how decompression time compares to IO bandwidth? In other
> > words, how long does it take to decompress 1MB vs read that 1MB vs read
> > whatever the uncompressed size is?
>
> On DBT-3 data, I've just run some tests meant to simulate the speed
> differences of compression versus native I/O. My thought is that an
> external use of gzip on a binary dump file should be close to the speed
> of LZW on toasted fields,

Your basic assumption is probably wrong :(

gzip what? The "compression level" setting of gzip has a big effect on both
compression speed and compression rate. And I suspect that even the fastest
level (gzip -1) compresses slower and better than PostgreSQL's lzcompress.

> so I just dumped the "supplier" table (see below) of size 202MB in data
> pages to disk, then ran gzip/gunzip on the binary file. Second test - an
> 8k block dd from that same file, meant to simulate a seq scan (it's
> faster by 25% than doing it in PG though):
>
> gzip/gunzip =
>
> [EMAIL PROTECTED] ls -l supplier.bin
> -rw-r--r-- 1 mppdemo1 mppdemo1 177494266 Feb 26 09:17 supplier.bin
>
> [EMAIL PROTECTED] time gzip supplier.bin
>
> real    0m12.979s
> user    0m12.558s
> sys     0m0.400s
>
> [EMAIL PROTECTED] time gunzip supplier.bin
>
> real    0m2.286s
> user    0m1.713s
> sys     0m0.573s

These are also somewhat bogus tests; if you want them to be comparable with
the dd below, you should have used 'time gzip -c supplier.bin > /dev/null'.

> [EMAIL PROTECTED] time dd if=supplier.bin of=/dev/null bs=8k
> 21666+1 records in
> 21666+1 records out
>
> real    0m0.138s
> user    0m0.003s
> sys     0m0.135s

Hannu
Re: [HACKERS] TOAST compression
Luke Lonergan wrote:
> Jim,
>
> On 2/26/06 10:37 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
>
> > So the cutover point (on your system with very fast IO) is 4:1
> > compression (is that 20 or 25%?).
>
> Actually the size of the gzipp'ed binary file on disk was 65MB, compared
> to 177.5MB uncompressed, so the compression ratio is 37% (?), or 2.73:1.

I doubt our algorithm would give the same compression (though I haven't
really measured it). The LZ implementation we use is supposed to have
lightning speed at the cost of a not-so-good compression ratio.

> No, unfortunately not. O'Reilly's jobs data have 65K rows, so that would
> work. How do we implement LZW compression on toasted fields? I've never
> done it!

See src/backend/utils/adt/pg_lzcompress.c

--
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
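It can be measured from SQL without touching the C code (a sketch;
pg_column_size() - assumed available here, as in 8.1 - reports the stored,
possibly compressed, size of a datum, while octet_length() reports the
logical size; bigtable1 and its text column b are borrowed from Luke's test
elsewhere in the thread):

  -- fraction of logical bytes actually stored after pglz compression
  select sum(pg_column_size(b))::numeric / sum(octet_length(b))
         as stored_fraction
  from bigtable1;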
Re: [HACKERS] TOAST compression
Jim,

On 2/26/06 10:37 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:

> So the cutover point (on your system with very fast IO) is 4:1
> compression (is that 20 or 25%?).

Actually the size of the gzipp'ed binary file on disk was 65MB, compared to
177.5MB uncompressed, so the compression ratio is 37% (?), or 2.73:1.

> But that's assuming that PostgreSQL can read data as fast as dd, which we
> all know isn't the case.

Actually, I had factored that in already. The filesystem delivered
1,200MB/s out of cache in this case - the 300MB/s is what we routinely do
from Postgres seqscan per instance on this system.

> That's also assuming a pretty top-notch IO subsystem.

True - actually I'm pretty impressed with 75MB/s gunzip speed.

> Based on that, I'd argue that 10% is probably a better setting, though it
> would be good to test an actual case (does dbt3 produce fields large
> enough to ensure that most of them will be toasted?)

No, unfortunately not. O'Reilly's jobs data have 65K rows, so that would
work. How do we implement LZW compression on toasted fields? I've never
done it!

> Given the variables involved, maybe it makes sense to add a GUC?

Dunno - I'm not sure how the current scheme works; this is new to me. We
had considered using a compression mechanism for table data, but had some
completely different ideas, more along the lines of a compressing heap
store. The main problem as I see it is the CPU required to get there at
reasonable performance, as you point out. However, the trend is inevitable
- we'll soon have more CPU than we could otherwise use to work with...

- Luke
Re: [HACKERS] TOAST compression
On Sun, Feb 26, 2006 at 09:31:05AM -0800, Luke Lonergan wrote:
> Note that this filesystem can do about 400MB/s, and we routinely see scan
> rates of 300MB/s within PG, so the real comparison is:
>
> Direct seqscan at 300MB/s versus gunzip at 77.5MB/s

So the cutover point (on your system with very fast IO) is 4:1 compression
(is that 20 or 25%?). But that's assuming that PostgreSQL can read data as
fast as dd, which we all know isn't the case. That's also assuming a pretty
top-notch IO subsystem. Based on that, I'd argue that 10% is probably a
better setting, though it would be good to test an actual case (does dbt3
produce fields large enough to ensure that most of them will be toasted?)

Given the variables involved, maybe it makes sense to add a GUC?

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
Re: [HACKERS] TOAST compression
Jim,

On 2/26/06 8:00 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:

> Any idea on how decompression time compares to IO bandwidth? In other
> words, how long does it take to decompress 1MB vs read that 1MB vs read
> whatever the uncompressed size is?

On DBT-3 data, I've just run some tests meant to simulate the speed
differences of compression versus native I/O. My thought is that an
external use of gzip on a binary dump file should be close to the speed of
LZW on toasted fields, so I just dumped the "supplier" table (see below) of
size 202MB in data pages to disk, then ran gzip/gunzip on the binary file.
Second test - an 8k block dd from that same file, meant to simulate a seq
scan (it's faster by 25% than doing it in PG though):

gzip/gunzip =

[EMAIL PROTECTED] ls -l supplier.bin
-rw-r--r-- 1 mppdemo1 mppdemo1 177494266 Feb 26 09:17 supplier.bin

[EMAIL PROTECTED] time gzip supplier.bin

real    0m12.979s
user    0m12.558s
sys     0m0.400s

[EMAIL PROTECTED] time gunzip supplier.bin

real    0m2.286s
user    0m1.713s
sys     0m0.573s

[EMAIL PROTECTED] time dd if=supplier.bin of=/dev/null bs=8k
21666+1 records in
21666+1 records out

real    0m0.138s
user    0m0.003s
sys     0m0.135s

Interpretation =

Zip speed: 177.5 MB in 13 seconds, or 13.6 MB/s
Unzip speed: 177.5 MB in 2.29 seconds, or 77.5 MB/s
Direct access speed: 177.5 MB in 0.138 seconds, or 1,286 MB/s

Note that this filesystem can do about 400MB/s, and we routinely see scan
rates of 300MB/s within PG, so the real comparison is:

Direct seqscan at 300MB/s versus gunzip at 77.5MB/s

Background data =

demo=# \d supplier
          Table "public.supplier"
   Column    |          Type          | Modifiers
-------------+------------------------+-----------
 s_suppkey   | integer                | not null
 s_name      | character(25)          | not null
 s_address   | character varying(40) | not null
 s_nationkey | integer                | not null
 s_phone     | character(15)          | not null
 s_acctbal   | numeric(15,2)          | not null
 s_comment   | character varying(101) | not null

demo=# select relname,8*relpages/128 as MB from pg_class order by relpages
desc limit 6;
 relname  |   mb
----------+--------
 lineitem | 123434
 orders   |  24907
 partsupp |  14785
 part     |   3997
 customer |   3293
 supplier |    202
(6 rows)

Time: 2.024 ms

demo=# copy supplier to '/tmp/supplier.bin' with binary;
COPY
Time: 7328.186 ms
demo=# copy supplier to '/tmp/supplier.txt';
COPY
Time: 5503.168 ms

Note how the text file dumps faster than binary, and it's smaller at 148MB.

demo=# select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 8.1.3 (Bizgres MPP 2.1) on x86_64-unknown-linux-gnu, compiled
 by GCC gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) compiled on Feb 23 2006
 11:34:06
(1 row)
Re: [HACKERS] TOAST compression
Neil Conway <[EMAIL PROTECTED]> writes:
> toast_compress_datum() considers compression to be "successful" if the
> compressed version of the datum is smaller than the uncompressed
> version. I think this is overly generous: if compression reduces the
> size of the datum by, say, 0.01%, it is likely a net loss to use the
> compressed version of the datum since we'll need to pay for LZ
> decompression every time that we de-TOAST it. This situation can occur
> frequently when storing "mostly-uncompressible" data (compressed images,
> encrypted data, etc.) -- some parts of the data will compress well (e.g.
> metadata), but the vast majority will not.

Does it really occur frequently? When dealing with already-compressed or
encrypted data, the LZ transform actually makes the data larger by some
small percentage. This will outweigh any savings on compressible headers
or what have you, just because those are only a tiny part of the file to
begin with. (Else the format designers would have found a way to compress
them too.) So I'd expect the existing test to catch most of the real-world
cases you cite.

I'm not particularly inclined to worry about this without some hard
evidence that it's a problem. You'd need some numerical evidence anyway to
justify any specific threshold, else it's just as arbitrary as "is it
smaller" ... and the latter at least requires a few instructions less to
check.

			regards, tom lane
Re: [HACKERS] TOAST compression
On Sat, Feb 25, 2006 at 09:39:34PM -0500, Neil Conway wrote:
> It's true that LZ decompression is fast, so we should probably use the
> compressed version of the datum unless the reduction in size is very
> small. I'm not sure precisely what that threshold should be, however.

Any idea on how decompression time compares to IO bandwidth? In other
words, how long does it take to decompress 1MB vs read that 1MB vs read
whatever the uncompressed size is?

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
Re: [HACKERS] TOAST compression
Neil Conway wrote:
> toast_compress_datum() considers compression to be "successful" if the
> compressed version of the datum is smaller than the uncompressed
> version. I think this is overly generous: if compression reduces the
> size of the datum by, say, 0.01%, it is likely a net loss to use the
> compressed version of the datum since we'll need to pay for LZ
> decompression every time that we de-TOAST it. This situation can occur
> frequently when storing "mostly-uncompressible" data (compressed images,
> encrypted data, etc.) -- some parts of the data will compress well (e.g.
> metadata), but the vast majority will not.
>
> It's true that LZ decompression is fast, so we should probably use the
> compressed version of the datum unless the reduction in size is very
> small. I'm not sure precisely what that threshold should be, however.
>
> Comments?

20%? 25%?

--
Bruce Momjian                        http://candle.pha.pa.us
SRA OSS, Inc.                        http://www.sraoss.com

+ If your life is a hard drive, Christ can be your backup. +
[HACKERS] TOAST compression
toast_compress_datum() considers compression to be "successful" if the
compressed version of the datum is smaller than the uncompressed version. I
think this is overly generous: if compression reduces the size of the datum
by, say, 0.01%, it is likely a net loss to use the compressed version of
the datum since we'll need to pay for LZ decompression every time that we
de-TOAST it. This situation can occur frequently when storing
"mostly-uncompressible" data (compressed images, encrypted data, etc.) --
some parts of the data will compress well (e.g. metadata), but the vast
majority will not.

It's true that LZ decompression is fast, so we should probably use the
compressed version of the datum unless the reduction in size is very small.
I'm not sure precisely what that threshold should be, however.

Comments?

-Neil
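To put numbers on the 0.01% case (a worked illustration, not part of the
original message): a 1 MB datum is 1,048,576 bytes, and a 0.01% reduction
saves only about 105 of them, so the datum would be kept as roughly
1,048,471 compressed bytes - yet every de-TOAST still decompresses the full
megabyte. The proposed fix amounts to a rule of the form

  compressed_size <= (1 - f) * raw_size

with f somewhere in the 10-25% range suggested elsewhere in the thread,
rather than the current f = 0.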