Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-05 Thread Dann Corbit
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:pgsql-general-
> [EMAIL PROTECTED] On Behalf Of Ivan Voras
> Sent: Wednesday, November 05, 2008 3:28 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Are there plans to add data compression feature
> to postgresql?
> 
> Peter Eisentraut wrote:
> > Craig Ringer wrote:
> >> So - it's potentially even worth compressing the wire protocol for
> use
> >> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
> >
> > LZO is under the GPL though.
> 
> But liblzf is BSD-style.
> 
> http://www.goof.com/pcg/marc/liblzf.html

Here is a 64-bit Windows port of that library:
http://cap.connx.com/chess-engines/new-approach/liblzf34.zip

It has fantastic compression/decompression speed (100 MB in well under a
second to either compress or decompress), and I see about 50% compression.
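
For anyone who wants to reproduce that kind of measurement, here is a
minimal round-trip sketch of the liblzf API (assuming the lzf.h header
shipped with the library; the sample buffer and its size are made up):

    /* build against liblzf, e.g.: cc demo.c lzf_c.c lzf_d.c */
    #include <stdio.h>
    #include <string.h>
    #include "lzf.h"

    int main(void)
    {
        char in[4096], comp[4096], out[4096];
        memset(in, 'x', sizeof in);          /* highly compressible sample */

        /* lzf_compress returns the compressed size, or 0 if the result
         * would not fit in the output buffer (store the input raw then). */
        unsigned int clen = lzf_compress(in, sizeof in, comp, sizeof in - 1);
        if (clen == 0) {
            puts("not compressible; store raw");
            return 0;
        }

        unsigned int dlen = lzf_decompress(comp, clen, out, sizeof out);
        printf("%zu -> %u -> %u bytes, round trip %s\n",
               sizeof in, clen, dlen,
               (dlen == sizeof in && memcmp(in, out, dlen) == 0)
                   ? "ok" : "BROKEN");
        return 0;
    }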



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-05 Thread Joshua D. Drake
On Thu, 2008-11-06 at 00:27 +0100, Ivan Voras wrote:
> Peter Eisentraut wrote:
> > Craig Ringer wrote:
> >> So - it's potentially even worth compressing the wire protocol for use
> >> on a 100 megabit LAN if a lightweight scheme like LZO can be used.

Yes, compressing the wire protocol is a benefit. You can trawl the
archives for when this has come up in the past. CMD at one time had a
hacked-up version that proved compression was a benefit (even at
100Mb). Alas, it was ugly :P ... If it were done right, it would be a
great benefit to folks out there.

Joshua D. Drake

> > 
> > LZO is under the GPL though.
> 
> But liblzf is BSD-style.
> 
> http://www.goof.com/pcg/marc/liblzf.html
> 


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-05 Thread Ivan Voras
Peter Eisentraut wrote:
> Craig Ringer wrote:
>> So - it's potentially even worth compressing the wire protocol for use
>> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
> 
> LZO is under the GPL though.

But liblzf is BSD-style.

http://www.goof.com/pcg/marc/liblzf.html





Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-03 Thread Craig Ringer
Scott Ribe wrote:
>> It doesn't matter that much, anyway, in that deflate would also do the
>> job quite well for any sort of site-to-site or user-to-site WAN link.
> 
> I used to use that, then switched to bzip. Thing is, if your client is
> really just issuing SQL, how much does it matter?

It depends a lot on what your requests are. If you have queries that
must return significant chunks of data to the client then compression
will help with total request time on a slow link, in that there's less
data to transfer so the last byte arrives sooner. Of course it's
generally preferable to avoid transferring hundreds of KB of data to the
client in the first place, but it's not always practical.

Additionally, not all connection types have effectively unlimited data
transfers. Many mobile networks, for example, tend to have limits on
monthly data transfers or charge per MB/KB transferred.

Wire compression would be nice for performance on slower networks, but
it's mostly appealing for reducing the impact on other users on a WAN,
reducing data transfer costs, reducing required WAN capacity, etc.

It's appealing because it looks like it should be possible to make it
quite simple to enable or disable, so it'd be a simple ODBC/JDBC
connection option.

> Compression can't help
> with latency.

Not with network round trip latency, no.

--
Craig Ringer



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-03 Thread Scott Ribe
> It doesn't matter that much, anyway, in that deflate would also do the
> job quite well for any sort of site-to-site or user-to-site WAN link.

I used to use that, then switched to bzip. Thing is, if your client is
really just issuing SQL, how much does it matter? Compression can't help
with latency. Which is why I went with 3 tiers, so that all communication
with Postgres occurs on the server, and all communication between server &
client is binary, compressed, and a single request/response per user request
regardless of how many tables the data is pulled from.

-- 
Scott Ribe
[EMAIL PROTECTED]
http://www.killerbytes.com/
(303) 722-0567 voice





Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-03 Thread Craig Ringer
Peter Eisentraut wrote:
> Craig Ringer wrote:
>> So - it's potentially even worth compressing the wire protocol for use
>> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
> 
> LZO is under the GPL though.

Good point. I'm so used to libraries being under more appropriate
licenses like the LGPL or BSD license that I completely forgot to check.

It doesn't matter that much, anyway, in that deflate would also do the
job quite well for any sort of site-to-site or user-to-site WAN link.

--
Craig Ringer




Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-03 Thread Peter Eisentraut

Craig Ringer wrote:
> So - it's potentially even worth compressing the wire protocol for use
> on a 100 megabit LAN if a lightweight scheme like LZO can be used.


LZO is under the GPL though.



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-02 Thread Scott Marlowe
On Sun, Nov 2, 2008 at 7:19 PM, Sam Mason <[EMAIL PROTECTED]> wrote:
> On Mon, Nov 03, 2008 at 10:01:31AM +0900, Craig Ringer wrote:
>> So - it's potentially even worth compressing the wire protocol for use
>> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
>
> The problem is that you're then dedicating most of a processor to
> doing the compression, one that would otherwise be engaged in doing
> useful work for other clients.

Considering the low cost of gigabit networks nowadays (even my
four-year-old T42 ThinkPad has gigabit in it), it would be cheaper to
buy gig NICs and cheap switches than to worry about the network
component most of the time.  On WANs it's another story, of course.



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-02 Thread Sam Mason
On Mon, Nov 03, 2008 at 10:01:31AM +0900, Craig Ringer wrote:
> Sam Mason wrote:
> >Your lzop numbers look *very* low; the paper suggests
> >compression going up to ~0.3GB/s on a 2GHz Opteron.
> 
> Er ... ENOCOFFEE? s/Mb(it)?/MB/g. And I'm normally *so* careful about
> Mb/MB etc.; this was just a complete thinko at some level. My apologies,
> and thanks for catching that stupid error.

Nice to know we're all human here :)

> The paragraph should've read:
> 
> I get 19 MB/s (152 Mb/s) from gzip (deflate) on my 2.4GHz Core 2 Duo 
> laptop. With lzop (LZO) the machine achieves 45 MB/s (360 Mb/s). In both 
> cases only a single core is used. With 7zip (LZMA) it only manages 3.1 
> MB/s (24.8 Mb/s) using BOTH cores together.

Hum, I've just had a look and found that Debian has a version of the lzop
compression program.  I uncompressed a copy of the Postgres source for
a test and I'm getting around 120MB/s when compressing on a 2.1GHz Core2
processor (72MB in 0.60 seconds, fast mode).  If I save the output
and recompress it I get about 40MB/s (22MB in 0.67 seconds), so the
compression rate seems to be very dependent on the type of data.  As
a test, I've just written some code that writes out (what I guess is
the "LINENUMBER" test in the X100 paper) a file consisting of small
integers (less than 2 decimal digits, i.e. lots of zero bytes) and
now get up to 0.4GB/s (200MB in 0.5 seconds), which nicely matches my
eyeballing of the figure in the paper.

It does point out that compression rates seem to be very data dependent!
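
A sketch of the generator described above (file name and size are
invented; the point is that values below two decimal digits leave three
of every four bytes zero, which any LZ-style compressor eats up):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *f = fopen("smallints.bin", "wb");      /* ~200MB of output */
        if (f == NULL)
            return 1;
        for (long i = 0; i < 50L * 1000 * 1000; i++) {
            uint32_t v = (uint32_t) (rand() % 100);  /* < 2 decimal digits */
            if (fwrite(&v, sizeof v, 1, f) != 1)
                return 1;
        }
        fclose(f);
        return 0;
    }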

> So - it's potentially even worth compressing the wire protocol for use 
> on a 100 megabit LAN if a lightweight scheme like LZO can be used.

The problem is that you're then dedicating most of a processor to
doing the compression, one that would otherwise be engaged in doing
useful work for other clients.


BTW, the X100 work was about trying to become less IO bound; they had
a 350MB/s RAID array and were highly IO bound.  If I'm reading the
paper right, with their PFOR algorithm they got the final query (i.e.
decompressing and doing useful work) running at 500MB/s.


  Sam



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-02 Thread Craig Ringer

Tom Lane wrote:
>> Wire protocol compression support in PostgreSQL would probably still be
>> extremely useful for Internet or WAN based clients, though,
>
> Use an ssh tunnel ... get compression *and* encryption, which you surely
> should want on a WAN link.


An ssh tunnel, while very useful, is only suitable for more capable 
users and is far from transparent. It requires an additional setup step 
before connection to the database that's going to cause support problems 
and confuse users. It's also somewhat painful on Windows machines. 
Additionally, use of an SSH tunnel makes recovery after a connection is
broken much, MUCH more difficult for an application to handle
transparently and automatically.


As you know, PostgreSQL supports SSL/TLS for encryption of wire 
communications, and you can use client certificates as an additional 
layer of authentication much as you can use an ssh key. It's clean and 
to the end user it's basically transparent. All the major clients, like 
the ODBC and JDBC drivers, already support it. Adding optional 
compression within that would be wonderful - and since the client and 
server are already designed to communicate through filters (for 
encryption) it shouldn't be that hard to stack another filter layer on top.
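
As a sketch of the transform such a filter would apply per buffer (this
is not an existing PostgreSQL hook, since the protocol has no
compression option today; it just shows zlib's one-shot deflate API
doing the compress/decompress step):

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        const char *msg = "SELECT * FROM big_table; SELECT * FROM big_table;";
        uLong src_len = (uLong) strlen(msg) + 1;

        Bytef comp[256];
        uLongf comp_len = sizeof comp;     /* in: capacity, out: bytes used */
        if (compress2(comp, &comp_len, (const Bytef *) msg, src_len,
                      Z_BEST_SPEED) != Z_OK)
            return 1;

        Bytef plain[256];
        uLongf plain_len = sizeof plain;
        if (uncompress(plain, &plain_len, comp, comp_len) != Z_OK)
            return 1;

        printf("%lu bytes -> %lu compressed, round trip \"%s\"\n",
               src_len, comp_len, (const char *) plain);
        return 0;
    }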


It's something I'm going to have to look at myself, actually, though I 
have some work on the qemu LSI SCSI driver that I *really* have to 
finish first.


--
Craig Ringer



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-02 Thread Tom Lane
Craig Ringer <[EMAIL PROTECTED]> writes:
> I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With
> lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single
> core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores
> together.

It'd be interesting to know where pg_lzcompress fits in.

> Wire protocol compression support in PostgreSQL would probably still be
> extremely useful for Internet or WAN based clients, though,

Use an ssh tunnel ... get compression *and* encryption, which you surely
should want on a WAN link.
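
For reference, a sketch of that setup (host, user and database names
are placeholders; -C turns on ssh's compression, -N skips running a
remote command, -L forwards the local port):

    ssh -C -N -L 15432:localhost:5432 user@db.example.com &
    psql -h localhost -p 15432 -U user mydb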

regards, tom lane



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-02 Thread Craig Ringer

Sam Mason wrote:
> On Mon, Nov 03, 2008 at 08:18:54AM +0900, Craig Ringer wrote:
>> Joris Dobbelsteen wrote:
>>> Also I still have to see a compression algorithm that can sustain over
>>> (or even anything close to, for that matter) 100MB/s on today's COTS
>>> hardware. As TOAST provides compression, maybe that data can be
>>> transmitted in compressed form (without recompression).
>
>> I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With
>> lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single
>> core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores
>> together.
>
> Your lzop numbers look *very* low; the paper suggests
> compression going up to ~0.3GB/s on a 2GHz Opteron.

Er ... ENOCOFFEE? s/Mb(it)?/MB/g. And I'm normally *so* careful about
Mb/MB etc.; this was just a complete thinko at some level. My apologies,
and thanks for catching that stupid error.


The paragraph should've read:

I get 19 MB/s (152 Mb/s) from gzip (deflate) on my 2.4GHz Core 2 Duo 
laptop. With lzop (LZO) the machine achieves 45 MB/s (360 Mb/s). In both 
cases only a single core is used. With 7zip (LZMA) it only manages 3.1 
MB/s (24.8 Mb/s) using BOTH cores together.


So - it's potentially even worth compressing the wire protocol for use 
on a 100 megabit LAN if a lightweight scheme like LZO can be used.


--
Craig Ringer



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-02 Thread Sam Mason
On Mon, Nov 03, 2008 at 08:18:54AM +0900, Craig Ringer wrote:
> Joris Dobbelsteen wrote:
> > Also I still have to see a compression algorithm that can sustain over
> > (or even anything close to, for that matter) 100MB/s on today's COTS
> > hardware. As TOAST provides compression, maybe that data can be
> > transmitted in compressed form (without recompression).

> I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With
> lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single
> core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores
> together.

The algorithms in the MonetDB/X100 paper I posted upstream[1] appear
to be designed more for this use.  Their PFOR algorithm gets between
~0.4GB/s and ~1.7GB/s in compression and between ~0.9GB/s and ~3GB/s in
decompression.  Your lzop numbers look *very* low; the paper suggests
compression going up to ~0.3GB/s on a 2GHz Opteron.  In fact, an old
page for lzop[2] reports 5MB/s on a Pentium 133, so I don't think I'm
understanding what your numbers are.

I'll see if I can write some code that implements their algorithms and
send another mail.  If PFOR really is this fast then it may be good for
TOAST compression, though judging by the comments in pg_lzcompress.c it
may not be worth it as the time spent on compression gets lost in the
noise.
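
In the meantime, a rough sketch of the frame-of-reference idea with
patched exceptions (much simplified from the paper's bit-packed PFOR:
the byte-wide packing and all names here are my own):

    #include <stdint.h>
    #include <stdio.h>

    #define N 8

    int main(void)
    {
        uint32_t in[N] = {100, 101, 103, 100, 900, 102, 104, 101};

        uint32_t base = in[0];              /* frame of reference: the min */
        for (int i = 1; i < N; i++)
            if (in[i] < base) base = in[i];

        uint8_t  packed[N];                 /* small deltas, one byte each */
        uint32_t exc_val[N];                /* outliers stored verbatim */
        int      exc_pos[N], n_exc = 0;

        for (int i = 0; i < N; i++) {
            uint32_t delta = in[i] - base;
            if (delta > 255) {              /* exception: patch on decode */
                packed[i] = 0;
                exc_pos[n_exc] = i;
                exc_val[n_exc++] = in[i];
            } else
                packed[i] = (uint8_t) delta;
        }

        /* decode: add the base back, then patch the exceptions */
        uint32_t out[N];
        for (int i = 0; i < N; i++)
            out[i] = base + packed[i];
        for (int e = 0; e < n_exc; e++)
            out[exc_pos[e]] = exc_val[e];

        for (int i = 0; i < N; i++)
            printf("%u ", out[i]);
        printf("\n");
        return 0;
    }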


  Sam

  [1] http://old-www.cwi.nl/themes/ins1/publications/docs/ZuHeNeBo:ICDE:06.pdf
  [2] http://www.oberhumer.com/opensource/lzo/#speed



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-02 Thread Craig Ringer
Joris Dobbelsteen wrote:

> Also I still have to see a compression algorithm that can sustain over
> (or even anything close to, for that matter) 100MB/s on today's COTS
> hardware. As TOAST provides compression, maybe that data can be
> transmitted in compressed form (without recompression).

I did a few quick tests on compression speed, as I was curious about
just what sort of performance was available. I was under the impression
that modern hardware could easily top 100 Mbit/s with common compression
algorithms, and wanted to test that.

Based on the results I'd have to agree with the quoted claim. I was
apparently thinking of symmetric encryption throughput rather than
compression throughput.

I get 19 Mbit/s from gzip (deflate) on my 2.4GHz Core 2 Duo laptop. With
lzop (LZO) the machine achieves 45 Mbit/s. In both cases only a single
core is used. With 7zip (LZMA) it only manages 3.1 Mb/s using BOTH cores
together.

All tests were done on a 278MB block of data that was precached in RAM.
Output was to /dev/null except for the LZMA case (due to utility
limitations) in which case output was written to a tmpfs.
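
A hedged repro of that methodology (the file name is a placeholder;
reading the input once first keeps it in cache, and writing to
/dev/null keeps the disk out of the measurement):

    cat testdata > /dev/null           # prime the page cache
    time gzip -c testdata > /dev/null  # deflate throughput
    time lzop -c testdata > /dev/null  # LZO throughput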

Perhaps a multi-core and/or SIMD-ized implementation of LZO (if such a
thing is possible or practical) might manage 100 Mbit/s, or you might
pull it off on an absolutely top of the range desktop (or server) CPU
like the 3.3 GHz Core 2 Duo. Maybe, but probably not without
considerable overclocking, which eliminates the "COTS" aspect rather
soundly.

Given that very few people have dedicated gzip (or other algorithm)
acceleration cards in their systems, it looks like it should be faster
to do transfers uncompressed over a network of any respectable speed.
Not entirely surprising, really, or it'd be used a lot more in common
file server protocols.

Wire protocol compression support in PostgreSQL would probably still be
extremely useful for Internet or WAN based clients, though, and there
are probably more than a few of those around. I know it'd benefit me
massively, as I have users using PostgreSQL over 3G cellular radio
(UMTS/HSDPA) where real-world speeds are around 0.1 - 1.5 Mbit/s, data
transfer limits are low and data transfer charges are high.

Compression would clearly need to be a negotiated connection option, though.

Interestingly, the Via thin clients at work, which have AES-256 (among
other things) implemented in hardware, can encrypt with AES-256 at over
300 MB/s. Yes, megabytes, not megabits. Given that the laptop used in
the above testing only gets 95 MB/s, it makes you wonder whether it'd
be worthwhile for CPU designers to offer a common compression algorithm
like LZO, deflate, or LZMA in hardware for server CPUs.

--
Craig Ringer



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-02 Thread Joris Dobbelsteen

Gregory Stark wrote, On 01-11-08 14:02:

> Ivan Sergio Borgonovo <[EMAIL PROTECTED]> writes:
>
>> But sorry I still can't get WHY compression as a whole and data
>> integrity are mutually exclusive.
>> ...
>
> [snip performance theory]
>
> Postgres *guarantees* that as long as everything else works correctly it
> doesn't lose data. Not that it minimizes the chances of losing data. It is
> interesting to discuss hardening against unforeseen circumstances as well but
> it's of secondary importance to first of all guaranteeing 100% that there is
> no data loss in the expected scenarios.
>
> That means Postgres has to guarantee 100% that if the power is lost mid-write
> that it can recover all the data correctly. It does this by fsyncing logs of
> some changes and depending on filesystems and drives behaving in certain ways
> for others -- namely that a partially completed write will leave each byte
> with either the new or old value. Compressed filesystems might break that
> assumption making Postgres's guarantee void.


The guarantee YOU want from the underlying file system is that, in case
of, let's say, a power failure:

* Already existing data is not modified.
* Overwritten data might be corrupted, but it's either old or new data.
* If an fsync completes, all written data IS committed to disk.

If a (file) system CAN guarantee that, in any way possible, it is safe
to use with PostgreSQL (considering my list is complete, of course).
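
To make the third point concrete, here is a minimal sketch (paths
invented) of the write-ahead pattern those guarantees enable: the data
page is only overwritten after the log record's fsync has returned, so
a torn page can always be repaired from the log:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        const char rec[] = "log record describing the page change";
        int wal = open("/tmp/wal", O_WRONLY | O_CREAT | O_APPEND, 0600);
        if (wal < 0) return 1;
        if (write(wal, rec, sizeof rec) != (ssize_t) sizeof rec) return 1;
        if (fsync(wal) != 0) return 1;   /* point 3: the record is on disk */
        close(wal);

        /* Only now may the page be overwritten; by point 2 a torn write
         * leaves old or new bytes, and the fsynced record is enough to
         * redo the change either way. */
        int page = open("/tmp/datapage", O_WRONLY | O_CREAT, 0600);
        if (page < 0) return 1;
        if (write(page, "new page image", 14) != 14) return 1;
        fsync(page);
        close(page);
        return 0;
    }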


As a side note: I consider the second assumption a bit too strong, but 
there are probably good reasons to do so.



> I don't know how these hypothetical compressed filesystems are implemented so
> I can't say whether they work or not. When I first wrote the comment I was
> picturing a traditional filesystem with each block stored compressed. That
> can't guarantee anything like this.


Instead, the discussion reverts to debating file systems without even
a glance at their method of operation: no algorithm used by the file
system is written down, yet they are being discussed.



> However later in the discussion I mentioned that ZFS with an 8k block size
> could actually get this right since it never overwrites existing data, it
> always writes to a new location and then changes metadata pointers. I expect
> ext3 with data=journal might also be ok. These both have to make performance
> sacrifices to get there though.


Instead, here we get the specifics we needed a long time ago: ZFS takes
8kB as its optimal point(*) and never overwrites existing data. So it
should be as safe as any other file system, if he is indeed correct.

Now, does a different block size (in ZFS or PostgreSQL) make any
difference to that? No, the list above is still guaranteed.

Performance is a discussion better left alone, since it is really,
really dependent on your workload, installation and other specifics.
It could be better and it could be worse.


- Joris


(*) Larger block sizes improve compression ratio. However, you pay a 
bigger penalty on writes, as more must be read, processed and written.




Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-02 Thread Joris Dobbelsteen

Grzegorz Jaśkiewicz wrote, On 30-10-08 12:13:


> it should, every book on encryption says, that if you compress your data
> before encryption - it's better.


Those books should also mention that you should leave this subject to
the experts, and they have numerous examples of systems that follow the
book but are still broken. There are other techniques as well that make
breaking it harder, such as the CBC and CTS modes.


Using compression consumes processing power and resources, easing DoS 
attacks a lot.


Also I still have to see a compression algorithm that can sustain over
(or even anything close to, for that matter) 100MB/s on today's COTS
hardware. As TOAST provides compression, maybe that data can be
transmitted in compressed form (without recompression).
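
Worth noting: per-column TOAST knobs already exist for that. A hedged
example (table and column names invented) of choosing between
compressed and uncompressed out-of-line storage:

    -- EXTENDED (the default for most varlena types) allows compression;
    -- EXTERNAL stores values out of line but uncompressed.
    ALTER TABLE docs ALTER COLUMN body        SET STORAGE EXTENDED;
    ALTER TABLE docs ALTER COLUMN raw_payload SET STORAGE EXTERNAL;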


- Joris



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-11-01 Thread Gregory Stark

Ivan Sergio Borgonovo <[EMAIL PROTECTED]> writes:

> But sorry I still can't get WHY compression as a whole and data
> integrity are mutually exclusive.
...
> Now on *average* the write operations should be faster so the risk
> you'll be hit by an asteroid during the time a fsync has been
> requested and the time it returns should be shorter.
> If you're not fsyncing... you've no warranty that your changes
> reached your permanent storage.

Postgres *guarantees* that as long as everything else works correctly it
doesn't lose data. Not that it minimizes the chances of losing data. It is
interesting to discuss hardening against unforeseen circumstances as well but
it's of secondary importance to first of all guaranteeing 100% that there is
no data loss in the expected scenarios.

That means Postgres has to guarantee 100% that if the power is lost mid-write
that it can recover all the data correctly. It does this by fsyncing logs of
some changes and depending on filesystems and drives behaving in certain ways
for others -- namely that a partially completed write will leave each byte
with either the new or old value. Compressed filesystems might break that
assumption making Postgres's guarantee void.

I don't know how these hypothetical compressed filesystems are implemented so
I can't say whether they work or not. When I first wrote the comment I was
picturing a traditional filesystem with each block stored compressed. That
can't guarantee anything like this. 

However later in the discussion I mentioned that ZFS with an 8k block size
could actually get this right since it never overwrites existing data, it
always writes to a new location and then changes metadata pointers. I expect
ext3 with data=journal might also be ok. These both have to make performance
sacrifices to get there though.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-31 Thread Ron Mayer

Chris Browne wrote:

> There's a way that compressed filesystems might *help* with a risk
> factor, here...
> By reducing the number of disk drives required to hold the data, you
> may be reducing the risk of enough of them failing to invalidate the
> RAID array.


And one more way.

If neither your database nor your filesystem does checksums on blocks
(though it seems the compressing filesystems mostly do checksums), a
one-bit error may go undetected, corrupting your data without you
knowing it.

With filesystem compression, that one-bit error is likely to grow
into something big enough to be detected immediately.




Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-31 Thread Bruce Momjian
Scott Marlowe wrote:
> On Thu, Oct 30, 2008 at 4:01 PM, Gregory Stark <[EMAIL PROTECTED]> wrote:
> > "Scott Marlowe" <[EMAIL PROTECTED]> writes:
> >
> >> I'm sure this makes for a nice brochure or power point presentation,
> >> but in the real world I can't imagine putting that much effort into it
> >> when compressed file systems seem the place to be doing this.
> >
> > I can't really see trusting Postgres on a filesystem that felt free to
> > compress portions of it. Would the filesystem still be able to guarantee 
> > that
> > torn pages won't "tear" across adjacent blocks? What about torn pages that
> > included hint bits being set?
> 
> I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512-byte
> block, the OS compresses it and its brethren as they go to disk,
> uncompresses them as they come out, and as long as what you put in is
> what you get back it shouldn't really matter.

The question is whether a write of 512 bytes touches disk blocks that
hold data for other parts of the file; in such a case we might not have
the full-page-write copies of those pages to restore, and the
compressing operating system might not be able to guarantee that the
other parts of the file will be restored if only part of the 512 bytes
gets on disk.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-31 Thread Ivan Sergio Borgonovo
On Fri, 31 Oct 2008 17:08:52 +
Gregory Stark <[EMAIL PROTECTED]> wrote:

> >> Invisible under normal operation sure, but when something fails
> >> the consequences will surely be different and I can't see how
> >> you could make a compressed filesystem safe without a huge
> >> performance hit.
> >
> > Pardon my naiveté but I can't see why compression and data
> > integrity should always be considered clashing factors.
> 
> Well the answer was in the next paragraph of my email, the one
> you've clipped out here.

Sorry, I didn't want to hide your argument, just to cut the length of
the email. Maybe I haven't been clear enough either. I'd consider
compression at the fs level more "risky" than compression at the DB
level because re-compression at the fs level may more frequently span
multiple data structures.

But sorry I still can't get WHY compression as a whole and data
integrity are mutually exclusive.

What I think is going to happen, not necessarily what really happens,
is:
- you make a change to the DB
- you ask the underlying fs to write that change to the disk (fsync)
- the fs may decide it has to re-compress more than one block, but I'd
think it still has to obey the fsync command and *start* to put them
on permanent storage.

Now on *average* the write operations should be faster, so the risk
you'll be hit by an asteroid between the time an fsync has been
requested and the time it returns should be smaller.
If you're not fsyncing... you've no guarantee that your changes
reached your permanent storage.

Unless compressed fs don't abide by fsync as I'd expect them to.

Furthermore you're starting from 3 assumptions that may not be true:
1) partially written compressed data is completely unrecoverable.
2) you don't have concurrent physical writes to permanent storage
3) the data that should have reached the DB would have survived if
it had not been sent to the DB

Compression changes the granularity of physical writes for a single
write. But if you consider concurrent physical writes and
unrecoverable transmission of data... higher throughput should
reduce data loss.

If I think of changes as trains with wagons, the chances a train can
be struck by an asteroid grow the longer the train is.
When you use compression, small changes to a data structure *may*
result in longer trains leaving the station, but on average you
*should* have shorter trains.

> > DB operation are supposed to be atomic if fsync actually does
> > what it is supposed to do.
> > So you'd have coherency assured by proper execution of "fsync"
> > going down to all HW levels before it reach permanent storage.

> fsync lets the application know when the data has reached disk.
> Once it returns you know the data on disk is coherent. What we're
> talking about is what to do if the power fails or the system
> crashes before that happens.

Yeah... actually successful fsyncs are at a higher integrity level
than just "let as much data as possible reach the disk and make it
readable later".

But still, when you issue an fsync you're asking "put this data on
permanent storage". Until then the fs is free to keep managing it in
cache and modify/compress it there.
The faster the data reaches the disk, the lower the chances you'll
lose it.

Of course, on the assumption that once an asteroid hits a wagon the
whole train is lost, that's not ideal... but still the average length
of trains *should* be shorter, reducing the *average* chances they
get hit.

This *may* still not be the case, and it depends on the pattern with
which data changes.
If most of the time you're changing 1 bit followed by an fsync, and
that requires a 2-sector rewrite, that's bad.
The chances of that are higher if compression takes place at the fs
level rather than the DB level, since the DB should be more aware of
which data can be efficiently compressed and of the trade-off in terms
of data loss if something goes wrong in a 2-sector write where without
compression you'd just write one.
But I think you could still take advantage of fs compression without
sacrificing integrity by choosing which tables should reside on a
compressed fs and which not, and in some circumstances fs compression
may get better results than just TOAST,
e.g. if there are several columns that are frequently updated
together...
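
A sketch of that per-table placement using tablespaces (the mount point
is invented, and whether a compressed fs honours the guarantees
discussed in this thread remains the open question):

    -- put bulky, rarely-rewritten tables on the compressed filesystem
    CREATE TABLESPACE compressed_space LOCATION '/mnt/compressed_fs/pgdata';

    CREATE TABLE archived_logs (
        logged_at  timestamptz,
        line       text
    ) TABLESPACE compressed_space;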

I'd say that compression could be one more tool for managing data
integrity not that it will inevitably have a negative impact on it
(nor a positive one if not correctly managed).

What am I still missing?



-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it




Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-31 Thread Chris Browne
[EMAIL PROTECTED] ("Scott Marlowe") writes:
> I assume hardware failure rates are zero, until there is one.  Then I
> restore from a known good backup.  Compressed file systems have little
> to do with that.

There's a way that compressed filesystems might *help* with a risk
factor, here...

By reducing the number of disk drives required to hold the data, you
may be reducing the risk of enough of them failing to invalidate the
RAID array.

If a RAID array is involved, where *some* failures may be silently
coped with, I could readily see this *improving* reliability, in most
cases.

This is at least *vaguely* similar to the way that aircraft have moved
from requiring rather large numbers of engines for cross-Atlantic
trips to requiring just 2.  

In the distant past, the engines were sufficiently unreliable that you
wanted to have at least 4 in order to be reasonably assured that you
could limp along with at least 2.

With increases in engine reliability, it's now considered preferable
to have *just* 2 engines, as having 4 means doubling the risk of there
being a failure.

Disk drives and jet engines are hardly the same thing, but I suspect
the analogy fits.
-- 
let name="cbbrowne" and tld="linuxfinances.info" in String.concat "@" 
[name;tld];;
http://linuxfinances.info/info/lisp.html
Why do they put Braille dots on the keypad of the drive-up ATM? 



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-31 Thread Gregory Stark
Ivan Sergio Borgonovo <[EMAIL PROTECTED]> writes:

> On Fri, 31 Oct 2008 08:49:56 +
> Gregory Stark <[EMAIL PROTECTED]> wrote:
>
>> Invisible under normal operation sure, but when something fails the
>> consequences will surely be different and I can't see how you
>> could make a compressed filesystem safe without a huge performance
>> hit.
>
> Pardon my naiveté but I can't see why compression and data
> integrity should always be considered clashing factors.

Well the answer was in the next paragraph of my email, the one you've clipped
out here.

> DB operations are supposed to be atomic if fsync actually does what
> it is supposed to do.
> So you'd have coherency assured by proper execution of "fsync" going
> down through all HW levels before it reaches permanent storage.

fsync lets the application know when the data has reached disk. Once it
returns you know the data on disk is coherent. What we're talking about is
what to do if the power fails or the system crashes before that happens.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-31 Thread Thomas Samson
On Fri, Oct 31, 2008 at 3:01 PM, Alvaro Herrera
<[EMAIL PROTECTED]> wrote:

> Scott Marlowe wrote:
> > On Thu, Oct 30, 2008 at 7:37 PM, Alvaro Herrera
> > <[EMAIL PROTECTED]> wrote:
> > > Scott Marlowe wrote:
> > >
> > >> What is the torn page problem?  Note I'm no big fan of compressed file
> > >> systems, but I can't imagine them not working with databases, as I've
> > >> seen them work quite reliably under Exchange server running a db
> > >> oriented storage subsystem.  And I can't imagine them not being
> > >> invisible to an application, otherwise you'd just be asking for
> > >> trouble.
> > >
> > > Exchange, isn't that the thing that's very prone to corrupted
> > > databases?
> > > I've heard lots of horror stories about that (and also about how you
> > > have to defragment the database once in a while, so what kind of
> > > database it really is?)
> >
> > Sure, bashing Microsoft is easy.  But it doesn't address the point: is
> > a database safe on top of a compressed file system, and if not, why?
>
> I'm not bashing Microsoft.  I'm just saying that your example
> application already shows signs that could, perhaps, be explained by the
> hypothesis put forward by Greg -- that a compressed filesystem is more
> prone to corruption.
>

Each added layer could lead to corruption/instability.

Yet, some people might be willing to try out some of these layers
to enhance functionality.

PostgreSQL already uses an OS, and even an fs! Why would it decide not
to recode its own raw device handler ... like some serious db ;)

-- 
Thomas SAMSON
Simplicity does not precede complexity, but follows it.


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-31 Thread Ivan Sergio Borgonovo
On Fri, 31 Oct 2008 08:49:56 +
Gregory Stark <[EMAIL PROTECTED]> wrote:

> "Scott Marlowe" <[EMAIL PROTECTED]> writes:
> 
> > What is the torn page problem?  Note I'm no big fan of
> > compressed file systems, but I can't imagine them not working
> > with databases, as I've seen them work quite reliably under
> > Exchange server running a db-oriented storage subsystem.  And I
> > can't imagine them not being invisible to an application,
> > otherwise you'd just be asking for trouble.

> Invisible under normal operation sure, but when something fails the
> consequences will surely be different and I can't see how you
> could make a compressed filesystem safe without a huge performance
> hit.

Pardon my naiveté but I can't see why compression and data
integrity should always be considered clashing factors.

DB operations are supposed to be atomic if fsync actually does what
it is supposed to do.
So you'd have coherency assured by proper execution of "fsync" going
down through all HW levels before it reaches permanent storage.

Now suppose your problem is "avoiding losing data", not avoiding
losing coherency.
E.g. you're handling a very fast stream of data coming from the LHC.
The faster you write to the disk, the lower the chances of losing data
in case you incur some kind of hardware failure during the write.

Whether you choose data compression or not depends on which kind of
failure you think is more probable on your hardware, and the
associated costs.

If you expect gamma rays cooking your SCSI cables or an asteroid
splashing your UPS, compression may be a good choice... it will make
your data reach your permanent storage faster.
If you expect your permanent storage to store data unreliably (and not
report back), the loss of 1 sector may correspond to a larger loss of
data.

Another thing that should be put into the equation of understanding
where your risk of data loss lies is whether your "data source" has
some form of "data persistence".
If it has, you could introduce one more layer of "fsyncing"; that
means your data source is not going to wipe the original copy till
your DB reports back that everything went fine (no asteroid etc...).
So data compression may be just one more tool to manage your budget
for asteroid shelters.

An annoyance of compression may be that while it *on average* lets
you put data on permanent storage faster, it increases uncertainty
about the instantaneous transfer speed, especially if fs-level and
db-level compression are not aware of each other and fs-level
compression is less aware of which data is worth compressing.
If I had to push more for data compression I'd make it data-type
aware and switchable (or auto-switchable based on ANALYZE or stats
results).

Of course if you expect to have faulty "permanent storage", data
compression *may* not be a good bet... but it still depends on
hardware cost, compression ratio, the specific kind of failure...
e.g. the more you compress, the cheaper RAID becomes...


I understand, Tom, that DBAs are paid to be paranoid, and I really,
really, really appreciate data stored in a format that doesn't require
a long queue of tools to be read. I do really hate dependencies that
translate into hours of *boring* work if something turns bad.

BTW I gave a glance at the MonetDB papers posted earlier, and it seems
their compression algorithms are strongly optimised for read-only
searches.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it




Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-31 Thread Alvaro Herrera
Scott Marlowe wrote:
> On Thu, Oct 30, 2008 at 7:37 PM, Alvaro Herrera
> <[EMAIL PROTECTED]> wrote:
> > Scott Marlowe wrote:
> >
> >> What is the torn page problem?  Note I'm no big fan of compressed file
> >> systems, but I can't imagine them not working with databases, as I've
> >> seen them work quite reliably under Exchange server running a db
> >> oriented storage subsystem.  And I can't imagine them not being
> >> invisible to an application, otherwise you'd just be asking for
> >> trouble.
> >
> > Exchange, isn't that the thing that's very prone to corrupted databases?
> > I've heard lots of horror stories about that (and also about how you
> > have to defragment the database once in a while, so what kind of
> > database it really is?)
> 
> Sure, bashing Microsoft is easy.  But it doesn't address the point: is
> a database safe on top of a compressed file system, and if not, why?

I'm not bashing Microsoft.  I'm just saying that your example
application already shows signs that could, perhaps, be explained by the
hypothesis put forward by Greg -- that a compressed filesystem is more
prone to corruption.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-31 Thread Scott Marlowe
On Fri, Oct 31, 2008 at 2:49 AM, Gregory Stark <[EMAIL PROTECTED]> wrote:
>
> "Scott Marlowe" <[EMAIL PROTECTED]> writes:
>
>> What is the torn page problem?  Note I'm no big fan of compressed file
>> systems, but I can't imagine them not working with databases, as I've
>> seen them work quite reliably under Exchange server running a db
>> oriented storage subsystem.  And I can't imagine them not being
>> invisible to an application, otherwise you'd just be asking for
>> trouble.
>
> Invisible under normal operation sure, but when something fails the
> consequences will surely be different and I can't see how you could make a
> compressed filesystem safe without a huge performance hit.

While I'm quite willing to concede that a crashed machine can cause
corruption in a compressed file system that you wouldn't otherwise see,
I'm also willing to admit there are times, much like the OP was talking
about, where that's an acceptable risk, like data warehousing.

No way would I run a db for data that mattered on a compressed file system.



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-31 Thread Scott Marlowe
On Thu, Oct 30, 2008 at 9:43 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Scott Marlowe" <[EMAIL PROTECTED]> writes:
>> Sure, bashing Microsoft is easy.  But it doesn't address the point: is
>> a database safe on top of a compressed file system, and if not, why?
>
> It is certainly *less* safe than it is on top of an uncompressed
> filesystem.  Any given hardware failure will affect more stored bits
> (if the compression is effective) in a less predictable way.

Agreed.  But I wasn't talking about hardware failures earlier, and
someone made the point that a compressed file system, without hardware
failure, was likely to eat your data.  And I still don't think that's
true.  Keep in mind a lot of the talk on this so far has been on data
warehouses, which are mostly static and well backed up.  If you could
reduce the size on disk by a factor of 2 or 3, then it's worth taking
a small chance on having to recreate the whole db should something go
wrong.

To put it another way, if you find out you've got corrupted blocks in
your main db, due to bad main memory or CPU or something, are you
going to fix the bad blocks and memory and just keep going? Of course
not, you're going to reinstall from a clean backup to a clean machine.
You can't trust the data that the machine was mangling, whether it
was on a compressed volume or not.  So now your argument is one of
degree, which wasn't the discussion point I was trying to make.

> If you assume that hardware failure rates are below your level of
> concern, this doesn't matter.

I assume hardware failure rates are zero, until there is one.  Then I
restore from a known good backup.  Compressed file systems have little
to do with that.



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-31 Thread Gregory Stark

"Scott Marlowe" <[EMAIL PROTECTED]> writes:

> What is the torn page problem?  Note I'm no big fan of compressed file
> systems, but I can't imagine them not working with databases, as I've
> seen them work quite reliably under Exchange server running a db
> oriented storage subsystem.  And I can't imagine them not being
> invisible to an application, otherwise you'd just be asking for
> trouble.

Invisible under normal operation sure, but when something fails the
consequences will surely be different and I can't see how you could make a
compressed filesystem safe without a huge performance hit.

The torn page problem is what happens if the system loses power or crashes
when only part of the data written has made it to disk. If you're compressing
or encrypting data then you can't expect the old data portion and the new data
portion to make sense together.

So for example if Postgres sets a hint bit on one tuple in a block, then
writes out that block and the filesystem recompresses it, the entire block
will change. If the system crashes when only 4k of it has reached disk then
when we read in that block it will fail decompression. 

And if the block size of the compressed filesystem is larger than the
PostgreSQL block size your problems are even more severe. Even a regular
WAL-logged write to a database block can cause the subsequent database block
to become unreadable if power is lost before the entire set of database blocks
within the filesystem block is written.

The only way I could see this working is if you use a filesystem which logs
data changes like ZFS or ext3 with data=journal. Even then you have to be very
careful to make the filesystem block size that the journal treats as atomic
match the Postgres block size or you'll still be in trouble.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Tom Lane
"Scott Marlowe" <[EMAIL PROTECTED]> writes:
> Sure, bashing Microsoft is easy.  But it doesn't address the point: is
> a database safe on top of a compressed file system, and if not, why?

It is certainly *less* safe than it is on top of an uncompressed
filesystem.  Any given hardware failure will affect more stored bits
(if the compression is effective) in a less predictable way.

If you assume that hardware failure rates are below your level of
concern, this doesn't matter.  But DBAs are paid to be paranoid.

regards, tom lane



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Scott Marlowe
On Thu, Oct 30, 2008 at 7:37 PM, Alvaro Herrera
<[EMAIL PROTECTED]> wrote:
> Scott Marlowe wrote:
>
>> What is the torn page problem?  Note I'm no big fan of compressed file
>> systems, but I can't imagine them not working with databases, as I've
>> seen them work quite reliably under Exchange server running a db
>> oriented storage subsystem.  And I can't imagine them not being
>> invisible to an application, otherwise you'd just be asking for
>> trouble.
>
> Exchange, isn't that the thing that's very prone to corrupted databases?
> I've heard lots of horror stories about that (and also about how you
> have to defragment the database once in a while, so what kind of
> database it really is?)

Sure, bashing Microsoft is easy.  But it doesn't address the point: is
a database safe on top of a compressed file system, and if not, why?



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Alvaro Herrera
Scott Marlowe wrote:

> What is the torn page problem?  Note I'm no big fan of compressed file
> systems, but I can't imagine them not working with databases, as I've
> seen them work quite reliably under Exchange server running a db
> oriented storage subsystem.  And I can't imagine them not being
> invisible to an application, otherwise you'd just be asking for
> trouble.

Exchange, isn't that the thing that's very prone to corrupted databases?
I've heard lots of horror stories about that (and also about how you
have to defragment the database once in a while, so what kind of
database it really is?)

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Scott Marlowe
On Thu, Oct 30, 2008 at 6:03 PM, Gregory Stark <[EMAIL PROTECTED]> wrote:
> "Scott Marlowe" <[EMAIL PROTECTED]> writes:
>> Sounds kinda hand wavy to me.  If compressed file systems didn't give
>> you back what you gave them I couldn't imagine them being around for
>> very long.
>
> I don't know, NFS has lasted quite a while.
>
> So you tell me, I write 512 bytes of data to a compressed filesystem, how does
> it handle the torn page problem? Is it going to have to WAL log all data
> operations again?

What is the torn page problem?  Note I'm no big fan of compressed file
systems, but I can't imagine them not working with databases, as I've
seen them work quite reliably under Exchange server running a
db-oriented storage subsystem.  And I can't imagine them not being
invisible to an application, otherwise you'd just be asking for
trouble.



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Gregory Stark
"Scott Marlowe" <[EMAIL PROTECTED]> writes:

> On Thu, Oct 30, 2008 at 4:41 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> "Scott Marlowe" <[EMAIL PROTECTED]> writes:
>>> On Thu, Oct 30, 2008 at 4:01 PM, Gregory Stark <[EMAIL PROTECTED]> wrote:
 I can't really see trusting Postgres on a filesystem that felt free to
 compress portions of it. Would the filesystem still be able to guarantee 
 that
 torn pages won't "tear" across adjacent blocks? What about torn pages that
 included hint bits being set?
>>
>>> I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512-byte
>>> block, the OS compresses it and its brethren as they go to disk,
>>> uncompresses them as they come out, and as long as what you put in is
>>> what you get back it shouldn't really matter.
>>
>> I think Greg's issue is exactly about what guarantees you'll have left
>> after the data that comes back fails to be the data that went in.
>
> Sounds kinda hand-wavy to me.  If compressed file systems didn't give
> you back what you gave them I couldn't imagine them being around for
> very long.

I don't know, NFS has lasted quite a while.

So you tell me, I write 512 bytes of data to a compressed filesystem, how does
it handle the torn page problem? Is it going to have to WAL log all data
operations again?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Scott Marlowe
On Thu, Oct 30, 2008 at 4:41 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Scott Marlowe" <[EMAIL PROTECTED]> writes:
>> On Thu, Oct 30, 2008 at 4:01 PM, Gregory Stark <[EMAIL PROTECTED]> wrote:
>>> I can't really see trusting Postgres on a filesystem that felt free to
>>> compress portions of it. Would the filesystem still be able to guarantee 
>>> that
>>> torn pages won't "tear" across adjacent blocks? What about torn pages that
>>> included hint bits being set?
>
>> I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512-byte
>> block, the OS compresses it and its brethren as they go to disk,
>> uncompresses them as they come out, and as long as what you put in is
>> what you get back it shouldn't really matter.
>
> I think Greg's issue is exactly about what guarantees you'll have left
> after the data that comes back fails to be the data that went in.

Sounds kinda hand-wavy to me.  If compressed file systems didn't give
you back what you gave them I couldn't imagine them being around for
very long.



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Tom Lane
"Scott Marlowe" <[EMAIL PROTECTED]> writes:
> On Thu, Oct 30, 2008 at 4:01 PM, Gregory Stark <[EMAIL PROTECTED]> wrote:
>> I can't really see trusting Postgres on a filesystem that felt free to
>> compress portions of it. Would the filesystem still be able to guarantee that
>> torn pages won't "tear" across adjacent blocks? What about torn pages that
>> included hint bits being set?

> I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512-byte
> block, the OS compresses it and its brethren as they go to disk,
> uncompresses them as they come out, and as long as what you put in is
> what you get back it shouldn't really matter.

I think Greg's issue is exactly about what guarantees you'll have left
after the data that comes back fails to be the data that went in.

regards, tom lane



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Scott Marlowe
On Thu, Oct 30, 2008 at 4:01 PM, Gregory Stark <[EMAIL PROTECTED]> wrote:
> "Scott Marlowe" <[EMAIL PROTECTED]> writes:
>
>> I'm sure this makes for a nice brochure or PowerPoint presentation,
>> but in the real world I can't imagine putting that much effort into it
>> when compressed file systems seem the place to be doing this.
>
> I can't really see trusting Postgres on a filesystem that felt free to
> compress portions of it. Would the filesystem still be able to guarantee that
> torn pages won't "tear" across adjacent blocks? What about torn pages that
> included hint bits being set?

I can't see PostgreSQL noticing it. PostgreSQL hands the OS a 512-byte
block, the OS compresses it and its brethren as they go to disk,
uncompresses them as they come out, and as long as what you put in is
what you get back it shouldn't really matter.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Tom Lane
Chris Browne <[EMAIL PROTECTED]> writes:
> [EMAIL PROTECTED] (Tom Lane) writes:
>> We already have the portions of this behavior that seem to me to be
>> likely to be worthwhile (such as NULL elimination and compression of
>> large field values).  Shaving a couple bytes from a bigint doesn't
>> strike me as interesting.

> I expect that there would be value in doing this with the inet type,
> to distinguish between the smaller IPv4 addresses and the larger IPv6
> ones.  We use the inet type (surprise! ;-)) and would benefit from
> having it "usually smaller" (notably since IPv6 addresses are a
> relative rarity, at this point).

Uh ... inet already does that.  Now it's true you could save a byte or
two more with a bespoke IPv4-only type, but the useful lifespan of such a
type probably isn't very long.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Gregory Stark
"Scott Marlowe" <[EMAIL PROTECTED]> writes:

> I'm sure this makes for a nice brochure or PowerPoint presentation,
> but in the real world I can't imagine putting that much effort into it
> when compressed file systems seem to be the place to be doing this.

I can't really see trusting Postgres on a filesystem that felt free to
compress portions of it. Would the filesystem still be able to guarantee that
torn pages won't "tear" across adjacent blocks? What about torn pages that
included hint bits being set?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Chris Browne
[EMAIL PROTECTED] (Tom Lane) writes:
> We already have the portions of this behavior that seem to me to be
> likely to be worthwhile (such as NULL elimination and compression of
> large field values).  Shaving a couple bytes from a bigint doesn't
> strike me as interesting.

I expect that there would be value in doing this with the inet type,
to distinguish between the smaller IPv4 addresses and the larger IPv6
ones.  We use the inet type (surprise! ;-)) and would benefit from
having it "usually smaller" (notably since IPv6 addresses are a
relative rarity, at this point).

That doesn't contradict you; just points out one of the cases where
there might be some value in *a* form of compression...

(Of course, this may already be done; I'm not remembering just now...)
-- 
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/nonrdbms.html
Fatal Error: Found [MS-Windows] System -> Repartitioning Disk for
Linux...  
-- <[EMAIL PROTECTED]> Christopher Browne

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Joshua D. Drake

Grzegorz Jaśkiewicz wrote:
> On Thu, Oct 30, 2008 at 3:27 PM, Christophe <[EMAIL PROTECTED]> wrote:
>> I'm a bit surprised to hear that; what would pg be doing, unique to
>> it, that would cause it to be slower on a RAID-1 cluster than on a
>> plain drive?
>
> Yes, it is slower on mirrored RAID than on a single drive.
> I can give you all the /proc/* dumps if you want; as far as the computer
> goes, it isn't anything fancy: a two-way P4 and SATA drives of some sort.

O.k. that doesn't actually surprise me all that much. Software RAID 1 on
SATA drives for specific workloads would be slower than a single drive.
It should still be faster for reads, assuming some level of concurrency,
but not likely faster for a single thread. Writes would be expected to
be slower because you are managing identical writes across two spindles,
and SATA is slow for that type of thing.

Joshua D. Drake



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Grzegorz Jaśkiewicz
On Thu, Oct 30, 2008 at 3:27 PM, Christophe <[EMAIL PROTECTED]> wrote:

> I'm a bit surprised to hear that; what would pg be doing, unique to it,
> that would cause it to be slower on a RAID-1 cluster than on a plain drive?
>
Yes, it is slower on mirrored RAID than on a single drive.
I can give you all the /proc/* dumps if you want; as far as the computer
goes, it isn't anything fancy: a two-way P4 and SATA drives of some sort.





-- 
GJ


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Joshua D. Drake

Grzegorz Jaśkiewicz wrote:
>> What? PostgreSQL is slower on RAID? Care to define that better?
>
> Up to 8.3 it was massively slower on RAID 1 (software RAID on Linux);
> starting from 8.3 things got a lot, lot better (we're talking about a
> 3x speed improvement here), but it still isn't the same as on a 'plain'
> drive.

Slower on RAID 1 than what, and doing what?

Joshua D. Drake

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Christophe


On Oct 30, 2008, at 8:10 AM, Grzegorz Jaśkiewicz wrote:
> Up to 8.3 it was massively slower on RAID 1 (software RAID on Linux);
> starting from 8.3 things got a lot, lot better (we're talking about a
> 3x speed improvement here), but it still isn't the same as on a 'plain'
> drive.


I'm a bit surprised to hear that; what would pg be doing, unique to  
it, that would cause it to be slower on a RAID-1 cluster than on a  
plain drive?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Grzegorz Jaśkiewicz
On Thu, Oct 30, 2008 at 2:58 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:

> Grzegorz Jaśkiewicz wrote:
>
>> Currently postgresql is slower on RAID, so something tells me that a
>> little bit of compression underneath will make it far worse, not
>> better. But I guess Tom will be the man to know more about it.
>>
>
> What? PostgreSQL is slower on RAID? Care to define that better?
>
Up to 8.3 it was massively slower on RAID 1 (software RAID on Linux);
starting from 8.3 things got a lot, lot better (we're talking about a 3x
speed improvement here), but it still isn't the same as on a 'plain' drive.



-- 
GJ


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Joshua D. Drake

Grzegorz Jaśkiewicz wrote:
> Currently postgresql is slower on RAID, so something tells me that a
> little bit of compression underneath will make it far worse, not better.
> But I guess Tom will be the man to know more about it.


What? PostgreSQL is slower on RAID? Care to define that better?



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread 小波 顾
Yes, we are in a data-warehouse-like environment, where the database server
is used to hold very large volumes of read-only historical data. CPU,
memory, I/O and network are all OK now; only storage space is a problem, so
the only goal of compression is to reduce storage consumption.
 



> Date: Thu, 30 Oct 2008 10:53:27 +1100
> From: [EMAIL PROTECTED]
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Are there plans to add data compression feature to postgresql?
>
> Tom Lane wrote:
>> 小波 顾 <[EMAIL PROTECTED]> writes:
>>> [ snip a lot of marketing for SQL Server ]
>>
>> I think the part of this you need to pay attention to is
>>
>>> Of course, nothing is entirely free, and this reduction in space and
>>> time come at the expense of using CPU cycles.
>>
>> We already have the portions of this behavior that seem to me to be
>> likely to be worthwhile (such as NULL elimination and compression of
>> large field values). Shaving a couple bytes from a bigint doesn't
>> strike me as interesting.
>
> Think about it on a fact table for a warehouse. A few bytes per bigint
> multiplied by several billions/trillions of bigints (not an exaggeration
> in a DW) and you're talking some significant storage saving on the main
> storage hog in a DW. Not to mention the performance _improvements_ you
> can get, even with some CPU overhead for dynamic decompression, if the
> planner/optimiser understands how to work with the compression index/map
> to perform things like range/partition elimination etc. Admittedly this
> depends heavily on the storage mechanics and optimisation techniques of
> the DB, but there is value to be had there ... IBM is seeing typical
> storage savings in the 40-60% range, mostly based on boring,
> bog-standard int, char and varchar data.
>
> The IDUG (so DB2 users themselves, not IBM's marketing) had a
> competition to see what was happening in the real world, take a look if
> interested: http://www.idug.org/wps/portal/idug/compressionchallenge
>
> Other big benefits come with XML ... but that is even more dependent on
> the starting point. Oracle and SQL Server will see big benefits in
> compression with this, because their XML technology is so
> mind-bogglingly broken in the first place.
>
> So there's certainly utility in this kind of feature ... but whether it
> rates above some of the other great stuff in the PostgreSQL pipeline is
> questionable.
>
> Ciao
> Fuzzy
> :-)

Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Andrew Sullivan
On Thu, Oct 30, 2008 at 10:53:27AM +1100, Grant Allen wrote:
> Other big benefits come with XML ... but that is even more dependent on the 
> starting point.  Oracle and SQL Server will see big benefits in compression 
> with this, because their XML technology is so mind-bogglingly broken in the 
> first place.

It seems to me that for this use case, you can already get the
interesting compression advantages in Postgres, and have been getting
them since TOAST was introduced back when the 8k row limit was broken.
It's recently been enhanced, ISTR, so that you can SET STORAGE with
better granularity.
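
As a concrete (and hedged) illustration of that per-column granularity,
here is a minimal libpq/C sketch; the table and column names are invented,
and it simply issues the documented ALTER TABLE ... SET STORAGE command
(PLAIN = no TOAST or compression, MAIN = compressed but in-line,
EXTERNAL = out-of-line but uncompressed, EXTENDED = compressed and
out-of-line, the default for toastable types):

/* Sketch: set the per-column TOAST storage strategy via libpq.
 * Build with: cc set_storage.c -lpq */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");  /* hypothetical DSN */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* Keep a hypothetical 'body' column compressed but stored in-line
     * where possible. */
    PGresult *res = PQexec(conn,
        "ALTER TABLE docs ALTER COLUMN body SET STORAGE MAIN");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "ALTER failed: %s", PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return 0;
}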

Indeed, it seems to me that in some ways, the big databases are only
catching up with Postgres now on this front.  That alone oughta be
news :)


-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Grzegorz Jaśkiewicz
currently postgresql is slower on RAID, so something tells me that little
bit of compression underneeth will make it far more worse, than better. But
I guess, Tom will be the man to know more about it.


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Sam Mason
On Thu, Oct 30, 2008 at 03:50:20PM +1100, Grant Allen wrote:
> One other thing I forgot to mention:  Compression by the DB trumps 
> filesystem compression in one very important area - shared_buffers! (or 
> buffer_cache, bufferpool or whatever your favourite DB calls its working 
> memory for caching data).  Because the data stays compressed in the 
> block/page when cached by the database in one of its buffers, you get 
> more bang for your memory buck in many circumstances!  Just another angle 
> to contemplate :-)

The database research project known as MonetDB/X100 has been looking at
this recently; the first paper below gives a bit of an introduction into
the design of the database and the second into the effects of different
compression schemes:

  http://www.cwi.nl/htbin/ins1/publications?request=pdf&key=ZuBoNeHe:DEBULL:05
  http://www.cwi.nl/htbin/ins1/publications?request=pdf&key=ZuHeNeBo:ICDE:06

The important thing seems to be that you don't want a storage-efficient
compression scheme; decent RAID subsystems demand a very lightweight
scheme that can be decompressed at several GB/s (i.e. two or three cycles
per tuple, not the 50 to 100 of traditional schemes like zlib or bzip).
It's very interesting reading (references to "commercial DBMS `X'" being
somewhat comical), but it's a *long* way from being directly useful to
Postgres.
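
To make the "two or three cycles per tuple" point concrete, here is a
generic C sketch of frame-of-reference coding, one lightweight scheme of
the flavour those papers argue for (an illustration written for this
thread, not code from the papers): a block of int32 values is stored as
one base value plus one-byte offsets, so decompression is a single add
per value.

#include <stdio.h>
#include <stdint.h>

#define BLOCK 8

/* Store the block minimum as the base and each value as a byte offset.
 * This toy assumes the range within a block fits in one byte; a real
 * scheme would choose the offset width per block. */
static void for_compress(const int32_t *in, int32_t *base, uint8_t *out)
{
    int32_t min = in[0];
    for (int i = 1; i < BLOCK; i++)
        if (in[i] < min)
            min = in[i];
    *base = min;
    for (int i = 0; i < BLOCK; i++)
        out[i] = (uint8_t) (in[i] - min);
}

static void for_decompress(int32_t base, const uint8_t *in, int32_t *out)
{
    for (int i = 0; i < BLOCK; i++)
        out[i] = base + in[i];      /* one add per value */
}

int main(void)
{
    int32_t vals[BLOCK] = {100023, 100025, 100031, 100024,
                           100040, 100022, 100029, 100030};
    int32_t base, back[BLOCK];
    uint8_t packed[BLOCK];

    for_compress(vals, &base, packed);  /* 32 bytes -> 4 + 8 bytes */
    for_decompress(base, packed, back);

    for (int i = 0; i < BLOCK; i++)
        printf("%d ", back[i]);
    printf("\n");
    return 0;
}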

It's interesting to bear in mind some of the things they talk about when
writing new code; the importance of designing cache-conscious algorithms
(and then of writing the code accordingly) has stuck in my mind the most.
Am I just old-fashioned, or is this focus on cache-conscious design quite
a new thing, and somewhat undervalued in the rest of the software world?


  Sam

p.s. if you're interested, there are more papers about MonetDB here:

  
http://monetdb.cwi.nl/projects/monetdb/Development/Research/Articles/index.html

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-30 Thread Grzegorz Jaśkiewicz
It should; every book on encryption says that if you compress your data
before encrypting it, the result is better.


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread Joshua D. Drake

Steve Atkins wrote:
>> The one place where compression is an immediate benefit is the wire.
>> It is easy to forget that one of our number one bottlenecks (even at
>> gigabit) is the amount of data we are pushing over the wire.
>
> Wouldn't "ssl_ciphers=NULL-MD5" or somesuch give zlib compression over
> the wire?

I don't think so.

Joshua D. Drake



Cheers,
  Steve





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread Steve Atkins


On Oct 29, 2008, at 10:43 PM, Joshua D. Drake wrote:


> Steve Atkins wrote:
>> On Oct 29, 2008, at 9:50 PM, Grant Allen wrote:
>>> One other thing I forgot to mention: Compression by the DB trumps
>>> filesystem compression in one very important area - shared_buffers!
>>> (or buffer_cache, bufferpool or whatever your favourite DB calls its
>>> working memory for caching data). Because the data stays compressed
>>> in the block/page when cached by the database in one of its buffers,
>>> you get more bang for your memory buck in many circumstances! Just
>>> another angle to contemplate :-)
>>
>> The additional latency added by decompression is reasonably small
>> compared with traditional disk access time. It's rather large
>> compared to memory access time.
>
> The one place where compression is an immediate benefit is the wire.
> It is easy to forget that one of our number one bottlenecks (even at
> gigabit) is the amount of data we are pushing over the wire.

Wouldn't "ssl_ciphers=NULL-MD5" or somesuch give zlib compression over
the wire?


Cheers,
  Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread Joshua D. Drake

Steve Atkins wrote:
> On Oct 29, 2008, at 9:50 PM, Grant Allen wrote:
>> One other thing I forgot to mention: Compression by the DB trumps
>> filesystem compression in one very important area - shared_buffers!
>> (or buffer_cache, bufferpool or whatever your favourite DB calls its
>> working memory for caching data). Because the data stays compressed
>> in the block/page when cached by the database in one of its buffers,
>> you get more bang for your memory buck in many circumstances! Just
>> another angle to contemplate :-)
>
> The additional latency added by decompression is reasonably small
> compared with traditional disk access time. It's rather large compared
> to memory access time.

The one place where compression is an immediate benefit is the wire. It
is easy to forget that one of our number one bottlenecks (even at
gigabit) is the amount of data we are pushing over the wire.
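
To put a rough number on that, here is a self-contained C sketch using
plain zlib at its fastest setting. This is only an illustration of the
bandwidth-versus-CPU trade-off, not something libpq does for you:

/* Build with: cc wire_compress.c -lz */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    /* Stand-in for a chunk of repetitive result-set data. */
    const char *row = "42|some repetitive, compressible result data|"
                      "42|some repetitive, compressible result data|";
    uLong  in_len = (uLong) strlen(row) + 1;

    Bytef  out[256];
    uLongf out_len = sizeof(out);

    /* Z_BEST_SPEED keeps the CPU cost low, which is what matters once
     * the link itself is fast. */
    if (compress2(out, &out_len, (const Bytef *) row, in_len,
                  Z_BEST_SPEED) != Z_OK)
        return 1;
    printf("%lu bytes -> %lu bytes on the wire\n",
           (unsigned long) in_len, (unsigned long) out_len);

    Bytef  back[256];
    uLongf back_len = sizeof(back);
    if (uncompress(back, &back_len, out, out_len) != Z_OK)
        return 1;
    printf("round-trip intact: %s\n",
           memcmp(back, row, in_len) == 0 ? "yes" : "no");
    return 0;
}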


Joshua D. Drake



Cheers,
  Steve





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread Steve Atkins


On Oct 29, 2008, at 9:50 PM, Grant Allen wrote:



> One other thing I forgot to mention: Compression by the DB trumps
> filesystem compression in one very important area - shared_buffers!
> (or buffer_cache, bufferpool or whatever your favourite DB calls its
> working memory for caching data). Because the data stays compressed
> in the block/page when cached by the database in one of its buffers,
> you get more bang for your memory buck in many circumstances! Just
> another angle to contemplate :-)


The additional latency added by decompression is reasonably small  
compared with traditional disk access time. It's rather large compared  
to memory access time.


Cheers,
  Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread Grant Allen

Ron Mayer wrote:
> Grant Allen wrote:
>> ...warehouse...DB2...IBM is seeing typical storage savings in the
>> 40-60% range
>
> Sounds about the same as what compressing file systems claim:
>
> http://opensolaris.org/os/community/zfs/whatis/
>  "ZFS provides built-in compression. In addition to
>   reducing space usage by 2-3x, compression also reduces
>   the amount of I/O by 2-3x. For this reason, enabling
>   compression actually makes some workloads go faster."
>
> I do note that Netezza got a lot of PR around their
> compression release; claiming it doubled performance.
> Wonder if they added that at the file system or higher
> in the DB.



I just so happen to have access to a Netezza system :-) I'll see if I 
can find out.


One other thing I forgot to mention:  Compression by the DB trumps 
filesystem compression in one very important area - shared_buffers! (or 
buffer_cache, bufferpool or whatever your favourite DB calls its working 
memory for caching data).  Because the data stays compressed in the 
block/page when cached by the database in one of its buffers, you get 
more bang for your memory buck in many circumstances!  Just another angle 
to contemplate :-)


Ciao
Fuzzy
:-)


Dazed and confused about technology for 20 years
http://fuzzydata.wordpress.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread Ron Mayer

Grant Allen wrote:
> ...warehouse...DB2...IBM is seeing typical
> storage savings in the 40-60% range

Sounds about the same as what compressing file systems claim:

http://opensolaris.org/os/community/zfs/whatis/
 "ZFS provides built-in compression. In addition to
  reducing space usage by 2-3x, compression also reduces
  the amount of I/O by 2-3x. For this reason, enabling
  compression actually makes some workloads go faster."

I do note that Netezza got a lot of PR around their
compression release; claiming it doubled performance.
Wonder if they added that at the file system or higher
in the DB.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread Grant Allen

Tom Lane wrote:
> 小波 顾 <[EMAIL PROTECTED]> writes:
>> [ snip a lot of marketing for SQL Server ]
>
> I think the part of this you need to pay attention to is
>
>> Of course, nothing is entirely free, and this reduction in space and
>> time come at the expense of using CPU cycles.
>
> We already have the portions of this behavior that seem to me to be
> likely to be worthwhile (such as NULL elimination and compression of
> large field values).  Shaving a couple bytes from a bigint doesn't
> strike me as interesting.


Think about it on a fact table for a warehouse.  A few bytes per bigint 
multiplied by several billions/trillions of bigints (not an exaggeration 
in a DW) and you're talking some significant storage saving on the main 
storage hog in a DW.  Not to mention the performance _improvements_ you 
can get, even with some CPU overhead for dynamic decompression, if the 
planner/optimiser understands how to work with the compression index/map 
to perform things like range/partition elimination etc.  Admittedly this 
depends heavily on the storage mechanics and optimisation techniques of 
the DB, but there is value to be had there ... IBM is seeing typical 
storage savings in the 40-60% range, mostly based on boring, 
bog-standard int, char and varchar data.


The IDUG (so DB2 users themselves, not IBM's marketing) had a 
competition to see what was happening in the real world, take a look if 
interested: http://www.idug.org/wps/portal/idug/compressionchallenge


Other big benefits come with XML ... but that is even more dependent on 
the starting point.  Oracle and SQL Server will see big benefits in 
compression with this, because their XML technology is so 
mind-bogglingly broken in the first place.


So there's certainly utility in this kind of feature ... but whether it 
rates above some of the other great stuff in the PostgreSQL pipeline is 
questionable.


Ciao
Fuzzy
:-)


Dazed and confused about technology for 20 years
http://fuzzydata.wordpress.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread Scott Marlowe
On Wed, Oct 29, 2008 at 10:09 AM, 小波 顾 <[EMAIL PROTECTED]> wrote:
>
> Data Compression
>
> The new data compression feature in SQL Server 2008 reduces the size of
> tables, indexes or a subset of their partitions by storing fixed-length data
> types in variable length storage format and by reducing the redundant data.
> The space savings achieved depends on the schema and the data distribution.
> Based on our testing with various data warehouse databases, we have seen a
> reduction in the size of real user databases up to 87% (a 7 to 1 compression
> ratio) but more commonly you should expect a reduction in the range of
> 50-70% (a compression ratio between roughly 2 to 1 and 3 to 1).

I'm sure this makes for a nice brochure or PowerPoint presentation,
but in the real world I can't imagine putting that much effort into it
when compressed file systems seem to be the place to be doing this.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread Grzegorz Jaśkiewicz
I can imagine my big stats tables, with 300-400M rows, all bigints that
- mostly - require that sort of length. Gain: none; hassle: 100%.


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread justin


小波 顾 wrote:
> Data Compression  [MSSQL 2008 tech notes]  Your results depend on
> your workload, database, and hardware.

Sounds cool, but I wonder what the real-world results are?

For I/O-bound systems there are lots of pluses,
but for CPU-bound workloads it would suck.



Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread Tom Lane
小波 顾 <[EMAIL PROTECTED]> writes:
> [ snip a lot of marketing for SQL Server ]

I think the part of this you need to pay attention to is

> Of course, nothing is entirely free, and this reduction in space and
> time come at the expense of using CPU cycles.

We already have the portions of this behavior that seem to me to be
likely to be worthwhile (such as NULL elimination and compression of
large field values).  Shaving a couple bytes from a bigint doesn't
strike me as interesting.

(Note: you could build a user-defined type that involved a one-byte
length indicator followed by however many bytes of the bigint you
needed.  So someone who thought this might be worthwhile could do it
for themselves.  I don't see it being a win, though.)
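
For the curious, a toy C sketch of the byte-shaving Tom describes: one
length byte followed by only the bytes of the int64 that are actually
needed. (Just the encoding; a real user-defined type would also need
input/output functions, operators, index support and so on, and negative
values in this toy still take the full 8 bytes.)

#include <stdio.h>
#include <stdint.h>

/* Encode v as 1 length byte + minimal little-endian payload.
 * Returns the total encoded size. */
static int shortint_encode(int64_t v, uint8_t *buf)
{
    uint64_t u = (uint64_t) v;
    int n = 0;

    do {
        buf[1 + n++] = (uint8_t) (u & 0xFF);
        u >>= 8;
    } while (u != 0 && n < 8);
    buf[0] = (uint8_t) n;
    return 1 + n;
}

static int64_t shortint_decode(const uint8_t *buf)
{
    uint64_t u = 0;

    for (int i = buf[0]; i > 0; i--)
        u = (u << 8) | buf[i];
    return (int64_t) u;
}

int main(void)
{
    uint8_t buf[9];
    int64_t v = 300;    /* a "small" bigint */
    int     sz = shortint_encode(v, buf);

    printf("%lld stored in %d bytes, decodes to %lld\n",
           (long long) v, sz, (long long) shortint_decode(buf));
    return 0;
}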

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread 小波 顾
http://msdn.microsoft.com/en-us/library/cc278097.aspx

> Date: Wed, 29 Oct 2008 15:35:44
> From: [EMAIL PROTECTED]
> Subject: Re: [GENERAL] Are there plans to add data compression feature to postgresql?
>
> 2008/10/29 小波 顾 <[EMAIL PROTECTED]>:
>> 1. Little integers of types that took 8 bytes in the past now take only
>> 4 or 2 bytes if they are not so large.
>
> So what actually happens if I have a table with a few million values
> that fit in 2 bytes, but all of a sudden I am going to add another
> column with something that requires 8 bytes? An update on all columns?
> I am actually even against varchars in my databases, so something like
> that sounds at least creepy.
>
> -- GJ

Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread Grzegorz Jaśkiewicz
2008/10/29 小波 顾 <[EMAIL PROTECTED]>

> 1. Little integers of types that took 8 bytes in the past now take only
> 4 or 2 bytes if they are not so large.
>
So what actually happens if I have a table with a few million values that
fit in 2 bytes, but all of a sudden I am going to add another column with
something that requires 8 bytes? An update on all columns? I am actually
even against varchars in my databases, so something like that sounds at
least creepy.

-- 
GJ


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-29 Thread 小波 顾
Sorry for following up so late. Actually, I mean compression features like
those other commercial RDBMSs have, such as DB2 9.5 or SQL Server 2008. In
those databases, all data types in all tables can be compressed; the
following are two features we think are very useful:
1. Little integers of types that took 8 bytes in the past now take only 4
or 2 bytes if they are not so large.
2. If two values have the same text or pattern, only one is stored, and
that one is compressed with traditional compression methods too.



> From: [EMAIL PROTECTED]
> Date: Mon, 27 Oct 2008 10:19:31
> Subject: Re: [GENERAL] Are there plans to add data compression feature to postgresql?
>
> Note that most data stored in the TOAST table is compressed. I.e. a text
> type with length greater than around 2K will be stored in the TOAST
> table. By default data in the TOAST table is compressed; this can be
> overridden. However, I expect that compression will reduce the
> performance of certain queries.
>
> http://www.postgresql.org/docs/8.3/interactive/storage-toast.html
>
> Out of interest, in what context did you want compression?
>
> Ron Mayer <[EMAIL PROTECTED]> wrote on 27/10/2008 07:34:
>> You might want to try using a file system (ZFS, NTFS) that does
>> compression, depending on what you're trying to compress.

Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-27 Thread Chris . Ellis
Note that most data stored in the TOAST table is compressed.

IE a Text type with length greater than around 2K will be stored in the 
TOAST table.  By default data in the TOAST table is compressed,  this can 
be overriden.

However I expect that compression will reduce the performance of certain 
queries.

http://www.postgresql.org/docs/8.3/interactive/storage-toast.html

Out of interested, in what context did you want compression?




Ron Mayer <[EMAIL PROTECTED]> wrote on 27/10/2008 07:34:

> You might want to try using a file system (ZFS, NTFS) that
> does compression, depending on what you're trying to compress.






Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-27 Thread Ron Mayer
You might want to try using a file system (ZFS, NTFS) that
does compression, depending on what you're trying to compress.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-26 Thread Scott Marlowe
2008/10/26 Martin Gainty <[EMAIL PROTECTED]>:
> Scott-
>
> Straight from Postgres doc
> The zlib compression library will be used by default. If you don't want to
> use it then you must specify the --without-zlib option for configure. Using
> this option disables support for compressed archives in pg_dump and
> pg_restore.

I was thinking more along the lines of the automatic compression of
text types over 4k or so when they are moved out of line and into
toast tables.

The original question was a little vague though, wasn't it?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-26 Thread Martin Gainty

Scott-

Straight from the Postgres docs:

  The zlib compression library will be used by default. If you don't want
  to use it then you must specify the --without-zlib option for configure.
  Using this option disables support for compressed archives in pg_dump
  and pg_restore.

Martin


> Date: Sun, 26 Oct 2008 10:37:01 -0600
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: [GENERAL] Are there plans to add data compression feature to 
> postgresql?
> CC: pgsql-general@postgresql.org
> 
> On Sun, Oct 26, 2008 at 9:54 AM, 小波 顾 <[EMAIL PROTECTED]> wrote:
> > Are there plans to add data compression feature to postgresql?
> 
> There already is data compression in postgresql.
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Are there plans to add data compression feature to postgresql?

2008-10-26 Thread Scott Marlowe
On Sun, Oct 26, 2008 at 9:54 AM, 小波 顾 <[EMAIL PROTECTED]> wrote:
> Are there plans to add data compression feature to postgresql?

There already is data compression in postgresql.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general