On Mon, Jan 17, 2011 at 11:05:09AM +0100, Magnus Hagander wrote:
> On Mon, Jan 17, 2011 at 09:13, Itagaki Takahiro
> <itagaki.takah...@gmail.com> wrote:
> > 2011/1/17 KaiGai Kohei <kai...@ak.jp.nec.com>:
> >> Are you talking about an idea to apply toast id as an alternative key?
> >
> > No, probably. I'm just talking about whether "diff -q A.txt B.txt" and
> > "diff -q A.gz B.gz" always returns the same result or not.

Interesting.

> > ... I found it depends on version of gzip. So, if we use such logic,
> > we cannot improve toast compression logic because the data is migrated
> > by pg_upgrade.
>
> Yeah, that might be a bad tradeoff.
>
> I wonder if we can trust the *equality* test, but not the inequality?
> E.g. if compressed(A) == compressed(B) we know they're the same, but
> if compressed(A) != compressed(B) we don't know they're not; they still
> might be.

Exactly.

> I guess with two different versions or even completely different
> algorithms we could end up with exactly the same compressed value for
> different plaintexts (it's not a cryptographic hash after all), so
> that's probably not an acceptable comparison either.

It's safe to assume that will never happen. If compressed(A) == compressed(B)
when A != B, the decompressor could not recover both A and B from the same
bytes, and we would have a lossy compression algorithm.

As you say, though, _inequality_ implies nothing for an arbitrary decompressor.
One can trivially construct many inputs to the zlib decompressor that yield the
same output. "gzip -1" ... "gzip -9" do this, for example. So the main win
here would come if we tightly controlled the compressor, such that we could
infer something from compressed(A) != compressed(B). That would be an
intriguing path to explore.

nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
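
For anyone who wants to see the asymmetry concretely, here is a minimal
sketch using Python's zlib module. It is only an illustration: the sample
data and the helper name same_plaintext are made up for this example, and
zlib.compress emits the zlib container rather than gzip files, so levels 1
and 9 stand in for "gzip -1" and "gzip -9".

```python
import zlib
from typing import Optional

# Hypothetical sample value; nothing here comes from PostgreSQL itself.
a = b"toast me " * 1000

fast = zlib.compress(a, 1)  # lowest compression level
best = zlib.compress(a, 9)  # highest compression level

# Same plaintext, different compressed bytes: inequality proves nothing.
print(fast != best)

# Both streams still decompress to the identical plaintext.
print(zlib.decompress(fast) == zlib.decompress(best) == a)


def same_plaintext(x: bytes, y: bytes) -> Optional[bool]:
    """Compare two compressed values without decompressing them.

    Returns True when the bytes match (one decompressor maps equal input
    to equal output), and None when they differ, since differing bytes
    may or may not hide equal plaintexts.
    """
    return True if x == y else None


print(same_plaintext(fast, fast))
print(same_plaintext(fast, best))
```

The None result is the case that would need a tightly controlled compressor
before it could be turned into a definite "not equal".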