On 27.10.2012 14:27, Amit Kapila wrote:
On Saturday, October 27, 2012 4:03 AM Noah Misch wrote:
In my previous review, I said:

        Given [not relying on the executor to know which columns changed],
        why not treat the tuple as an opaque series of bytes and not worry
        about datum boundaries?  When several narrow columns change
        together, say a sequence of sixteen smallint columns, you will use
        fewer binary delta commands by representing the change with a
        single 32-byte substitution.  If an UPDATE changes just part of a
        long datum, the delta encoding algorithm will still be able to save
        considerable space.  That case arises in many forms: changing one
        word in a long string, changing one element in a long array,
        changing one field of a composite-typed column.  Granted, this
        makes the choice of delta encoding algorithm more important.
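
To make the sixteen-smallint case concrete, here is a minimal sketch that treats two tuple images as opaque bytes and finds the single smallest range covering every difference. The function name changed_range, the tuple layout, and the restriction to equal-length images are assumptions made for illustration; none of this is taken from the patch.

```python
import struct


def changed_range(old: bytes, new: bytes) -> tuple[int, int]:
    """Return (start, length) of the smallest byte range covering all
    differences between two equal-length buffers (hypothetical helper)."""
    if len(old) != len(new):
        raise ValueError("this sketch only handles equal-length images")
    n = len(old)
    prefix = 0
    while prefix < n and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < n - prefix and old[n - 1 - suffix] == new[n - 1 - suffix]:
        suffix += 1
    return prefix, n - prefix - suffix


# Assumed layout: one int4, sixteen int2 (smallint) columns, one int4,
# packed little-endian without padding.
old_tuple = struct.pack("<i16hi", 7, *range(1, 17), 9)
new_tuple = struct.pack("<i16hi", 7, *range(1001, 1017), 9)

start, length = changed_range(old_tuple, new_tuple)
print(start, length)  # 4 32
```

A column-aware encoder would emit sixteen separate substitutions here, while the byte-oriented view needs one. Prefix/suffix trimming is only the simplest possible choice; it degrades when two distant columns change, which is where the choice of delta algorithm starts to matter.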

We may be leaving considerable savings on the table by assuming that column
boundaries are the only modified-range boundaries worth recognizing.  What is
your willingness to explore general algorithms for choosing such boundaries?
Such an investigation may, of course, be a dead end.

For this patch, I would like to go with the delta encoding approach based on
column boundaries.

However, I shall try the general approach separately, and if it gives positive
results I will share them with hackers.
I will try VCDiff once, or let me know if you have any other algorithm in
mind.

One idea is to use the LZ format in the WAL record, but use your memcmp() code to construct it. I believe the slow part in LZ compression is in trying to locate matches in the "history", so if you just replace that with your code that's aware of the column boundaries and uses simple memcmp() to detect what parts changed, you could create LZ compressed output just as quickly as the custom encoded format. It would leave the door open for making the encoding smarter or to do actual compression in the future, without changing the format and the code to decode it.
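
A rough sketch of that idea follows, under loud assumptions: the ("copy", offset, length) and ("literal", bytes) operations are a made-up stand-in for an LZ-style stream and are not the pglz format, columns are fixed-width and sit at the same offsets in both tuple versions, and a slice comparison plays the role of memcmp(). The point it illustrates is that the encoder knows about column boundaries while the decoder does not.

```python
def encode(old: bytes, new: bytes, col_lengths: list[int]) -> list[tuple]:
    """Build LZ-style copy/literal ops by comparing column by column.
    No history search: the old tuple is the only 'history' consulted."""
    ops: list[tuple] = []
    off = 0
    for n in col_lengths:
        if old[off:off + n] == new[off:off + n]:  # stands in for memcmp()
            last = ops[-1] if ops else None
            if last and last[0] == "copy" and last[1] + last[2] == off:
                ops[-1] = ("copy", last[1], last[2] + n)  # extend the run
            else:
                ops.append(("copy", off, n))
        else:
            chunk = new[off:off + n]
            if ops and ops[-1][0] == "literal":
                ops[-1] = ("literal", ops[-1][1] + chunk)  # merge literals
            else:
                ops.append(("literal", chunk))
        off += n
    return ops


def decode(old: bytes, ops: list[tuple]) -> bytes:
    """Generic decoder: replays ops against the old tuple, with no
    knowledge of columns."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)


old_tuple = b"AAAA" + b"BB" + b"CCCCCCCC" + b"DD"
new_tuple = b"AAAA" + b"XX" + b"CCCCCCCC" + b"DD"
ops = encode(old_tuple, new_tuple, [4, 2, 8, 2])
print(ops)  # [('copy', 0, 4), ('literal', b'XX'), ('copy', 6, 10)]
```

Because decode() only understands copy and literal operations, a smarter encoder (one that searches for matches inside a changed datum, say) could later produce the same kind of stream without any change on the decoding side.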

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
