Hi,
Some quick review comments:
On 2014-02-13 18:14:54 +0530, Amit Kapila wrote:
> + /*
> + * EWT can be generated for all new tuple versions created by Update
> + * operation.  Currently we do it when both the old and new tuple versions
> + * are on same page, because during recovery if the page containing old
> + * tuple is corrupt, it should not cascade that corruption to other pages.
> + * Under the general assumption that for long runs most updates tend to
> + * create new tuple version on same page, there should not be significant
> + * impact on WAL reduction or performance.
> + *
> + * We should not generate EWT when we need to backup the whole block in
> + * WAL as in that case there is no saving by reduced WAL size.
> + */
> +
> + if (RelationIsEnabledForWalCompression(reln) &&
> + (oldbuf == newbuf) &&
> + !XLogCheckBufferNeedsBackup(newbuf))
> + {
> + uint32 enclen;
You should note that the check for RelationIsEnabledForWalCompression()
here is racy and that that's ok because the worst that can happen is
that a delta is generated uselessly.
> xlrec.target.node = reln->rd_node;
> xlrec.target.tid = oldtup->t_self;
> xlrec.old_xmax = HeapTupleHeaderGetRawXmax(oldtup->t_data);
> @@ -6619,6 +6657,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
> xlrec.newtid = newtup->t_self;
> if (new_all_visible_cleared)
> xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
> + if (compressed)
> + xlrec.flags |= XLOG_HEAP_DELTA_ENCODED;
I think this also needs to unset XLOG_HEAP_CONTAINS_NEW_TUPLE and be
made conditional on !need_tuple_data.
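Roughly like this (untested sketch; the flag values here are made-up
stand-ins for the real bits in the heapam headers, and update_flags() is
just my illustration of where the logic belongs):

```c
/* illustrative stand-ins for the real xl_heap_update flag bits */
#define XLOG_HEAP_CONTAINS_NEW_TUPLE	(1 << 0)
#define XLOG_HEAP_DELTA_ENCODED		(1 << 1)

/*
 * A delta-encoded new tuple is not a complete tuple anymore, so logical
 * decoding (need_tuple_data) must never see DELTA_ENCODED together with
 * CONTAINS_NEW_TUPLE; only delta-encode when the full tuple isn't needed.
 */
static unsigned char
update_flags(unsigned char flags, int compressed, int need_tuple_data)
{
	if (compressed && !need_tuple_data)
	{
		flags |= XLOG_HEAP_DELTA_ENCODED;
		flags &= ~XLOG_HEAP_CONTAINS_NEW_TUPLE;
	}
	return flags;
}
```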
> /*
> + * Determine whether the buffer referenced has to be backed up.  Since we
> + * don't yet have the insert lock, fullPageWrites and forcePageWrites could
> + * change later, but will not cause any problem because this function is
> + * used only to identify whether EWT is required for update.
> + */
> +bool
> +XLogCheckBufferNeedsBackup(Buffer buffer)
> +{
Should note very, very boldly that this can only be used in contexts
where a race is acceptable.
> diff --git a/src/backend/utils/adt/pg_rbcompress.c b/src/backend/utils/adt/pg_rbcompress.c
> new file mode 100644
> index 0000000..877ccd7
> --- /dev/null
> +++ b/src/backend/utils/adt/pg_rbcompress.c
> @@ -0,0 +1,355 @@
> +/* ----------
> + * pg_rbcompress.c -
> + *
> + *		This is a delta encoding scheme specific to PostgreSQL and designed
> + *		to compress similar tuples.  It can be used as it is or extended for
> + *		other purposes in PostgreSQL if required.
> + *
> + *		Currently, this just checks for a common prefix and/or suffix, but
> + *		the output format is similar to the LZ format used in
> + *		pg_lzcompress.c.
> + *
> + * Copyright (c) 1999-2014, PostgreSQL Global Development Group
> + *
> + * src/backend/utils/adt/pg_rbcompress.c
> + * ----------
> + */
This needs significantly more explanation of the algorithm and the
reasoning behind it.
> +static const PGRB_Strategy strategy_default_data = {
> +	32,			/* Data chunks less than 32 bytes are not
> +				 * compressed */
> +	INT_MAX,		/* No upper limit on what we'll try to
> +				 * compress */
> +	35,			/* Require 25% compression rate, or not worth
> +				 * it */
> +};
The compression rate looks like a mismatch between the comment (25%)
and the code (35).
> +/* ----------
> + * pgrb_out_ctrl -
> + *
> + * Outputs the last and allocates a new control byte if needed.
> + * ----------
> + */
> +#define pgrb_out_ctrl(__ctrlp,__ctrlb,__ctrl,__buf) \
> +do { \
> +	if ((__ctrl & 0xff) == 0) \
> +	{ \
> +		*(__ctrlp) = __ctrlb; \
> +		__ctrlp = (__buf)++; \
> +		__ctrlb = 0; \
> +		__ctrl = 1; \
> +	} \
> +} while (0)
> +
Identifiers with double underscores are reserved for the compiler and OS.
> +/* ----------
> + * pgrb_out_literal -
> + *
> + * Outputs a literal byte to the destination buffer including the
> + * appropriate control bit.
> + * ----------
> + */
> +#define pgrb_out_literal(_ctrlp,_ctrlb,_ctrl,_buf,_byte) \
> +do { \
> +	pgrb_out_ctrl(_ctrlp,_ctrlb,_ctrl,_buf); \
> +	*(_buf)++ = (unsigned char)(_byte); \
> +	_ctrl <<= 1; \
> +} while (0)
> +
> +
> +/* ----------
> + * pgrb_out_tag -
> + *
> + * Outputs a backward reference tag of 2-4 bytes (depending on
> + * offset and length) to the destination buffer including the
> + * appropriate control bit.
> + * ----------
> + */
> +#define pgrb_out_tag(_ctrlp,_ctrlb,_ctrl,_buf,_len,_off) \
> +do { \
> +	pgrb_out_ctrl(_ctrlp,_ctrlb,_ctrl,_buf); \
> +	_ctrlb |= _ctrl; \
> +	_ctrl <<= 1; \
> +	if (_len > 17) \
> +	{ \
> +		(_buf)[0] = (unsigned char)((((_off) & 0xf00) >> 4) | 0x0f); \
> +		(_buf)[1] = (unsigned char)(((_off) & 0xff)); \
> +		(_buf)[2] = (unsigned char)((_len) - 18); \
> +		(_buf) += 3; \
> +	} else { \
> +		(_buf)[0] = (unsigned char)((((_off) & 0xf00) >> 4) | ((_len) - 3)); \
> +		(_buf)[1] = (unsigned char)((_off) & 0xff); \
> +		(_buf) += 2; \
> +	} \
> +} while (0)
> +
What's the reason to use macros here? Just use inline functions when
dealing with file-local stuff.
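E.g. something like this (untested, but it shows the shape I mean; the
state struct and its name are my invention, replacing the four locals
the macros currently thread through):

```c
/* bundles the variables the pgrb_out_* macros pass around */
typedef struct pgrb_out_state
{
	unsigned char *ctrlp;	/* where the pending control byte is stored */
	unsigned char	ctrlb;	/* control bits collected so far */
	unsigned char	ctrl;	/* mask for the next control bit */
	unsigned char *buf;	/* next output position */
} pgrb_out_state;

/* flush the pending control byte and start a new one when full */
static inline void
pgrb_out_ctrl(pgrb_out_state *s)
{
	if ((s->ctrl & 0xff) == 0)
	{
		*s->ctrlp = s->ctrlb;
		s->ctrlp = s->buf++;
		s->ctrlb = 0;
		s->ctrl = 1;
	}
}

/* emit a literal byte; its control bit stays 0 */
static inline void
pgrb_out_literal(pgrb_out_state *s, unsigned char byte)
{
	pgrb_out_ctrl(s);
	*s->buf++ = byte;
	s->ctrl <<= 1;
}
```

The compiler will inline these just as well, and you get type checking
and debuggability for free.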
> +/* ----------
> + * pgrb_delta_encode - find common prefix/suffix between inputs and encode.
> + *
> + * source is the input data to be compressed
> + * slen is the length of source data
> + * history is the data which is used as reference for compression
> + * hlen is the length of history data
> + * The encoded result is written to dest, and its length is returned in
> + * finallen.  The return value is TRUE if compression succeeded, FALSE if
> + * not; in the latter case the contents of dest are undefined.
> + * ----------
> + */
> +bool
> +pgrb_delta_encode(const char *source, int32 slen,
> + const char *history, int32 hlen,
> + char *dest, uint32 *finallen,
> + const PGRB_Strategy *strategy)
> +{
> + unsigned char *bp = ((unsigned char *) dest);
> + unsigned char *bstart = bp;
> + const char *dp = source;
> + const char *dend = source + slen;
> + const char *hp = history;
> + unsigned char ctrl_dummy = 0;
> + unsigned char *ctrlp = &ctrl_dummy;
> + unsigned char ctrlb = 0;
> + unsigned char ctrl = 0;
> + int32 result_size;
> + int32 result_max;
> + int32 need_rate;
> + int prefixlen;
> + int suffixlen;
> +
> + /*
> + * Tuples of length greater than PGRB_HISTORY_SIZE are not allowed for
> + * delta encode as this is the maximum size of history offset.
> + * XXX: still true?
> + */
Why didn't you define a maximum tuple size in the strategy definition
above then?
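I.e. make the applicability test fully data-driven, something like this
(sketch only; the max_history_size field and pgrb_strategy_allows() are
my inventions):

```c
#include <limits.h>

typedef struct PGRB_Strategy
{
	int	min_input_size;		/* don't compress shorter inputs */
	int	max_input_size;		/* don't compress longer inputs */
	int	min_comp_rate;		/* minimum wanted compression rate */
	int	max_history_size;	/* proposed: replaces the hardcoded
					 * PGRB_HISTORY_SIZE check */
} PGRB_Strategy;

/* one place that decides whether delta encoding is attempted at all */
static int
pgrb_strategy_allows(const PGRB_Strategy *s, int slen, int hlen)
{
	return slen >= s->min_input_size &&
		slen <= s->max_input_size &&
		hlen < s->max_history_size;
}
```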
> + if (hlen >= PGRB_HISTORY_SIZE || hlen < PGRB_MIN_MATCH)
> + return false;
> +
> + /*
> + * Our fallback strategy is the default.
> + */
> + if (strategy == NULL)
> + strategy = PGRB_strategy_default;
>
> + /*
> + * If the strategy forbids compression (at all or if source chunk size
> + * out of range), fail.
> + */
> + if (slen < strategy->min_input_size ||
> + slen > strategy->max_input_size)
> + return false;
> +
> + need_rate = strategy->min_comp_rate;
> + if (need_rate < 0)
> + need_rate = 0;
> + else if (need_rate > 99)
> + need_rate = 99;
Is there really need for all this stuff here? This is so specific to
the use case that I have significant doubts that all the pglz
boilerplate makes much sense.
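For comparison, the part that's actually specific to this use case is
tiny; a prefix/suffix-only scheme needs little more than something like
this (my sketch, not code from the patch):

```c
/* length of the common prefix of a and b, capped at len */
static int
common_prefix_len(const char *a, const char *b, int len)
{
	int	i = 0;

	while (i < len && a[i] == b[i])
		i++;
	return i;
}

/* length of the common suffix of a (alen bytes) and b (blen bytes) */
static int
common_suffix_len(const char *a, int alen, const char *b, int blen)
{
	int	i = 0;

	while (i < alen && i < blen && a[alen - 1 - i] == b[blen - 1 - i])
		i++;
	return i;
}
```

Everything else (strategies, rate clamping, control-byte streams) is
inherited from pglz and arguably dead weight here.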
> + /*
> + * Compute the maximum result size allowed by the strategy, namely the
> + * input size minus the minimum wanted compression rate.  This had better
> + * be <= slen, else we might overrun the provided output buffer.
> + */
> + /*if (slen > (INT_MAX / 100))
> + {
> + /* Approximate to avoid overflow */
> + /*result_max = (slen / 100) * (100 - need_rate);
> + }
> + else
> + {
> + result_max = (slen * (100 - need_rate)) / 100;
> + }*/
err?
> +--
> +-- Test to update continuos and non continuos columns
> +--
*continuous
I have to admit, I have serious doubts about this approach. I have a
very hard time believing this won't cause performance regressions in
many common cases... More importantly, I don't think doing the
compression on this level is that interesting. I know Heikki argued for
it, but I think extending the bitmap that's computed for HOT to cover
all columns and doing this on a column level sounds much more sensible
to me.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers