On Wed, Jun 26, 2019 at 12:31:13PM -0400, Tom Lane wrote:
Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
#define SizeOfMCVList(ndims,nitems)     \
is both woefully underdocumented and completely at variance with
reality.  It doesn't seem to be accounting for the actual data values.

I agree about the macro being underdocumented, but AFAICS it's used
correctly to check the expected length. It can't include the data values
directly, because that's variable amount of data - and it's encoded in not
yet verified part of the data.

Well, it should have some other name then.  Or *at least* a comment.
It's unbelievably misleading as it stands.


True.

That being said, maybe this is unnecessarily defensive and we should just
trust the values not being corrupted.

No, I'm on board with checking the lengths.  I just don't like how
hard it is to discern what's being checked.


Understood.

I think that part of the problem here is that the way this code is
written, "maxaligned" is no such thing.  What you're actually maxaligning
seems to be the offset from the start of the data area of a varlena value,

I don't think so. The pointers should be maxaligned with respect to the
whole varlena value, which is what 'raw' points to.

[ squint ... ]  OK, I think I misread this:

statext_mcv_deserialize(bytea *data)
{
...
        /* pointer to the data part (skip the varlena header) */
        ptr = VARDATA_ANY(data);
        raw = (char *) data;

I think this is confusing in itself --- I read it as "raw = (char *) ptr"
and I think most other people would assume that too based on the order
of operations.  It'd read better as

        /* remember start of datum for maxalign reference */
        raw = (char *) data;

        /* pointer to the data part (skip the varlena header) */
        ptr = VARDATA_ANY(data);


Yeah, that'd have been better.

Another problem with this code is that it flat doesn't work for
non-4-byte-header varlenas: it'd do the alignment differently than the
serialization side did.  That's okay given that the two extant call sites
are guaranteed to pass detoasted datums.  But using VARDATA_ANY gives a
completely misleading appearance of being ready to deal with short-header
varlenas, and heaven forbid there should be any comment to discourage
future coders from trying.  So really what I'd like to see here is

        /* remember start of datum for maxalign reference */
        raw = (char *) data;

        /* alignment logic assumes full-size datum header */
        Assert(VARATT_IS_4B(data));

        /* pointer to the data part (skip the varlena header) */
        ptr = VARDATA_ANY(data);

Or, of course, this could all go away if we got rid of the
bogus maxaligning...


OK. Attached is a patch ditching the alignment in serialized data. I've
ditched the macros to access parts of serialized data, and everything
gets copied.

The main complication is with varlena values, which may or may not have
4B headers (for now there's the PG_DETOAST_DATUM call, but as you
mentioned we may want to remove it in the future). So I've stored the
length as uint32 separately, followed by the full varlena value (thanks
to that the deserialization is simpler). Not sure if that's the best
solution, though, because this way we store the length twice.

I've kept the alignment in the deserialization code, because there it
allows us to allocate the whole value as a single chunk, which I think
is useful (I admit I don't have any measurements to demonstrate that).
But if we decide to rework this later, we can - it's just in-memory
representation, not on-disk.

Is this roughly what you had in mind?

FWIW I'm sure some of the comments are stale and/or need clarification,
but it's a bit too late over here, so I'll look into that tomorrow.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 5fe61ea0a4..93c774ea1c 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -52,15 +52,7 @@
  *      ndim * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)
  */
 #define ITEM_SIZE(ndims)       \
-       MAXALIGN((ndims) * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))
-
-/*
- * Macros for convenient access to parts of a serialized MCV item.
- */
-#define ITEM_INDEXES(item)                     ((uint16 *) (item))
-#define ITEM_NULLS(item,ndims)         ((bool *) (ITEM_INDEXES(item) + 
(ndims)))
-#define ITEM_FREQUENCY(item,ndims)     ((double *) (ITEM_NULLS(item, ndims) + 
(ndims)))
-#define ITEM_BASE_FREQUENCY(item,ndims)        ((double *) 
(ITEM_FREQUENCY(item, ndims) + 1))
+       ((ndims) * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))
 
 /*
  * Used to compute size of serialized MCV list representation.
@@ -68,10 +60,15 @@
 #define MinSizeOfMCVList               \
        (VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))
 
+/*
+ * Size of the serialized MCV list, exluding the space needed for
+ * deduplicated per-dimension values. That's because the macro is
+ * used in places when it's not yet safe to access the data.
+ */
 #define SizeOfMCVList(ndims,nitems)    \
-       (MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
-        MAXALIGN((ndims) * sizeof(DimensionInfo)) + \
-        MAXALIGN((nitems) * ITEM_SIZE(ndims)))
+       ((MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
+        ((ndims) * sizeof(DimensionInfo)) + \
+        ((nitems) * ITEM_SIZE(ndims)))
 
 static MultiSortSupport build_mss(VacAttrStats **stats, int numattrs);
 
@@ -491,19 +488,16 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
        int                     i;
        int                     dim;
        int                     ndims = mcvlist->ndimensions;
-       int                     itemsize = ITEM_SIZE(ndims);
 
        SortSupport ssup;
        DimensionInfo *info;
 
        Size            total_length;
 
-       /* allocate the item just once */
-       char       *item = palloc0(itemsize);
-
        /* serialized items (indexes into arrays, etc.) */
-       char       *raw;
+       bytea      *raw;
        char       *ptr;
+       char       *endptr PG_USED_FOR_ASSERTS_ONLY;
 
        /* values per dimension (and number of non-NULL values) */
        Datum     **values = (Datum **) palloc0(sizeof(Datum *) * ndims);
@@ -597,8 +591,17 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
                 */
                info[dim].nvalues = ndistinct;
 
-               if (info[dim].typlen > 0)       /* fixed-length data types */
+               if (info[dim].typbyval) /* by-value data types */
+               {
+                       info[dim].nbytes = info[dim].nvalues * info[dim].typlen;
+                       /* we copy the data into the item, so we don't need 
extra space */
+                       info[dim].nbytes_aligned = 0;
+               }
+               else if (info[dim].typlen > 0)          /* fixed-length by-ref 
*/
+               {
                        info[dim].nbytes = info[dim].nvalues * info[dim].typlen;
+                       info[dim].nbytes_aligned = info[dim].nvalues * 
MAXALIGN(info[dim].typlen);
+               }
                else if (info[dim].typlen == -1)        /* varlena */
                {
                        info[dim].nbytes = 0;
@@ -609,7 +612,11 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
                                values[dim][i] = 
PointerGetDatum(PG_DETOAST_DATUM(values[dim][i]));
 
                                len = VARSIZE_ANY(values[dim][i]);
-                               info[dim].nbytes += MAXALIGN(len);
+                               info[dim].nbytes += sizeof(uint32);     /* 
length */
+                               info[dim].nbytes += len;                        
/* value (with header) */
+
+                               /* space needed for max-aligned copies 
(deserialization) */
+                               info[dim].nbytes_aligned += MAXALIGN(len);
                        }
                }
                else if (info[dim].typlen == -2)        /* cstring */
@@ -619,11 +626,15 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
                        {
                                Size            len;
 
-                               /* c-strings include terminator, so +1 byte */
                                values[dim][i] = 
PointerGetDatum(PG_DETOAST_DATUM(values[dim][i]));
 
+                               /* c-strings include terminator, so +1 byte */
                                len = strlen(DatumGetCString(values[dim][i])) + 
1;
-                               info[dim].nbytes += MAXALIGN(len);
+                               info[dim].nbytes += sizeof(uint32);     /* 
length */
+                               info[dim].nbytes += len;                        
/* value */
+
+                               /* space needed for max-aligned copies 
(deserialization) */
+                               info[dim].nbytes_aligned += MAXALIGN(len);
                        }
                }
 
@@ -635,34 +646,33 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
         * Now we can finally compute how much space we'll actually need for the
         * whole serialized MCV list (varlena header, MCV header, dimension info
         * for each attribute, deduplicated values and items).
-        *
-        * The header fields are copied one by one, so that we don't need any
-        * explicit alignment (we copy them while deserializing). All fields 
after
-        * this need to be properly aligned, for direct access.
         */
-       total_length = MAXALIGN(VARHDRSZ + (3 * sizeof(uint32))
-                                                       + sizeof(AttrNumber) + 
(ndims * sizeof(Oid)));
+       total_length = (3 * sizeof(uint32))                     /* magic + type 
+ nitems */
+                                       + sizeof(AttrNumber)            /* 
ndimensions */
+                                       + (ndims * sizeof(Oid));        /* 
types attributes */
 
        /* dimension info */
-       total_length += MAXALIGN(ndims * sizeof(DimensionInfo));
+       total_length += ndims * sizeof(DimensionInfo);
 
        /* add space for the arrays of deduplicated values */
        for (i = 0; i < ndims; i++)
-               total_length += MAXALIGN(info[i].nbytes);
+               total_length += info[i].nbytes;
 
        /*
-        * And finally the items (no additional alignment needed, we start at
-        * proper alignment and the itemsize formula uses MAXALIGN)
+        * And finally account for the items (those are fixed-length, thanks to
+        * replacing values with uint16 indexes into the deduplicated arrays).
         */
-       total_length += mcvlist->nitems * itemsize;
+       total_length += mcvlist->nitems * ITEM_SIZE(dim);
 
        /*
         * Allocate space for the whole serialized MCV list (we'll skip bytes, 
so
         * we set them to zero to make the result more compressible).
         */
-       raw = palloc0(total_length);
-       SET_VARSIZE(raw, total_length);
+       raw = (bytea *) palloc0(VARHDRSZ + total_length);
+       SET_VARSIZE(raw, VARHDRSZ + total_length);
+
        ptr = VARDATA(raw);
+       endptr = ptr + total_length;
 
        /* copy the MCV list header fields, one by one */
        memcpy(ptr, &mcvlist->magic, sizeof(uint32));
@@ -680,12 +690,9 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
        memcpy(ptr, mcvlist->types, sizeof(Oid) * ndims);
        ptr += (sizeof(Oid) * ndims);
 
-       /* the header may not be exactly aligned, so make sure it is */
-       ptr = raw + MAXALIGN(ptr - raw);
-
-       /* store information about the attributes */
+       /* store information about the attributes (data amounts, ...) */
        memcpy(ptr, info, sizeof(DimensionInfo) * ndims);
-       ptr += MAXALIGN(sizeof(DimensionInfo) * ndims);
+       ptr += sizeof(DimensionInfo) * ndims;
 
        /* Copy the deduplicated values for all attributes to the output. */
        for (dim = 0; dim < ndims; dim++)
@@ -723,17 +730,27 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
                        }
                        else if (info[dim].typlen == -1)        /* varlena */
                        {
-                               int                     len = 
VARSIZE_ANY(value);
+                               uint32          len = VARSIZE_ANY(value);
 
+                               /* copy the length */
+                               memcpy(ptr, &len, sizeof(uint32));
+                               ptr += sizeof(uint32);
+
+                               /* value (with the varlena header) */
                                memcpy(ptr, DatumGetPointer(value), len);
-                               ptr += MAXALIGN(len);
+                               ptr += len;
                        }
                        else if (info[dim].typlen == -2)        /* cstring */
                        {
-                               Size            len = 
strlen(DatumGetCString(value)) + 1;       /* terminator */
+                               uint32          len = (uint32) 
strlen(DatumGetCString(value)) + 1;
+
+                               /* copy the length */
+                               memcpy(ptr, &len, sizeof(uint32));
+                               ptr += sizeof(uint32);
 
+                               /* value */
                                memcpy(ptr, DatumGetCString(value), len);
-                               ptr += MAXALIGN(len);
+                               ptr += len;
                        }
 
                        /* no underflows or overflows */
@@ -742,62 +759,65 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
 
                /* we should get exactly nbytes of data for this dimension */
                Assert((ptr - start) == info[dim].nbytes);
-
-               /* make sure the pointer is aligned correctly after each 
dimension */
-               ptr = raw + MAXALIGN(ptr - raw);
        }
 
        /* Serialize the items, with uint16 indexes instead of the values. */
        for (i = 0; i < mcvlist->nitems; i++)
        {
                MCVItem    *mcvitem = &mcvlist->items[i];
+               int                     itemlen = ITEM_SIZE(dim);
 
                /* don't write beyond the allocated space */
-               Assert(ptr <= raw + total_length - itemsize);
+               Assert(ptr <= (endptr - itemlen));
+
+               /* copy NULL and frequency flags into the serialized MCV */
+               memcpy(ptr, mcvitem->isnull, sizeof(bool) * ndims);
+               ptr += sizeof(bool) * ndims;
+
+               memcpy(ptr, &mcvitem->frequency, sizeof(double));
+               ptr += sizeof(double);
 
-               /* reset the item (we only allocate it once and reuse it) */
-               memset(item, 0, itemsize);
+               memcpy(ptr, &mcvitem->base_frequency, sizeof(double));
+               ptr += sizeof(double);
 
+               /* store the indexes last */
                for (dim = 0; dim < ndims; dim++)
                {
+                       uint16          index = 0;
                        Datum      *value;
 
                        /* do the lookup only for non-NULL values */
-                       if (mcvitem->isnull[dim])
-                               continue;
+                       if (!mcvitem->isnull[dim])
+                       {
+                               value = (Datum *) 
bsearch_arg(&mcvitem->values[dim], values[dim],
+                                                                               
          info[dim].nvalues, sizeof(Datum),
+                                                                               
          compare_scalars_simple, &ssup[dim]);
 
-                       value = (Datum *) bsearch_arg(&mcvitem->values[dim], 
values[dim],
-                                                                               
  info[dim].nvalues, sizeof(Datum),
-                                                                               
  compare_scalars_simple, &ssup[dim]);
+                               Assert(value != NULL);  /* serialization or 
deduplication error */
 
-                       Assert(value != NULL);  /* serialization or 
deduplication error */
+                               /* compute index within the deduplicated array 
*/
+                               index = (uint16) (value - values[dim]);
 
-                       /* compute index within the array */
-                       ITEM_INDEXES(item)[dim] = (uint16) (value - 
values[dim]);
+                               /* check the index is within expected bounds */
+                               Assert(index < info[dim].nvalues);
+                       }
 
-                       /* check the index is within expected bounds */
-                       Assert(ITEM_INDEXES(item)[dim] < info[dim].nvalues);
+                       /* copy the index into the serialized MCV */
+                       memcpy(ptr, &index, sizeof(uint16));
+                       ptr += sizeof(uint16);
                }
 
-               /* copy NULL and frequency flags into the item */
-               memcpy(ITEM_NULLS(item, ndims), mcvitem->isnull, sizeof(bool) * 
ndims);
-               memcpy(ITEM_FREQUENCY(item, ndims), &mcvitem->frequency, 
sizeof(double));
-               memcpy(ITEM_BASE_FREQUENCY(item, ndims), 
&mcvitem->base_frequency, sizeof(double));
-
-               /* copy the serialized item into the array */
-               memcpy(ptr, item, itemsize);
-
-               ptr += itemsize;
+               /* make sure we don't overflow the allocated value */
+               Assert(ptr <= endptr);
        }
 
        /* at this point we expect to match the total_length exactly */
-       Assert((ptr - raw) == total_length);
+       Assert(ptr == endptr);
 
-       pfree(item);
        pfree(values);
        pfree(counts);
 
-       return (bytea *) raw;
+       return raw;
 }
 
 /*
@@ -816,6 +836,7 @@ statext_mcv_deserialize(bytea *data)
        MCVList    *mcvlist;
        char       *raw;
        char       *ptr;
+       char       *endptr PG_USED_FOR_ASSERTS_ONLY;
 
        int                     ndims,
                                nitems;
@@ -849,8 +870,9 @@ statext_mcv_deserialize(bytea *data)
        mcvlist = (MCVList *) palloc0(offsetof(MCVList, items));
 
        /* pointer to the data part (skip the varlena header) */
-       ptr = VARDATA_ANY(data);
        raw = (char *) data;
+       ptr = VARDATA_ANY(raw);
+       endptr = (char *) raw + VARSIZE_ANY(data);
 
        /* get the header and perform further sanity checks */
        memcpy(&mcvlist->magic, ptr, sizeof(uint32));
@@ -909,12 +931,11 @@ statext_mcv_deserialize(bytea *data)
        memcpy(mcvlist->types, ptr, sizeof(Oid) * ndims);
        ptr += (sizeof(Oid) * ndims);
 
-       /* ensure alignment of the pointer (after the header fields) */
-       ptr = raw + MAXALIGN(ptr - raw);
-
        /* Now it's safe to access the dimension info. */
-       info = (DimensionInfo *) ptr;
-       ptr += MAXALIGN(ndims * sizeof(DimensionInfo));
+       info = palloc(ndims * sizeof(DimensionInfo));
+
+       memcpy(info, ptr, ndims * sizeof(DimensionInfo));
+       ptr += (ndims * sizeof(DimensionInfo));
 
        /* account for the value arrays */
        for (dim = 0; dim < ndims; dim++)
@@ -926,7 +947,7 @@ statext_mcv_deserialize(bytea *data)
                Assert(info[dim].nvalues >= 0);
                Assert(info[dim].nbytes >= 0);
 
-               expected_size += MAXALIGN(info[dim].nbytes);
+               expected_size += info[dim].nbytes;
        }
 
        /*
@@ -955,15 +976,18 @@ statext_mcv_deserialize(bytea *data)
                map[dim] = (Datum *) palloc(sizeof(Datum) * info[dim].nvalues);
 
                /* space needed for a copy of data for by-ref types */
-               if (!info[dim].typbyval)
-                       datalen += MAXALIGN(info[dim].nbytes);
+               datalen += info[dim].nbytes_aligned;
        }
 
        /*
-        * Now resize the MCV list so that the allocation includes all the data
+        * Now resize the MCV list so that the allocation includes all the data.
+        *
         * Allocate space for a copy of the data, as we can't simply reference 
the
-        * original data - it may disappear while we're still using the MCV 
list,
-        * e.g. due to catcache release. Only needed for by-ref types.
+        * serialized data - it's not aligned properly, and it may disappear 
while
+        * we're still using the MCV list, e.g. due to catcache release.
+        *
+        * We do care about alignment here, because we will allocate all the 
pieces
+        * at once, but then use pointers to different parts.
         */
        mcvlen = MAXALIGN(offsetof(MCVList, items) + (sizeof(MCVItem) * 
nitems));
 
@@ -1024,7 +1048,7 @@ statext_mcv_deserialize(bytea *data)
 
                                        /* just point into the array */
                                        map[dim][i] = PointerGetDatum(dataptr);
-                                       dataptr += info[dim].typlen;
+                                       dataptr += MAXALIGN(info[dim].typlen);
                                }
                        }
                        else if (info[dim].typlen == -1)
@@ -1032,10 +1056,13 @@ statext_mcv_deserialize(bytea *data)
                                /* varlena */
                                for (i = 0; i < info[dim].nvalues; i++)
                                {
-                                       Size            len = VARSIZE_ANY(ptr);
+                                       uint32          len;
+
+                                       memcpy(&len, ptr, sizeof(uint32));
+                                       ptr += sizeof(uint32);
 
                                        memcpy(dataptr, ptr, len);
-                                       ptr += MAXALIGN(len);
+                                       ptr += len;
 
                                        /* just point into the array */
                                        map[dim][i] = PointerGetDatum(dataptr);
@@ -1047,10 +1074,13 @@ statext_mcv_deserialize(bytea *data)
                                /* cstring */
                                for (i = 0; i < info[dim].nvalues; i++)
                                {
-                                       Size            len = (strlen(ptr) + 
1);        /* don't forget the \0 */
+                                       uint32          len;
+
+                                       memcpy(&len, ptr, sizeof(uint32));
+                                       ptr += sizeof(uint32);
 
                                        memcpy(dataptr, ptr, len);
-                                       ptr += MAXALIGN(len);
+                                       ptr += len;
 
                                        /* just point into the array */
                                        map[dim][i] = PointerGetDatum(dataptr);
@@ -1067,9 +1097,6 @@ statext_mcv_deserialize(bytea *data)
 
                /* check we consumed input data for this dimension exactly */
                Assert(ptr == (start + info[dim].nbytes));
-
-               /* ensure proper alignment of the data */
-               ptr = raw + MAXALIGN(ptr - raw);
        }
 
        /* we should have also filled the MCV list exactly */
@@ -1078,7 +1105,6 @@ statext_mcv_deserialize(bytea *data)
        /* deserialize the MCV items and translate the indexes to Datums */
        for (i = 0; i < nitems; i++)
        {
-               uint16     *indexes = NULL;
                MCVItem    *item = &mcvlist->items[i];
 
                item->values = (Datum *) valuesptr;
@@ -1087,27 +1113,35 @@ statext_mcv_deserialize(bytea *data)
                item->isnull = (bool *) isnullptr;
                isnullptr += MAXALIGN(sizeof(bool) * ndims);
 
+               memcpy(item->isnull, ptr, sizeof(bool) * ndims);
+               ptr += sizeof(bool) * ndims;
 
-               /* just point to the right place */
-               indexes = ITEM_INDEXES(ptr);
+               memcpy(&item->frequency, ptr, sizeof(double));
+               ptr += sizeof(double);
 
-               memcpy(item->isnull, ITEM_NULLS(ptr, ndims), sizeof(bool) * 
ndims);
-               memcpy(&item->frequency, ITEM_FREQUENCY(ptr, ndims), 
sizeof(double));
-               memcpy(&item->base_frequency, ITEM_BASE_FREQUENCY(ptr, ndims), 
sizeof(double));
+               memcpy(&item->base_frequency, ptr, sizeof(double));
+               ptr += sizeof(double);
 
-               /* translate the values */
+               /* finally translate the indexes (for non-NULL only) */
                for (dim = 0; dim < ndims; dim++)
-                       if (!item->isnull[dim])
-                               item->values[dim] = map[dim][indexes[dim]];
+               {
+                       uint16  index;
+
+                       memcpy(&index, ptr, sizeof(uint16));
+                       ptr += sizeof(uint16);
 
-               ptr += ITEM_SIZE(ndims);
+                       if (item->isnull[dim])
+                               continue;
+
+                       item->values[dim] = map[dim][index];
+               }
 
                /* check we're not overflowing the input */
-               Assert(ptr <= (char *) raw + VARSIZE_ANY(data));
+               Assert(ptr <= endptr);
        }
 
        /* check that we processed all the data */
-       Assert(ptr == raw + VARSIZE_ANY(data));
+       Assert(ptr == endptr);
 
        /* release the buffers used for mapping */
        for (dim = 0; dim < ndims; dim++)
diff --git a/src/include/statistics/extended_stats_internal.h 
b/src/include/statistics/extended_stats_internal.h
index fb03c52f50..6778746db1 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -36,6 +36,7 @@ typedef struct DimensionInfo
 {
        int                     nvalues;                /* number of 
deduplicated values */
        int                     nbytes;                 /* number of bytes 
(serialized) */
+       int                     nbytes_aligned; /* deserialized data with 
alignment */
        int                     typlen;                 /* pg_type.typlen */
        bool            typbyval;               /* pg_type.typbyval */
 } DimensionInfo;

Reply via email to