On Wed, Jun 26, 2019 at 09:49:46AM +0200, Tomas Vondra wrote:
On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:
I'm seeing a reproducible bus error here:

#0  0x00417420 in statext_mcv_serialize (mcvlist=0x62223450, stats=Variable 
"stats" is not available.
)
  at mcv.c:785
785                     memcpy(ITEM_BASE_FREQUENCY(item, ndims), 
&mcvitem->base_frequency, sizeof(double));

What appears to be happening is that since ITEM_BASE_FREQUENCY is defined as

#define ITEM_BASE_FREQUENCY(item,ndims) ((double *) (ITEM_FREQUENCY(item, 
ndims) + 1))

the compiler is assuming that the first argument to memcpy is
double-aligned, and it is generating code that depends on that being
true, and of course it isn't true and kaboom.

You can *not* cast something to an aligned pointer type if it's not
actually certain to be aligned suitably for that type.  In this example,
even if you wrote "(char *)" in front of this, it wouldn't save you;
the compiler would still be entitled to believe that the intermediate
cast value meant something.  The casts in the underlying macros
ITEM_FREQUENCY and so on are equally unsafe.


OK. So the solution is to ditch the casts altogether, and then do plain
pointer arithmetics like this:

#define ITEM_INDEXES(item)                      (item)
#define ITEM_NULLS(item,ndims)          (ITEM_INDEXES(item) + (ndims))
#define ITEM_FREQUENCY(item,ndims)      (ITEM_NULLS(item, ndims) + (ndims))
#define ITEM_BASE_FREQUENCY(item,ndims) (ITEM_FREQUENCY(item, ndims) + 
sizeof(double))

Or is that still relying on alignment, somehow?


Attached is a patch that should (hopefully) fix this. It essentially
treats the item as (char *) and does all pointer arithmetics without any
additional casts. So there are no intermediate casts.

I have no way to test this, so I may either wait for you to test this
first, or push and wait. It seems to fail only on a very small number of
buildfarm animals, so having a confirmation would be nice.

The fix keeps the binary format as is, so the serialized MCV items are
max-aligned. That means we can access the uint16 indexes directly, but we
need to copy the rest of the fields (because those may not be aligned). In
hindsight that seems a bit silly, we might as well copy everything, not
care about the alignment and maybe save a few more bytes. But that would
require catversion bump. OTOH we may beed to do that anyway, to fix the
pg_mcv_list_items() signature (as discussed in the other MCV thread).

regards
--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 5fe61ea0a4..01ea05b486 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -57,10 +57,10 @@
 /*
  * Macros for convenient access to parts of a serialized MCV item.
  */
-#define ITEM_INDEXES(item)                     ((uint16 *) (item))
-#define ITEM_NULLS(item,ndims)         ((bool *) (ITEM_INDEXES(item) + 
(ndims)))
-#define ITEM_FREQUENCY(item,ndims)     ((double *) (ITEM_NULLS(item, ndims) + 
(ndims)))
-#define ITEM_BASE_FREQUENCY(item,ndims)        ((double *) 
(ITEM_FREQUENCY(item, ndims) + 1))
+#define ITEM_INDEXES(item)                     ((char *) item)
+#define ITEM_NULLS(item,ndims)         (ITEM_INDEXES(item) + (ndims) * 
sizeof(uint16))
+#define ITEM_FREQUENCY(item,ndims)     (ITEM_NULLS(item, ndims) + (ndims) * 
sizeof(bool))
+#define ITEM_BASE_FREQUENCY(item,ndims)        (ITEM_FREQUENCY(item, ndims) + 
sizeof(double))
 
 /*
  * Used to compute size of serialized MCV list representation.
@@ -751,6 +751,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
        for (i = 0; i < mcvlist->nitems; i++)
        {
                MCVItem    *mcvitem = &mcvlist->items[i];
+               uint16     *indexes = (uint16 *) ITEM_INDEXES(item);
 
                /* don't write beyond the allocated space */
                Assert(ptr <= raw + total_length - itemsize);
@@ -773,10 +774,10 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
                        Assert(value != NULL);  /* serialization or 
deduplication error */
 
                        /* compute index within the array */
-                       ITEM_INDEXES(item)[dim] = (uint16) (value - 
values[dim]);
+                       indexes[dim] = (uint16) (value - values[dim]);
 
                        /* check the index is within expected bounds */
-                       Assert(ITEM_INDEXES(item)[dim] < info[dim].nvalues);
+                       Assert(indexes[dim] < info[dim].nvalues);
                }
 
                /* copy NULL and frequency flags into the item */
@@ -1087,10 +1088,13 @@ statext_mcv_deserialize(bytea *data)
                item->isnull = (bool *) isnullptr;
                isnullptr += MAXALIGN(sizeof(bool) * ndims);
 
+               /*
+                * To access indexes, we can just point to the right place (we 
can
+                * do this, because we ensure alignment during serialization).
+                */
+               indexes = (uint16 *) ITEM_INDEXES(ptr);
 
-               /* just point to the right place */
-               indexes = ITEM_INDEXES(ptr);
-
+               /* The remaining fields may not be aligned, though. */
                memcpy(item->isnull, ITEM_NULLS(ptr, ndims), sizeof(bool) * 
ndims);
                memcpy(&item->frequency, ITEM_FREQUENCY(ptr, ndims), 
sizeof(double));
                memcpy(&item->base_frequency, ITEM_BASE_FREQUENCY(ptr, ndims), 
sizeof(double));

Reply via email to