On Thu, 23 Feb 2023, Jakub Jelinek wrote:

> On Thu, Feb 23, 2023 at 03:02:01PM +0000, Richard Biener wrote:
> > > >         * vec.h (auto_vec<T, N>): Turn m_data storage into
> > > >         uninitialized unsigned char.
> > > 
> > > Given that we actually never reference the m_data array anywhere,
> > > it is just to reserve space, I think even the alignas(T) there is
> > > useless.  The point is that m_auto has as data members:
> > >   vec_prefix m_vecpfx;
> > >   T m_vecdata[1];
> > > and we rely on it (admittedly -fstrict-flex-arrays{,=2,=3} or
> > > -fsanitize=bound-sstrict incompatible) being treated as
> > > flexible array member flowing into the m_data storage after it.
> > 
> > Doesn't the array otherwise eventually overlap with tail padding
> > in m_auto?  Or does an array of T never produce tail padding?
> 
> The array can certainly overlap with tail padding in m_auto if any.
> But whether m_data is aligned to alignof (T) or not doesn't change anything
> on it.
> m_vecpfx is struct { unsigned m_alloc : 31, m_using_auto_storage : 1, m_num; 
> },
> so I think there is on most arches tail padding if T has smaller alignment
> than int, so typically char/short or structs with the same size/alignments.
> If that happens, alignof (auto_vec_x.m_auto) will be alignof (int),
> there can be 2 or 3 padding bytes, but because sizeof (auto_vec_x.m_auto)
> is 3 * sizeof (int), m_data will have offset always aligned to alignof (T).
> If alignof (T) >= alignof (int), then there won't be any tail padding
> at the end of m_auto, there could be padding between m_vecpfx and
> m_vecdata, sizeof (auto_vec_x.m_auto) will be a multiple of sizeof (T) and
> so m_data will be again already properly aligned.
> 
> So, I think your patch is fine without alignas(T), the rest is just that
> there is more work to do incrementally, even for the case you want to
> deal with (the point 1) in particular).

Looking at vec<T, A, vl_embed>::operator[] which just does

template<typename T, typename A>
inline const T &
vec<T, A, vl_embed>::operator[] (unsigned ix) const
{
  gcc_checking_assert (ix < m_vecpfx.m_num);
  return m_vecdata[ix];
} 

the whole thing looks fragile at best - we basically have

struct auto_vec
{
  struct vec<vl_embed>
  {
...
    T m_vecdata[1];
  } m_auto;
  T m_data[N-1];
};

and access m_auto.m_vecdata[] as if it extends to m_data.  That's
not something supported by the middle-end - not by design at least.
auto_vec *p; p->m_auto.m_vecdata[i] would never alias
p->m_data[j], in practice we might not see this though.  Also
get_ref_base_and_extent will compute a maxsize/size of sizeof(T)
for any m_auto.m_vecdata[i] access, but I think we nowhere
actually replace 'i' by zero based on this knowledge, but we'd
perform CSE with earlier m_auto.m_vecdata[0] stores, so that
might be something one could provoke.  Doing a self-test like

static __attribute__((noipa)) void
test_auto_alias (int i)
{ 
  auto_vec<int, 8> v;
  v.quick_grow (2);
  v[0] = 1;
  v[1] = 2;
  int val = v[i];
  ASSERT_EQ (val, 2);
} 

shows

  _27 = &_25->m_vecdata[0];
  *_27 = 1;
...
  _7 = &_12->m_vecdata[i.235_3];
  val_13 = *_7;

which is safe in middle-end rules though.  So what "saves" us
here is that we always return a reference and never a value.
There's the ::iterate member function which fails to do this,
the ::quick_push function does

  T *slot = &m_vecdata[m_vecpfx.m_num++];
  *slot = obj;

with

static __attribute__((noipa)) void
test_auto_alias (int i)
{ 
  auto_vec<int, 8> v;
  v.quick_grow (2);
  v[0] = 1;
  v[1] = 2;
  int val;
  for (int ix = i; v.iterate (ix, &val); ix++)
    ;
  ASSERT_EQ (val, 2);
} 

I get that optimzied to a FAIL.  I have a "fix" for this.
unordered_remove has a similar issue accesing the last element.
There are a few functions using the [] access member which is
at least sub-optimal due to repeated bounds checking but also safe.

I suppose if auto_vec would be a union of vec<vl_embed> and
a storage member with the vl_embed active that would work, but then
that's likely not something C++11 supports.

So I think to support auto_vec we'd need to make the m_vecdata[]
member in vec<vl_embed> of templated size (defaulted to 1)
and get rid of the m_data member in auto_vec instead.  Or have
another C++ way of increasing the size of auto_vec without
actually adding any member?

The vec<vl_embed> data accesses then would need to go through
a wrapper obtaining a correctly typed pointer to m_vecdata[]
since we'd like to have that as unsigned char[] to avoid the
initialization.

> > Yes, I'm not proposing to fix non-POD support.  I want to make
> > as-if-POD stuff like std::pair to work like it was intended.
> > 
> > > Oh, and perhaps we should start marking such spots in GCC with
> > > strict_flex_array attribute to make it clear where we rely on the
> > > non-strict behavior.
> > 
> > I think we never access the array directly as array, do we?
> 
> Sure, the attribute should go to m_vecdata array, not to m_data.
> And to op array in gimple_statement_with_ops, operands array in
> operands, ops array in tree_omp_clause, val in tree_int_cst,
> str in tree_string, elts in tree_vector, a in tree_vec, elem in rtvec_def,
> elem in hwivec_def, u.{fld,hwint} in rtx_def and various others, we
> use this stuff everywhere.  Also often use GTY length similarly to the
> proposed element_count...

Here's a patch to fix vec.h to adhere to GCC middle-ends interpretation
of array accesses - &array[i] is just address arithmetic and out-of-bounds
accesses are OK.  But that doesn't prevent other host compilers from
miscompiling stage1 I guess, I'm not sure if there's any standard
conforming way to compute the address of an element in m_data from
the address of m_vecdata?

Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.


>From b7d47dd1c166d216eba7160f11087312f8b5bbef Mon Sep 17 00:00:00 2001
From: Richard Biener <rguent...@suse.de>
Date: Fri, 24 Feb 2023 09:54:09 +0100
Subject: [PATCH] Fix vec.h alias problems
To: gcc-patches@gcc.gnu.org

With auto_vec we have embedded storage that spans two members,
the m_auto.m_vecdata[1] and the m_data[N-1] arrays and accesses
to the data are all based on m_auto.m_vecdata[].  That's OK for
GCC as long as the accesses are done indirectly through a pointer
since the address computation based on m_auto.m_vecdata[] is
considered OK by GCC (but not by the language standards).

The following fixes the few places that failed to enfoce this
indirection and adds one self-test that failed before.

As I was here it also fixes ::contains to avoid repeated bounds
checking and the same issue in :;lower_bound which also suffers
from unnecessary copying around values.

        * vec.h (vec<T, A, vl_embed>::lower_bound): Adjust to
        take a const reference to the object, access m_vecdata
        directly.
        (vec<T, A, vl_embed>::contains): Likewise.
        (vec<T, A, vl_embed>::iterate): Perform accesses to
        m_vecdata indirectly.
        (vec<T, A, vl_embed>::unordered_remove): Likewise.
        * vec.cc (test_auto_alias): New.
        (vec_cc_tests): Call it.
---
 gcc/vec.cc | 17 +++++++++++++++++
 gcc/vec.h  | 21 ++++++++++++++-------
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/gcc/vec.cc b/gcc/vec.cc
index 511e6dff50d..afba66cb3d0 100644
--- a/gcc/vec.cc
+++ b/gcc/vec.cc
@@ -568,6 +568,22 @@ test_auto_delete_vec ()
   ASSERT_EQ (dtor_count, 2);
 }
 
+/* Verify accesses to m_vecdata are done indirectly.  */
+
+static void
+test_auto_alias ()
+{
+  volatile int i = 1;
+  auto_vec<int, 8> v;
+  v.quick_grow (2);
+  v[0] = 1;
+  v[1] = 2;
+  int val;
+  for (int ix = i; v.iterate (ix, &val); ix++)
+    ;
+  ASSERT_EQ (val, 2);
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -587,6 +603,7 @@ vec_cc_tests ()
   test_qsort ();
   test_reverse ();
   test_auto_delete_vec ();
+  test_auto_alias ();
 }
 
 } // namespace selftest
diff --git a/gcc/vec.h b/gcc/vec.h
index cee96254a31..b3ec977c23a 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -614,7 +614,7 @@ public:
   T *bsearch (const void *key, int (*compar)(const void *, const void *));
   T *bsearch (const void *key,
              int (*compar)(const void *, const void *, void *), void *);
-  unsigned lower_bound (T, bool (*)(const T &, const T &)) const;
+  unsigned lower_bound (const T &, bool (*)(const T &, const T &)) const;
   bool contains (const T &search) const;
   static size_t embedded_size (unsigned);
   void embedded_init (unsigned, unsigned = 0, unsigned = 0);
@@ -929,7 +929,8 @@ vec<T, A, vl_embed>::iterate (unsigned ix, T *ptr) const
 {
   if (ix < m_vecpfx.m_num)
     {
-      *ptr = m_vecdata[ix];
+      const T *slot = &m_vecdata[ix];
+      *ptr = *slot;
       return true;
     }
   else
@@ -1118,7 +1119,9 @@ inline void
 vec<T, A, vl_embed>::unordered_remove (unsigned ix)
 {
   gcc_checking_assert (ix < length ());
-  m_vecdata[ix] = m_vecdata[--m_vecpfx.m_num];
+  const T *last_slot = &m_vecdata[--m_vecpfx.m_num];
+  T *slot = &m_vecdata[ix];
+  *slot = *last_slot;
 }
 
 
@@ -1249,8 +1252,11 @@ vec<T, A, vl_embed>::contains (const T &search) const
 {
   unsigned int len = length ();
   for (unsigned int i = 0; i < len; i++)
-    if ((*this)[i] == search)
-      return true;
+    {
+      const T *slot = &m_vecdata[i];
+      if (*slot == search)
+       return true;
+    }
 
   return false;
 }
@@ -1262,7 +1268,8 @@ vec<T, A, vl_embed>::contains (const T &search) const
 
 template<typename T, typename A>
 unsigned
-vec<T, A, vl_embed>::lower_bound (T obj, bool (*lessthan)(const T &, const T 
&))
+vec<T, A, vl_embed>::lower_bound (const T &obj,
+                                 bool (*lessthan)(const T &, const T &))
   const
 {
   unsigned int len = length ();
@@ -1273,7 +1280,7 @@ vec<T, A, vl_embed>::lower_bound (T obj, bool 
(*lessthan)(const T &, const T &))
       half = len / 2;
       middle = first;
       middle += half;
-      T middle_elem = (*this)[middle];
+      const T &middle_elem = this->m_vecdata[middle];
       if (lessthan (middle_elem, obj))
        {
          first = middle;
-- 
2.35.3

Reply via email to