Memory access in GIMPLE

Krister Walfridsson via Gcc Wed, 02 Apr 2025 17:19:53 -0700

I have more questions about GIMPLE memory semantics for smtgcc.

As before, each section starts with a description of the semantics I've implemented (or plan to implement), followed by concrete questions if relevant. Let me know if the described semantics are incorrect or incomplete.



Accessing memory
----------------

Memory access in GIMPLE is done using GIMPLE_ASSIGN statements where the lhs and/or rhs is a memory reference expression (such as MEM_REF). When both lhs and rhs access memory, one of the following must hold -- otherwise the access is UB:

 1. There is no overlap between lhs and rhs
 2. lhs and rhs represent the same address
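
As an illustration only (this is not smtgcc code), the two conditions can be sketched as a predicate over byte ranges. The helper name copy_is_allowed and the model of addresses as plain integers, with both sides covering the same number of bytes, are assumptions made for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when an assignment whose lhs and rhs
   both access memory satisfies one of the two conditions above.
   Assumption: addresses are plain integers and both accesses cover
   `size` bytes without wrapping around the address space.  */
static bool
copy_is_allowed (uintptr_t lhs, uintptr_t rhs, uintptr_t size)
{
  assert (size > 0);

  /* Condition 2: lhs and rhs represent the same address.  */
  if (lhs == rhs)
    return true;

  /* Condition 1: [lhs, lhs + size) and [rhs, rhs + size) are disjoint,
     i.e. the start addresses are at least `size` bytes apart.  */
  uintptr_t dist = lhs > rhs ? lhs - rhs : rhs - lhs;
  return dist >= size;
}
```

Under this model an 8-byte copy between addresses 100 and 104 is a partial overlap and would be rejected, while 100 and 108 (disjoint) or 100 and 100 (same address) would be accepted.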

A memory access is also UB in the following cases:
 * Any accessed byte is outside valid memory
 * The pointer violates the alignment requirements
 * The pointer provenance doesn't match the object
 * The type is incorrect from a TBAA perspective
 * It's a store to constant memory
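
A rough sketch of how the checks in this list could be combined, ignoring TBAA. The struct mem_object type, the function access_is_ub, and the choice to model provenance by passing in the object the pointer was derived from are all assumptions for illustration, not smtgcc data structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one memory object.  */
struct mem_object
{
  uintptr_t base;   /* address of the first byte */
  size_t size;      /* number of valid bytes */
  bool is_const;    /* true for constant memory */
};

/* Returns true when accessing `size` bytes at `addr` is UB under the
   rules listed above (TBAA ignored).  `obj` is the object the pointer's
   provenance refers to, so the bounds check below covers both the
   "outside valid memory" and the "provenance" bullets in this model.
   `align` is the required alignment in bytes.  */
static bool
access_is_ub (const struct mem_object *obj, uintptr_t addr, size_t size,
              size_t align, bool is_store)
{
  assert (align > 0);

  /* Some accessed byte lies outside the object.  */
  if (addr < obj->base
      || size > obj->size
      || addr - obj->base > obj->size - size)
    return true;

  /* The pointer violates the alignment requirement.  */
  if (addr % align != 0)
    return true;

  /* Store to constant memory.  */
  if (is_store && obj->is_const)
    return true;

  return false;
}
```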

smtgcc requires -fno-strict-aliasing for now, so I'll ignore TBAA in this mail. Provenance has its own issues, which I'll come back to in a separate mail.



Checking memory access is within bounds
---------------------------------------

A memory access may be represented by a chain of memory reference expressions such as MEM_REF, ARRAY_REF, COMPONENT_REF, etc. For example, accessing a structure:


  struct s {
    int x, y;
  };

as:

  int foo (struct s * p)
  {
    int _3;

    <bb 2> :
    _3 = p_1(D)->x;
    return _3;
  }

involves a MEM_REF for the whole object and a COMPONENT_REF to select the field. Conceptually, we load the entire structure and then pick out the element -- so all bytes of the structure must be in valid memory.


We could also do the access as:

  int foo (struct s * p)
  {
    int * q;
    int _3;

    <bb 2> :
    q_2 = &p_1(D)->x;
    _3 = *q_2;
    return _3;
  }

This calculates the address of the element, and then reads it as an integer, so only the four bytes of x must be in valid memory.


In other words, the compiler is not allowed to optimize:
  q_2 = &p_1(D)->x;
  _3 = *q_2;
to
  _3 = p_1(D)->x;
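
The difference between the two forms can be sketched as the byte range each one requires to be valid. The helper names and the model (exactly `valid` bytes of valid memory starting at p) are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct s
{
  int x, y;
};

/* Hypothetical helper: true when the bytes [offset, offset + size) lie
   within the first `valid` bytes, which is all of the valid memory
   starting at p in this model.  */
static bool
bytes_are_valid (size_t valid, size_t offset, size_t size)
{
  assert (size > 0);
  return offset <= valid && size <= valid - offset;
}

/* _3 = p->x;  The MEM_REF covers the whole structure.  */
static bool
component_ref_load_ok (size_t valid)
{
  return bytes_are_valid (valid, 0, sizeof (struct s));
}

/* q = &p->x; _3 = *q;  Only the bytes of x are accessed.  */
static bool
pointer_load_ok (size_t valid)
{
  return bytes_are_valid (valid, offsetof (struct s, x), sizeof (int));
}
```

With only sizeof (int) valid bytes at p, pointer_load_ok holds but component_ref_load_ok does not, which is why the second form cannot be rewritten into the first under these semantics.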

Question: Describing the first case as conceptually reading the whole structure makes sense for loads. But I assume the same requirement -- that the entire object must be in valid memory -- also applies for stores. Is that correct?



Allowed out-of-bounds read?
---------------------------

Compile the function below for x86_64 with "-O3 -march=x86-64-v2":

  int f(int *a)
  {
    for (int i = 0; i < 100; i++)
      if (a[i])
        return 1;
    return 0;
  }

The vectorizer transforms this into code that processes one scalar element at a time until the pointer is 16-byte aligned, then switches to vector loads.


The original code is well-defined when called like this:

  int a[2] __attribute__((aligned(16))) = {1, 0};
  f(a);

But the vectorized version ends up reading 8 bytes out of bounds.

This out-of-bounds read is harmless in practice -- it stays within the same memory page, so the extra bytes are accessible. But it's invalid under the smtgcc memory model.
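
The arithmetic behind both observations can be sketched as follows. The helper names, the example addresses, and the 4096-byte page size used in the note after the code are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes that a `width`-byte load at `addr` reads past the end
   of the object occupying [base, base + size).  */
static size_t
bytes_past_end (uintptr_t base, size_t size, uintptr_t addr, size_t width)
{
  assert (addr >= base);
  uintptr_t obj_end = base + size;
  uintptr_t load_end = addr + width;
  return load_end > obj_end ? (size_t) (load_end - obj_end) : 0;
}

/* True when all bytes of a `width`-byte load at `addr` are in one page.  */
static bool
load_stays_in_page (uintptr_t addr, size_t width, size_t page_size)
{
  assert (width > 0 && page_size > 0);
  return addr / page_size == (addr + width - 1) / page_size;
}
```

For the 8-byte array above placed at a 16-byte aligned address, a 16-byte vector load at the start of the array gives bytes_past_end == 8. A 16-byte load at a 16-byte aligned address cannot straddle a 4096-byte page boundary, so load_stays_in_page holds for every such address, whereas an unaligned load near the end of a page (for example at 0x1ff8) would cross into the next page.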

Question: Is this considered a valid access in GIMPLE? If so, what are the rules for allowed out-of-bounds memory reads?



Alignment check
---------------

Question: smtgcc currently gets the alignment requirements by calling get_object_alignment on the tree expression returned from gimple_assign_lhs (for stores) or gimple_assign_rhs1 (for loads). Is that the correct way to get the required alignment?



   /Krister
