================
@@ -44,24 +44,30 @@ void PointerSubChecker::checkPreStmt(const BinaryOperator 
*B,
 
   const MemRegion *LR = LV.getAsRegion();
   const MemRegion *RR = RV.getAsRegion();
-
-  if (!(LR && RR))
-    return;
-
-  const MemRegion *BaseLR = LR->getBaseRegion();
-  const MemRegion *BaseRR = RR->getBaseRegion();
-
-  if (BaseLR == BaseRR)
+  if (!LR || !RR)
     return;
 
-  // Allow arithmetic on different symbolic regions.
-  if (isa<SymbolicRegion>(BaseLR) || isa<SymbolicRegion>(BaseRR))
-    return;
+  const auto *ElemLR = dyn_cast<ElementRegion>(LR);
+  const auto *ElemRR = dyn_cast<ElementRegion>(RR);
+  // FIXME: We want to verify that these are elements of an array.
+  // Because behavior of ElementRegion it may be confused with a cast.
+  // There is not a simple way to distinguish it from array element (check the
+  // types?). Because this missing check a warning is missing in the rare case
+  // when two casted pointers to the same region (variable) are subtracted.
----------------
NagyDonat wrote:

I discussed your examples and questions with @whisperity and Zoltán Porkoláb, 
and I'm trying to summarize what we determined. Note that I'm mostly working 
from the most recent C++ draft standard (http://eel.is/c++draft/), so some the 
conclusions may be invalid in C or older versions of C++ (but I tried to 
highlight the differences that I know about).

(1) In your first example the step `int *b = a` is invalid because the 
`int[3][3]` array `a` decays to an `int (*)[3]` (pointer to array of 3 `int`s) 
and that type is not interconvertible with a plain `int *` (see [ 
[basic.compound] note 5](http://eel.is/c++draft/basic.compound#note-5) in the 
C++ draft standard). (And if you define `b` as `int (*b)[3] = a`, then the 
pointer subtraction will become invalid.)

(2) In the second example using a `long *` to initialize `char *` or `short *` 
variables is _usually_ an error, it's accepted under old C, but in C++ you need 
an explicit cast and e.g. modern GCC also produces an error 
(`-Wincompatible-pointer-types` a warning that's by default an error) when it 
compiles this.

I'd guess that the actual pointer arithmetic _is_ standard-compliant, but the 
relevant parts of the standard ([[expr.reinterpret.cast] 
7](http://eel.is/c++draft/expr.post#expr.reinterpret.cast-7), 
[[expr.static.cast] 
14](http://eel.is/c++draft/expr.post#expr.static.cast-14))are too vague to give 
a confident answer.

Note that under C++ _accessing_ the element `short b[1]` would be a violation 
of [[basic.lval] part 
11](http://eel.is/c++draft/basic.lval#def:type-accessible), but based on 
[[defns.access]](http://eel.is/c++draft/defns.access) I'd say that the 
expression `&b[1]` is _not_ an access of `b[1]`.

(3) In your third example `&a[2][2] - &a[1][1]` is clearly working in 
practically all implementations, but we're fairly sure that it's not compliant 
with the C++ standard, because [ [expr.add] part 
5.2](https://www.eel.is/c++draft/expr.add#5.2) speaks about "array elements _i_ 
and _j_ of the same array object _x_" (and later "_i_ - _j_" which shows that 
"_i_" and "_j_" are scalar numbers) -- while in your example `a[2][2]` and 
`a[1][1]` are not (numerically indexed) elements within the same array. (This 
is coherent with [[dcl.array]](http://eel.is/c++draft/dcl.array) where an 
"array" implicitly means a one-dimensional array structure (whose elements may 
be other arrays).)

Nevertheless, it may be reasonable to avoid emitting warnings on this kind of 
code, because that could be better for the users.

(4) The definition of "_what exactly an "array object" is_" and "what is an 
_element_ of an array" primarily appears at [[dcl.array] part 
6](http://eel.is/c++draft/dcl.array#6):

> An object of type “array of N U” consists of a contiguously allocated 
> non-empty set of N subobjects of type U, known as the _elements_ of the 
> array, and numbered 0 to N-1.

(This clearly shows that the _element of an element of_ an array is not an 
_element of_ the array. The notion of "multi-dimensional arrays" only appears 
in the _Example_ [[dcl.array] Example 
4](http://eel.is/c++draft/dcl.array#example-4) with word choices that suggests 
that this is just a mathematical/intuitive notion and not something that's 
well-defined in the standard.)

This general definition is augmented by [[basic.compound] part 3, sentence 
11](http://eel.is/c++draft/basic.compound#3.sentence-11) which states that:

> For purposes of pointer arithmetic 
> ([[expr.add]](http://eel.is/c++draft/expr.add)) and comparison 
> ([[expr.rel]](http://eel.is/c++draft/expr.rel), 
> [[expr.eq]](http://eel.is/c++draft/expr.eq)), a pointer past the end of the 
> last element of an array x of n elements is considered to be equivalent to a 
> pointer to a hypothetical array element n of x and an object of type T that 
> is not an array element is considered to belong to an array with one element 
> of type T.

(5) Yes, the sub-arrays of a multidimensional array are separate "array 
objects" that have their own elements -- but they are also single elements 
within the same bigger array. This means that if we have `int arr[3][3]`, then
- `int x = &(arr[2]) - &(arr[1])` is valid as the difference of two `int 
(*)[3]` pointers that point to elements of the same array `arr`
- `int y = &(arr[2][1]) - &(arr[2][2])` is valid as the difference of two `int 
*` pointers that point to elements of the same array `arr[2]`
- `int z = &(arr[1][1]) - &(arr[2][2])` is invalid, because one pointer points 
to an element of `arr[1]`, while the other points to element of `arr[2]`
- `int w = arr[2] - arr[1]` is also invalid, because these arrays decay to `int 
*` pointer and this difference is equivalent to `&(arr[2][0]) - &(arr[1][0])`

(6) "_If an address (of a variable) is converted to a char * is this the same 
array object as the original variable (for example l)?_" -- My intuition is 
that here we're dealing with an imagined `char[]` array object that "covers" 
the same memory region as the original variable, and any pointer trickery that 
start from the original object and produce `char *` pointers pointing into the 
original object essentially produces pointers to elements of this imagined 
`char[]` array. (We might say that this imagined `char[]` array is the [_object 
representation_](http://eel.is/c++draft/basic.types.general#4) of the array, 
although that's probably not exactly accurate.)

(7)
> Invalid indexing like `int a[3][3]; int x = a[0][4];` is undefined behavior  
> but why should then be allowed to use pointers to such an object and index it 
> like an one-dimensional array, [...]

I'm not exactly sure what you're speaking about here. I completely agree that 
`int a[3][3]; int x = a[0][4];` is undefined behavior, but there may be 
multiple ways to convert the multidimensional array into a single-dimensional 
one and some of them might be legitimate.

(8)
> or convert an array pointer into an array with different type and index it

This seems to be forbidden by [[expr.add] part 
6](http://eel.is/c++draft/expr.add#6). Note that the [definition of the 
subscript operator](eel.is/c++draft/expr.post#expr.sub) is based on pointer 
addition.

(9)
> We should allow anything that points into the same memory block (that is a 
> variable or any array), or only allow pointers to the same array with same 
> type and same size.
I don't understand this sentence; but note that [[expr.add] part 
2.2](http://eel.is/c++draft/expr.add#2.2) demands that in pointer subtraction 
the two operands must be pointers to (cv-qualified or cv-unqualified versions 
of) the same completely-defined object type.

https://github.com/llvm/llvm-project/pull/93676
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to