[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-23 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov abandoned this revision.
ASDenysPetrov added a comment.

Temporary suspended this revision in favor of making a new checker 
//StrictAliasingChecker//, which would define an access to values through 
unpermited types as Unefined Behavior according to certain statements of the 
current Standard.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-17 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov added a comment.

@steakhal Please, read the discussion started from here D104285#2943449 
. It's directly relates to this patch 
and what we've been arguing about.

I'm still hesitating about this patch.
On one hand we have the fact that almost all compilers ignore some Standard's 
paragraphs about UB in terms of casts.
E.g. they consider next snippets as OK, but they are NOT according to the 
Standard :

- `int arr[1][2][3]; int *ptr = (int*)arr; ptr[4] = 42;` // **int(*)[2][3]** 
can't be aliased by **int***
- `int i; signed char *ptr = (signed char*) ptr[2] = 42;` // **int*** can't 
be aliased by **signed char***
- `int arr[3][3]; int (*ptr)[8] = (int(*)[6])arr; ptr[1][1] = 42;` // `ptr[1]` 
can't go further then aliased by **int(*)[6]**

I've checked all those examples on Godbolt
On the other hand introducing this patch will show unexpected warnings to users 
which they can't reproduce in a real life.

I can't choose the way to act.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-17 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov added a comment.

@NoQ

> If we ever prove that strict aliasing is violated on a given execution path 
> (while being enabled), the ideal thing to do is to terminate the analysis 
> immediately by generating a sink. We can then optionally develop a checker 
> that emits a warning in such cases.

thinking about warnings I assume that the Store will produce `UndefinedVals` 
and //Undef-related// checkers will catch them. But yes, you're right. They 
wouldn't know anything about the origin of such `UndefinedVals`.

> For the cases where you eliminate possibilities through recognizing strict 
> aliasing, I wonder if a note can be added to the bug report to notify the 
> user that the strict aliasing rule was invoked to add a certain assumption.

I didn't elaborate an idea with a checker yet but, I think, it can be the next 
step. What about a mechanism which would store the reason of UndefinedVal for 
existing checkers? It could make any checker more detailed and flexible.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-16 Thread Artem Dergachev via Phabricator via cfe-commits
NoQ added a comment.

Taking advantage of strict aliasing is good as long as it produces strictly 
smaller analysis space (less paths, more constrained states). I.e., we can use 
it for eliminating possibilities, but not for discovering possibilities.

If we ever prove that strict aliasing is violated on a given execution path 
(while being enabled), the ideal thing to do is to terminate the analysis 
immediately by generating a sink. We can then optionally develop a checker that 
emits a warning in such cases.

For the cases where you eliminate possibilities through recognizing strict 
aliasing, I wonder if a note can be added to the bug report to notify the user 
that the strict aliasing rule was invoked to add a certain assumption.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-12 Thread Balázs Benics via Phabricator via cfe-commits
steakhal added inline comments.



Comment at: clang/test/Analysis/initialization.cpp:295-299
+void glob_cast_opposite_sign1() {
+  auto *ptr = (unsigned int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}

ASDenysPetrov wrote:
> steakhal wrote:
> > ASDenysPetrov wrote:
> > > steakhal wrote:
> > > > ASDenysPetrov wrote:
> > > > > steakhal wrote:
> > > > > > I think it's not correct.
> > > > > > 
> > > > > > `glob_arr2` refers to an object of dynamic type `int[2]`.
> > > > > > And the pointer decayed from it is `int *`, which has //similar 
> > > > > > type// to `unsigned *` what you are using to access the memory.
> > > > > > Since they are //similar//, this operation should work for all the 
> > > > > > valid indices, thus `ptr[0]`, `ptr[1]`, `ptr[2]`, `ptr[3]` should 
> > > > > > all be valid.
> > > > > > 
> > > > > > 
> > > > > > glob_arr2 refers to an object of dynamic type int[2].
> > > > > `glob_arr2` has an extent of 4.
> > > > > > And the pointer decayed from it is int *, which has similar type to 
> > > > > > unsigned * what you are using to access the memory.
> > > > > Yes, they are the same and it perfectly suits to 
> > > > > http://eel.is/c++draft/basic.lval#11 . But still you can't access 
> > > > > other element then the first one according to 
> > > > > http://eel.is/c++draft/basic.compound#3 : //For purposes of pointer 
> > > > > arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), [...] 
> > > > > an object of type T that is not an array element is considered to 
> > > > > belong to an array with one element of type T. //
> > > > > `unsigned int` and `int` are different types according to 
> > > > > http://eel.is/c++draft/basic.fundamental#2 . The object of type 
> > > > > `unsigned int` is NOT an array, beacuse there is no array of type 
> > > > > `unsigned int`. Hence you can only only access the first and a single 
> > > > > element of type `unsigned int`.
> > > > > 
> > > > Yes, `glob_arr` has 4 elements, sorry for this typo.
> > > > 
> > > > ---
> > > > I disagree with the second part though. It seems like gcc disagrees as 
> > > > well: https://godbolt.org/z/5o7ozvPar
> > > > ```lang=C++
> > > > auto foo(unsigned ()[4], int ()[4]) {
> > > > p[0] = 2;
> > > > q[0] = 1;
> > > > return p[0]; // memory read! thus, p and q can alias according to 
> > > > g++.
> > > > }
> > > > ```
> > > > `gcc` still thinks that `p` and `q` can refer to the same memory region 
> > > > even if the `-fstrict-aliasing` flag is present.
> > > > In other circumstances, it would produce a `mov eax, 2` instead of a 
> > > > memory load if `p` and `q` cannot alias.
> > > >I disagree with the second part though. It seems like gcc disagrees as 
> > > >well: https://godbolt.org/z/5o7ozvPar
> > > I see how all the compilers handle this. I've switched several vendors on 
> > > Godbolt. They all have the similar behavior. You are right from compiler 
> > > perspective, but it's not quite the same in terms of C++ abstract machine.
> > > Your example is correct, it works OK and is consistent with the Standard:
> > > ```
> > > p[0] = 2;
> > > q[0] = 1;
> > > ```
> > > This one also works as expected but goes against the Standard:
> > > ```
> > > p[1] = 2;
> > > q[1] = 1;
> > > ```
> > > I want you watch this particular part about access via aliased pointers: 
> > > https://youtu.be/_qzMpk-22cc?t=2623 For me it seems correct, at least I 
> > > can't argue against of it now.
> > But in this example we have an array, thus pointer arithmetic should be 
> > fine according to the video.
> > Could you find the mentioned email discussion? I would be really curious.
> > BTW from the analyzer user's perspective, it would be 'better' to find 
> > strict aliasing violations which actually matter - now, and risking 
> > miscompilations.
> I'm not enough confident about that to debate now. As you can see we started 
> arguing as well. That means that the Standard really leaves the subject for 
> misinterpretation.
> OK, let's bring it to the form in which compilers operate. I mean, let's 
> legitimate indirections with a different offsets through aliased types: `auto 
> x = ((unsigened)arr)[42]; // OK`.
> I think it will be enough for this patch for now.
So, we agree that `glob_cast_opposite_sign1()` should have no warnings. At 
least for now. I would be okay with this direction.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-12 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov added inline comments.



Comment at: clang/test/Analysis/initialization.cpp:295-299
+void glob_cast_opposite_sign1() {
+  auto *ptr = (unsigned int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}

steakhal wrote:
> ASDenysPetrov wrote:
> > steakhal wrote:
> > > ASDenysPetrov wrote:
> > > > steakhal wrote:
> > > > > I think it's not correct.
> > > > > 
> > > > > `glob_arr2` refers to an object of dynamic type `int[2]`.
> > > > > And the pointer decayed from it is `int *`, which has //similar 
> > > > > type// to `unsigned *` what you are using to access the memory.
> > > > > Since they are //similar//, this operation should work for all the 
> > > > > valid indices, thus `ptr[0]`, `ptr[1]`, `ptr[2]`, `ptr[3]` should all 
> > > > > be valid.
> > > > > 
> > > > > 
> > > > > glob_arr2 refers to an object of dynamic type int[2].
> > > > `glob_arr2` has an extent of 4.
> > > > > And the pointer decayed from it is int *, which has similar type to 
> > > > > unsigned * what you are using to access the memory.
> > > > Yes, they are the same and it perfectly suits to 
> > > > http://eel.is/c++draft/basic.lval#11 . But still you can't access other 
> > > > element then the first one according to 
> > > > http://eel.is/c++draft/basic.compound#3 : //For purposes of pointer 
> > > > arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), [...] 
> > > > an object of type T that is not an array element is considered to 
> > > > belong to an array with one element of type T. //
> > > > `unsigned int` and `int` are different types according to 
> > > > http://eel.is/c++draft/basic.fundamental#2 . The object of type 
> > > > `unsigned int` is NOT an array, beacuse there is no array of type 
> > > > `unsigned int`. Hence you can only only access the first and a single 
> > > > element of type `unsigned int`.
> > > > 
> > > Yes, `glob_arr` has 4 elements, sorry for this typo.
> > > 
> > > ---
> > > I disagree with the second part though. It seems like gcc disagrees as 
> > > well: https://godbolt.org/z/5o7ozvPar
> > > ```lang=C++
> > > auto foo(unsigned ()[4], int ()[4]) {
> > > p[0] = 2;
> > > q[0] = 1;
> > > return p[0]; // memory read! thus, p and q can alias according to g++.
> > > }
> > > ```
> > > `gcc` still thinks that `p` and `q` can refer to the same memory region 
> > > even if the `-fstrict-aliasing` flag is present.
> > > In other circumstances, it would produce a `mov eax, 2` instead of a 
> > > memory load if `p` and `q` cannot alias.
> > >I disagree with the second part though. It seems like gcc disagrees as 
> > >well: https://godbolt.org/z/5o7ozvPar
> > I see how all the compilers handle this. I've switched several vendors on 
> > Godbolt. They all have the similar behavior. You are right from compiler 
> > perspective, but it's not quite the same in terms of C++ abstract machine.
> > Your example is correct, it works OK and is consistent with the Standard:
> > ```
> > p[0] = 2;
> > q[0] = 1;
> > ```
> > This one also works as expected but goes against the Standard:
> > ```
> > p[1] = 2;
> > q[1] = 1;
> > ```
> > I want you watch this particular part about access via aliased pointers: 
> > https://youtu.be/_qzMpk-22cc?t=2623 For me it seems correct, at least I 
> > can't argue against of it now.
> But in this example we have an array, thus pointer arithmetic should be fine 
> according to the video.
> Could you find the mentioned email discussion? I would be really curious.
> BTW from the analyzer user's perspective, it would be 'better' to find strict 
> aliasing violations which actually matter - now, and risking miscompilations.
I'm not enough confident about that to debate now. As you can see we started 
arguing as well. That means that the Standard really leaves the subject for 
misinterpretation.
OK, let's bring it to the form in which compilers operate. I mean, let's 
legitimate indirections with a different offsets through aliased types: `auto x 
= ((unsigened)arr)[42]; // OK`.
I think it will be enough for this patch for now.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-11 Thread Balázs Benics via Phabricator via cfe-commits
steakhal added inline comments.



Comment at: clang/test/Analysis/initialization.cpp:295-299
+void glob_cast_opposite_sign1() {
+  auto *ptr = (unsigned int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}

ASDenysPetrov wrote:
> steakhal wrote:
> > ASDenysPetrov wrote:
> > > steakhal wrote:
> > > > I think it's not correct.
> > > > 
> > > > `glob_arr2` refers to an object of dynamic type `int[2]`.
> > > > And the pointer decayed from it is `int *`, which has //similar type// 
> > > > to `unsigned *` what you are using to access the memory.
> > > > Since they are //similar//, this operation should work for all the 
> > > > valid indices, thus `ptr[0]`, `ptr[1]`, `ptr[2]`, `ptr[3]` should all 
> > > > be valid.
> > > > 
> > > > 
> > > > glob_arr2 refers to an object of dynamic type int[2].
> > > `glob_arr2` has an extent of 4.
> > > > And the pointer decayed from it is int *, which has similar type to 
> > > > unsigned * what you are using to access the memory.
> > > Yes, they are the same and it perfectly suits to 
> > > http://eel.is/c++draft/basic.lval#11 . But still you can't access other 
> > > element then the first one according to 
> > > http://eel.is/c++draft/basic.compound#3 : //For purposes of pointer 
> > > arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), [...] an 
> > > object of type T that is not an array element is considered to belong to 
> > > an array with one element of type T. //
> > > `unsigned int` and `int` are different types according to 
> > > http://eel.is/c++draft/basic.fundamental#2 . The object of type `unsigned 
> > > int` is NOT an array, beacuse there is no array of type `unsigned int`. 
> > > Hence you can only only access the first and a single element of type 
> > > `unsigned int`.
> > > 
> > Yes, `glob_arr` has 4 elements, sorry for this typo.
> > 
> > ---
> > I disagree with the second part though. It seems like gcc disagrees as 
> > well: https://godbolt.org/z/5o7ozvPar
> > ```lang=C++
> > auto foo(unsigned ()[4], int ()[4]) {
> > p[0] = 2;
> > q[0] = 1;
> > return p[0]; // memory read! thus, p and q can alias according to g++.
> > }
> > ```
> > `gcc` still thinks that `p` and `q` can refer to the same memory region 
> > even if the `-fstrict-aliasing` flag is present.
> > In other circumstances, it would produce a `mov eax, 2` instead of a memory 
> > load if `p` and `q` cannot alias.
> >I disagree with the second part though. It seems like gcc disagrees as well: 
> >https://godbolt.org/z/5o7ozvPar
> I see how all the compilers handle this. I've switched several vendors on 
> Godbolt. They all have the similar behavior. You are right from compiler 
> perspective, but it's not quite the same in terms of C++ abstract machine.
> Your example is correct, it works OK and is consistent with the Standard:
> ```
> p[0] = 2;
> q[0] = 1;
> ```
> This one also works as expected but goes against the Standard:
> ```
> p[1] = 2;
> q[1] = 1;
> ```
> I want you watch this particular part about access via aliased pointers: 
> https://youtu.be/_qzMpk-22cc?t=2623 For me it seems correct, at least I can't 
> argue against of it now.
But in this example we have an array, thus pointer arithmetic should be fine 
according to the video.
Could you find the mentioned email discussion? I would be really curious.
BTW from the analyzer user's perspective, it would be 'better' to find strict 
aliasing violations which actually matter - now, and risking miscompilations.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-11 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov added a comment.

@steakhal

> I think I know.

Great! Thank you!




Comment at: clang/test/Analysis/initialization.cpp:295-299
+void glob_cast_opposite_sign1() {
+  auto *ptr = (unsigned int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}

steakhal wrote:
> ASDenysPetrov wrote:
> > steakhal wrote:
> > > I think it's not correct.
> > > 
> > > `glob_arr2` refers to an object of dynamic type `int[2]`.
> > > And the pointer decayed from it is `int *`, which has //similar type// to 
> > > `unsigned *` what you are using to access the memory.
> > > Since they are //similar//, this operation should work for all the valid 
> > > indices, thus `ptr[0]`, `ptr[1]`, `ptr[2]`, `ptr[3]` should all be valid.
> > > 
> > > 
> > > glob_arr2 refers to an object of dynamic type int[2].
> > `glob_arr2` has an extent of 4.
> > > And the pointer decayed from it is int *, which has similar type to 
> > > unsigned * what you are using to access the memory.
> > Yes, they are the same and it perfectly suits to 
> > http://eel.is/c++draft/basic.lval#11 . But still you can't access other 
> > element then the first one according to 
> > http://eel.is/c++draft/basic.compound#3 : //For purposes of pointer 
> > arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), [...] an 
> > object of type T that is not an array element is considered to belong to an 
> > array with one element of type T. //
> > `unsigned int` and `int` are different types according to 
> > http://eel.is/c++draft/basic.fundamental#2 . The object of type `unsigned 
> > int` is NOT an array, beacuse there is no array of type `unsigned int`. 
> > Hence you can only only access the first and a single element of type 
> > `unsigned int`.
> > 
> Yes, `glob_arr` has 4 elements, sorry for this typo.
> 
> ---
> I disagree with the second part though. It seems like gcc disagrees as well: 
> https://godbolt.org/z/5o7ozvPar
> ```lang=C++
> auto foo(unsigned ()[4], int ()[4]) {
> p[0] = 2;
> q[0] = 1;
> return p[0]; // memory read! thus, p and q can alias according to g++.
> }
> ```
> `gcc` still thinks that `p` and `q` can refer to the same memory region even 
> if the `-fstrict-aliasing` flag is present.
> In other circumstances, it would produce a `mov eax, 2` instead of a memory 
> load if `p` and `q` cannot alias.
>I disagree with the second part though. It seems like gcc disagrees as well: 
>https://godbolt.org/z/5o7ozvPar
I see how all the compilers handle this. I've switched several vendors on 
Godbolt. They all have the similar behavior. You are right from compiler 
perspective, but it's not quite the same in terms of C++ abstract machine.
Your example is correct, it works OK and is consistent with the Standard:
```
p[0] = 2;
q[0] = 1;
```
This one also works as expected but goes against the Standard:
```
p[1] = 2;
q[1] = 1;
```
I want you watch this particular part about access via aliased pointers: 
https://youtu.be/_qzMpk-22cc?t=2623 For me it seems correct, at least I can't 
argue against of it now.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-10 Thread Balázs Benics via Phabricator via cfe-commits
steakhal added a comment.

In D110927#3118936 , @ASDenysPetrov 
wrote:

>> You could have a parameter, and take its address to accomplish your 
>> reinterpret casts and type puns.
>
> Do you mean: ...
> If so, IMO it doesn't matter.

I see. Sorry about the confusion I caused.

In D110927#3115795 , @ASDenysPetrov 
wrote:

> Does anyone know how to check the status of`-fno-strict-aliasing` flag from 
> CSA side?

I think I know.
Within the `AnalysisConsumer::AnalysisConsumer()` you have access to the 
`CompilerInstance` object. Using that you can acquire the 
`CI.getCodeGenOpts().RelaxedAliasing` bool member, which represents exactly 
what we need, according to the 
`clang/include/clang/Basic/CodeGenOptions.def:211`:

  CODEGENOPT(RelaxedAliasing   , 1, 0) ///< Set when -fno-strict-aliasing is 
enabled.

I think you should simply save a `const` reference to the `CodeGenOptions` 
object from the `AnalysisConsumer`.




Comment at: clang/test/Analysis/initialization.cpp:295-299
+void glob_cast_opposite_sign1() {
+  auto *ptr = (unsigned int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}

ASDenysPetrov wrote:
> steakhal wrote:
> > I think it's not correct.
> > 
> > `glob_arr2` refers to an object of dynamic type `int[2]`.
> > And the pointer decayed from it is `int *`, which has //similar type// to 
> > `unsigned *` what you are using to access the memory.
> > Since they are //similar//, this operation should work for all the valid 
> > indices, thus `ptr[0]`, `ptr[1]`, `ptr[2]`, `ptr[3]` should all be valid.
> > 
> > 
> > glob_arr2 refers to an object of dynamic type int[2].
> `glob_arr2` has an extent of 4.
> > And the pointer decayed from it is int *, which has similar type to 
> > unsigned * what you are using to access the memory.
> Yes, they are the same and it perfectly suits to 
> http://eel.is/c++draft/basic.lval#11 . But still you can't access other 
> element then the first one according to 
> http://eel.is/c++draft/basic.compound#3 : //For purposes of pointer 
> arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), [...] an 
> object of type T that is not an array element is considered to belong to an 
> array with one element of type T. //
> `unsigned int` and `int` are different types according to 
> http://eel.is/c++draft/basic.fundamental#2 . The object of type `unsigned 
> int` is NOT an array, beacuse there is no array of type `unsigned int`. Hence 
> you can only only access the first and a single element of type `unsigned 
> int`.
> 
Yes, `glob_arr` has 4 elements, sorry for this typo.

---
I disagree with the second part though. It seems like gcc disagrees as well: 
https://godbolt.org/z/5o7ozvPar
```lang=C++
auto foo(unsigned ()[4], int ()[4]) {
p[0] = 2;
q[0] = 1;
return p[0]; // memory read! thus, p and q can alias according to g++.
}
```
`gcc` still thinks that `p` and `q` can refer to the same memory region even if 
the `-fstrict-aliasing` flag is present.
In other circumstances, it would produce a `mov eax, 2` instead of a memory 
load if `p` and `q` cannot alias.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-09 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov added a comment.

In D110927#3117728 , @steakhal wrote:

> For testing this I would recommend using a separate test file.
> That being said, you should avoid globals even in tests when you can. The 
> distance between its declaration and use just makes it harder to comprehend 
> and reason about.

I'll add a new tests file.

> You could have a parameter, and take its address to accomplish your 
> reinterpret casts and type puns.

Do you mean:

  void foo(signed char * ptr) {
ptr = (signed char *)glob_arr2;
  }

instead of

  void foo() {
auto *ptr = (signed char *)glob_arr2;
  }

?
If so, IMO it doesn't matter.

> BTW your patch crashes on the test suite:
> `initialization.cpp:242`:

I caught it. Thanks! I wonder how it slipped through unnoticed.




Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1661
+/// E.g. for `const int[1][2][3];` returns `int`.
+QualType getConstantArrayRootElement(const ConstantArrayType *CAT) {
+  assert(CAT && "ConstantArrayType should not be null");

steakhal wrote:
> Maybe //unwrapConstantArrayTypes()//?
I think about //getConstantArrayRoot**Type**//. IMO such name distinctly tells 
its intention and purpose.



Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1790
+  // paper P1839R0 be considered by the committee.
+  if ((Index != 0))
+return false;

steakhal wrote:
> Even though you are technically correct, how can you know that the pointer 
> you try to dereference actually points to the beginning of the object?
> Consider something like this:
> 
> ```lang=C++
> void callback(void *payload) {
>   // lets skip the first char object...
>   int *num = (int*)((char*)payload + 1);
>   clang_analyzer_dump(num); // Element{Element{SymRegion{payload}, 1, char}, 
> 0, int}
> }
> ```
>  how can you know that the pointer you try to dereference actually points to 
> the beginning of the object?
We look at the offset of the region. In your example it is //1//. And it is UB. 
Unfortinatelly the Standard forbids to do such accesses due to the different 
strict access rules. I recommend this talk https://youtu.be/_qzMpk-22cc . I 
took inspiration from there.



Comment at: clang/test/Analysis/initialization.cpp:295-299
+void glob_cast_opposite_sign1() {
+  auto *ptr = (unsigned int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}

steakhal wrote:
> I think it's not correct.
> 
> `glob_arr2` refers to an object of dynamic type `int[2]`.
> And the pointer decayed from it is `int *`, which has //similar type// to 
> `unsigned *` what you are using to access the memory.
> Since they are //similar//, this operation should work for all the valid 
> indices, thus `ptr[0]`, `ptr[1]`, `ptr[2]`, `ptr[3]` should all be valid.
> 
> 
> glob_arr2 refers to an object of dynamic type int[2].
`glob_arr2` has an extent of 4.
> And the pointer decayed from it is int *, which has similar type to unsigned 
> * what you are using to access the memory.
Yes, they are the same and it perfectly suits to 
http://eel.is/c++draft/basic.lval#11 . But still you can't access other element 
then the first one according to http://eel.is/c++draft/basic.compound#3 : //For 
purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], 
[expr.eq]), [...] an object of type T that is not an array element is 
considered to belong to an array with one element of type T. //
`unsigned int` and `int` are different types according to 
http://eel.is/c++draft/basic.fundamental#2 . The object of type `unsigned int` 
is NOT an array, beacuse there is no array of type `unsigned int`. Hence you 
can only only access the first and a single element of type `unsigned int`.




Comment at: clang/test/Analysis/initialization.cpp:311-314
+void glob_cast_invalid3() {
+  auto *ptr = (char32_t *)glob_arr2;
+  auto x = ptr[0]; // expected-warning{{garbage or undefined}}
+}

steakhal wrote:
> Please also try `char8_t`.
Correct. We should check it. It should be a different type as well 
(http://eel.is/c++draft/basic.fundamental#9).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-09 Thread Balázs Benics via Phabricator via cfe-commits
steakhal added a comment.

For testing this I would recommend using a separate test file.
That being said, you should avoid globals even in tests when you can. The 
distance between its declaration and use just makes it harder to comprehend and 
reason about.
You could have a parameter, and take its address to accomplish your reinterpret 
casts and type puns.

BTW your patch crashes on the test suite:
`initialization.cpp:242`:

  void glob_array_typedef1() {
clang_analyzer_eval(glob_arr8[0][0] == 1); // <-- crashes here
// ...
  }

  clang: ../../clang/lib/AST/ASTContext.cpp:10230: clang::QualType 
clang::ASTContext::getCorrespondingUnsignedType(clang::QualType) const: 
Assertion `(T->hasSignedIntegerRepresentation() || T->isSignedFixedPointType()) 
&& "Unexpected type"' failed.
  Called by clang::ASTContext::getCorrespondingUnsignedType() from 
RegionStoreManager::canAccessStoredValue()




Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1661
+/// E.g. for `const int[1][2][3];` returns `int`.
+QualType getConstantArrayRootElement(const ConstantArrayType *CAT) {
+  assert(CAT && "ConstantArrayType should not be null");

Maybe //unwrapConstantArrayTypes()//?



Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1790
+  // paper P1839R0 be considered by the committee.
+  if ((Index != 0))
+return false;

Even though you are technically correct, how can you know that the pointer you 
try to dereference actually points to the beginning of the object?
Consider something like this:

```lang=C++
void callback(void *payload) {
  // lets skip the first char object...
  int *num = (int*)((char*)payload + 1);
  clang_analyzer_dump(num); // Element{Element{SymRegion{payload}, 1, char}, 0, 
int}
}
```



Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1794-1795
+  // - is char, uchar, std::byte
+  if ((ThroughT == Ctx.CharTy) || (ThroughT == Ctx.UnsignedCharTy) ||
+  ThroughT->isStdByteType())
+return true;





Comment at: clang/test/Analysis/initialization.cpp:295-299
+void glob_cast_opposite_sign1() {
+  auto *ptr = (unsigned int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}

I think it's not correct.

`glob_arr2` refers to an object of dynamic type `int[2]`.
And the pointer decayed from it is `int *`, which has //similar type// to 
`unsigned *` what you are using to access the memory.
Since they are //similar//, this operation should work for all the valid 
indices, thus `ptr[0]`, `ptr[1]`, `ptr[2]`, `ptr[3]` should all be valid.





Comment at: clang/test/Analysis/initialization.cpp:311-314
+void glob_cast_invalid3() {
+  auto *ptr = (char32_t *)glob_arr2;
+  auto x = ptr[0]; // expected-warning{{garbage or undefined}}
+}

Please also try `char8_t`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-08 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov added a comment.

Ping.
Does anyone know how to check the status of`-fno-strict-aliasing` flag from CSA 
side?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-11-01 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov updated this revision to Diff 383812.
ASDenysPetrov added a comment.

Updated according to comments.
TODO: make the feature `-fno-strict-aliasing` dependent.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

Files:
  clang/lib/StaticAnalyzer/Core/RegionStore.cpp
  clang/test/Analysis/initialization.cpp

Index: clang/test/Analysis/initialization.cpp
===
--- clang/test/Analysis/initialization.cpp
+++ clang/test/Analysis/initialization.cpp
@@ -4,6 +4,10 @@
 void clang_analyzer_dump(T x);
 void clang_analyzer_eval(int);
 
+namespace std {
+enum class byte : unsigned char {};
+};
+
 struct S {
   int a = 3;
 };
@@ -256,3 +260,55 @@
   clang_analyzer_eval(glob_arr9[1][2] == 7); // expected-warning{{TRUE}}
   clang_analyzer_eval(glob_arr9[1][3] == 0); // expected-warning{{TRUE}}
 }
+
+void glob_cast_same1() {
+  auto *ptr = (int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // no-warning
+}
+
+void glob_cast_char1() {
+  const auto *ptr = (char *)glob_arr2; // 1-dim array to char*
+  auto x1 = ptr[0];// no-warning
+  auto x2 = ptr[1];// expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_char2() {
+  const auto *ptr = (char *)glob_arr5; // 2-dim array to char*
+  auto x1 = ptr[0];// no-warning
+  // FIXME: Should warn {{garbage or undefined}}.
+  auto x2 = ptr[1]; // no-warning
+}
+
+void glob_cast_uchar1() {
+  auto *ptr = (unsigned char *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_byte1() {
+  auto *ptr = (const std::byte *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_opposite_sign1() {
+  auto *ptr = (unsigned int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_invalid1() {
+  auto *ptr = (signed char *)glob_arr2;
+  auto x = ptr[0]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_invalid2() {
+  using T = short *;
+  auto x = ((T)glob_arr2)[0]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_invalid3() {
+  auto *ptr = (char32_t *)glob_arr2;
+  auto x = ptr[0]; // expected-warning{{garbage or undefined}}
+}
Index: clang/lib/StaticAnalyzer/Core/RegionStore.cpp
===
--- clang/lib/StaticAnalyzer/Core/RegionStore.cpp
+++ clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -446,6 +446,8 @@
   QualType ElemT);
   SVal getSValFromStringLiteral(const StringLiteral *SL, uint64_t Offset,
 QualType ElemT);
+  bool canAccessStoredValue(QualType OrigT, QualType ThroughT,
+uint64_t Index) const;
 
 public: // Part of public interface to class.
 
@@ -1651,6 +1653,20 @@
   return Extents;
 }
 
+/// This is a helper function for `getConstantValFromConstArrayInitializer`.
+///
+/// Return a root type of the n-dimensional array.
+///
+/// E.g. for `const int[1][2][3];` returns `int`.
+QualType getConstantArrayRootElement(const ConstantArrayType *CAT) {
+  assert(CAT && "ConstantArrayType should not be null");
+  while (const auto *DummyCAT =
+ dyn_cast(CAT->getElementType())) {
+CAT = DummyCAT;
+  }
+  return CAT->getElementType();
+}
+
 /// This is a helper function for `getConstantValFromConstArrayInitializer`.
 ///
 /// Return an array of offsets from nested ElementRegions. The array is never
@@ -1736,6 +1752,60 @@
   return None;
 }
 
+/// Returns true if the stored value can be accessed through the pointer to
+/// another type.
+///
+/// C++20 7.2.1.11 [basic.lval] (excerpt):
+///  A program can access the stored value of an object through:
+///  - the same type of the object;
+///  - a signed or unsigned type corresponding to the type of the
+///object;
+///  - a char, unsigned char, std::byte. (NOTE:
+///  Otherwise, the behavior is undefined.
+///
+/// Example:
+///  const int arr[42] = {};
+///  auto* pchar = (char*)arr;
+///  auto* punsigned = (unsigned int*)arr;
+///  auto* pshort= (short*)arr;
+///  auto x1 = pchar[0]; // valid
+///  auto x2 = pchar[1]; // UB
+///  auto x3 = punsigned[0]; // valid
+///  auto x4 = pshort[0];// UB
+bool RegionStoreManager::canAccessStoredValue(QualType OrigT, QualType ThroughT,
+  uint64_t Index) const {
+  // Remove cv-qualifiers.
+  OrigT = OrigT->getCanonicalTypeUnqualified();
+  ThroughT = ThroughT->getCanonicalTypeUnqualified();
+
+  // - is same
+  if (OrigT == ThroughT)
+return true;
+
+  // NOTE: C++20 6.8.2(3.4) [basic.compound]:
+  //  An object of type T that is not an array element is considered to
+  //  belong to an array with ONE 

[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-10-29 Thread Gabor Marton via Phabricator via cfe-commits
martong added a subscriber: vabridgers.
martong added a comment.

Adding @vabridgers as a subscriber, he might be interested in this.




Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1675
+   //  belong to an array with one element of type T.
+   // Hence, the first element can be retrieved only. At least untill a
+   // paper P1839R0 be considered by the committee.

typo: `untill` -> `until`



Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1758
+  // type.
+  QualType ElemT = Ctx.getCanonicalType(R->getElementType());
+  if (!canAccessStoredValue(ArrT, ElemT, I))

steakhal wrote:
> If you already compute the //canonical type// why do you recompute in the 
> `canAccessStoredValue()`?
> You could simply assert that instead.
He removes the qualifiers there, but getting the canonical type is probably 
redundant here.



Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1759-1760
+  QualType ElemT = Ctx.getCanonicalType(R->getElementType());
+  if (!canAccessStoredValue(ArrT, ElemT, I))
+return UndefinedVal();
+

ASDenysPetrov wrote:
> steakhal wrote:
> > Even though I agree with you, I think it would be useful to hide this 
> > behavior behind an analyzer option.
> > There is quite a lot code out in the wild that violate the 
> > //strict-aliasing// rule and they probably pass the `-fno-strict-aliasing` 
> > compiler flag to accommodate this in codegen. AFAIK Linux is one of these 
> > projects for example.
> > So, I think there is a need to opt-out of this and/or bind the behavior to 
> > the presence of the mentioned compiler flag.
> > 
> > By not doing this, the user would get //garbage// value reports all over 
> > the place.
> > @NoQ @martong WDYT?
> > There is quite a lot code out in the wild that violate the strict-aliasing 
> > rule 
> Agree.
> > By not doing this, the user would get garbage value reports all over the 
> > place.
> Definitely.
> Using the flag is a good option. But the question whether to use existing 
> `-fno-strict-aliasing` or introduce a new one?
I think we could simply reuse the existing `-fno-strict-aliasing`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-10-28 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov added a comment.

@steakhal I'll address all of your remarks. Thanks a lot!




Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1630-1641
+/// Returns true if the stored value can be accessed through the pointer to
+/// another type:
+///  const int arr[42] = {};
+///  auto* pchar = (char*)arr;
+///  auto* punsigned = (unsigned int*)arr;
+///  auto* pshort= (short*)arr;
+///  auto x1 = pchar[0]; // valid

steakhal wrote:
> This is basically the //strict-aliasing// rule, we could probably mention 
> this.
> Although, I don't mind the current name.
> You probably know about it, but Shafik made a fancy [[ 
> https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8 | GitHub Gist 
> ]] about it.
> The only thing I missed was checking //type similarity// according to 
> [[https://eel.is/c++draft/basic.lval#11 | basic.lval ]], but I'm not exactly 
> sure if we should check that here.
> You probably know about it, but Shafik made a fancy GitHub Gist about it.
Yes, this is the one of those things which inspired me to take care of aliasing 
as a part of RegionStoreManager improvements.



Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1759-1760
+  QualType ElemT = Ctx.getCanonicalType(R->getElementType());
+  if (!canAccessStoredValue(ArrT, ElemT, I))
+return UndefinedVal();
+

steakhal wrote:
> Even though I agree with you, I think it would be useful to hide this 
> behavior behind an analyzer option.
> There is quite a lot code out in the wild that violate the 
> //strict-aliasing// rule and they probably pass the `-fno-strict-aliasing` 
> compiler flag to accommodate this in codegen. AFAIK Linux is one of these 
> projects for example.
> So, I think there is a need to opt-out of this and/or bind the behavior to 
> the presence of the mentioned compiler flag.
> 
> By not doing this, the user would get //garbage// value reports all over the 
> place.
> @NoQ @martong WDYT?
> There is quite a lot code out in the wild that violate the strict-aliasing 
> rule 
Agree.
> By not doing this, the user would get garbage value reports all over the 
> place.
Definitely.
Using the flag is a good option. But the question whether to use existing 
`-fno-strict-aliasing` or introduce a new one?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-10-27 Thread Balázs Benics via Phabricator via cfe-commits
steakhal added inline comments.



Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1630-1641
+/// Returns true if the stored value can be accessed through the pointer to
+/// another type:
+///  const int arr[42] = {};
+///  auto* pchar = (char*)arr;
+///  auto* punsigned = (unsigned int*)arr;
+///  auto* pshort= (short*)arr;
+///  auto x1 = pchar[0]; // valid

This is basically the //strict-aliasing// rule, we could probably mention this.
Although, I don't mind the current name.
You probably know about it, but Shafik made a fancy [[ 
https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8 | GitHub Gist 
]] about it.
The only thing I missed was checking //type similarity// according to 
[[https://eel.is/c++draft/basic.lval#11 | basic.lval ]], but I'm not exactly 
sure if we should check that here.



Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1657-1671
+  (((OrigT == Ctx.CharTy && ThroughT == Ctx.UnsignedCharTy) ||
+(OrigT == Ctx.SignedCharTy && ThroughT == Ctx.UnsignedCharTy) ||
+(OrigT == Ctx.ShortTy && ThroughT == Ctx.UnsignedShortTy) ||
+(OrigT == Ctx.IntTy && ThroughT == Ctx.UnsignedIntTy) ||
+(OrigT == Ctx.LongTy && ThroughT == Ctx.UnsignedLongTy) ||
+(OrigT == Ctx.LongLongTy && ThroughT == Ctx.UnsignedLongLongTy) ||
+(ThroughT == Ctx.CharTy && OrigT == Ctx.UnsignedCharTy) ||

I would rather `Ctx.getCorrespondingUnsignedType()` on both of the types and if 
they compare equal, that would mean that they differed only in signedness.
This is probably more costly, and that function will assert that it's a scalar 
type or something, so it would make sense to check `OrigT == ThroughT` 
separately earlier than this.



Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1672-1677
+   // NOTE: C++20 6.8.2(3.4) [basic.compound]:
+   //  An object of type T that is not an array element is considered to
+   //  belong to an array with one element of type T.
+   // Hence, the first element can be retrieved only. At least untill a
+   // paper P1839R0 be considered by the committee.
+   (Index == 0));

You should probably early return on this.



Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1758
+  // type.
+  QualType ElemT = Ctx.getCanonicalType(R->getElementType());
+  if (!canAccessStoredValue(ArrT, ElemT, I))

If you already compute the //canonical type// why do you recompute in the 
`canAccessStoredValue()`?
You could simply assert that instead.



Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1759-1760
+  QualType ElemT = Ctx.getCanonicalType(R->getElementType());
+  if (!canAccessStoredValue(ArrT, ElemT, I))
+return UndefinedVal();
+

Even though I agree with you, I think it would be useful to hide this behavior 
behind an analyzer option.
There is quite a lot code out in the wild that violate the //strict-aliasing// 
rule and they probably pass the `-fno-strict-aliasing` compiler flag to 
accommodate this in codegen. AFAIK Linux is one of these projects for example.
So, I think there is a need to opt-out of this and/or bind the behavior to the 
presence of the mentioned compiler flag.

By not doing this, the user would get //garbage// value reports all over the 
place.
@NoQ @martong WDYT?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-10-04 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov added a comment.

In D110927#3037118 , @shafik wrote:

> IIUC the object is `const int arr[42]` and the `(char *)arr` is an expression 
> of pointer type and adding `1` to this is valid. The case you refer to in 
> D104285  ended up being a pointer to an 
> array of 2 ints and therefore accessing the third element was out of bounds.

You are right. According to http://eel.is/c++draft/expr.add#4, expression `P + 
I` is valid while `0 ≤ I ≤ n`, UB otherwise. This is valid untill we try to 
dereference it. After that it becomes an UB. The UB's you and me are talking 
about have different origin.

My concern is whether we do it correctly considering that dereferencing of type 
**T** through other types are UB in certain cases. Namely, 
http://eel.is/c++draft/basic.lval#11 and 
http://eel.is/c++draft/basic.compound#3.4 paragraphs tell us:

  int arr[42];
  // same type
  auto x = ((int*)arr)[0]; // OK
  auto x = ((int*)arr)[1]; // OK
  auto x = ((int*)arr)[41]; // OK 
  
  // opposite signedness
  auto x = ((unsigned int*)arr)[0]; // OK
  auto x = ((unsigned int*)arr)[1]; // UB
  auto x = ((unsigned int*)arr)[41]; // UB
  
  // for char*, unsigned char* and std::byte*
  auto x = ((char*)arr)[0]; // OK
  auto x = ((char*)arr)[1]; // UB
  auto x = ((char*)arr)[41]; // UB
  
  using T= AllTheRestTypes;
  auto x = ((T*)arr)[0]; // UB
  auto x = ((T*)arr)[1]; // UB
  auto x = ((T*)arr)[41]; // UB


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-10-04 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov updated this revision to Diff 376823.
ASDenysPetrov added a comment.

Fixed a comment.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

Files:
  clang/lib/StaticAnalyzer/Core/RegionStore.cpp
  clang/test/Analysis/initialization.cpp

Index: clang/test/Analysis/initialization.cpp
===
--- clang/test/Analysis/initialization.cpp
+++ clang/test/Analysis/initialization.cpp
@@ -2,6 +2,10 @@
 
 void clang_analyzer_eval(int);
 
+namespace std {
+enum class byte : unsigned char {};
+};
+
 struct S {
   int a = 3;
 };
@@ -48,6 +52,46 @@
   auto x = ptr[idx]; // expected-warning{{garbage or undefined}}
 }
 
+void glob_cast_same() {
+  auto *ptr = (int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // no-warning
+}
+
+void glob_cast_char() {
+  const auto *ptr = (char *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_uchar() {
+  auto *ptr = (unsigned char *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_byte() {
+  auto *ptr = (const std::byte *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_opposite_sign() {
+  auto *ptr = (unsigned int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_invalid1() {
+  auto *ptr = (signed char *)glob_arr2;
+  auto x = ptr[0]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_invalid2() {
+  using T = short *;
+  auto x = ((T)glob_arr2)[0]; // expected-warning{{garbage or undefined}}
+}
+
 const float glob_arr3[] = {
 0., 0.0235, 0.0470, 0.0706, 0.0941, 0.1176};
 float no_warn_garbage_value() {
Index: clang/lib/StaticAnalyzer/Core/RegionStore.cpp
===
--- clang/lib/StaticAnalyzer/Core/RegionStore.cpp
+++ clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -437,6 +437,8 @@
 
   RegionBindingsRef removeSubRegionBindings(RegionBindingsConstRef B,
 const SubRegion *R);
+  bool canAccessStoredValue(QualType OrigT, QualType ThroughT,
+uint64_t Index) const;
 
 public: // Part of public interface to class.
 
@@ -1625,6 +1627,56 @@
   return Result;
 }
 
+/// Returns true if the stored value can be accessed through the pointer to
+/// another type:
+///  const int arr[42] = {};
+///  auto* pchar = (char*)arr;
+///  auto* punsigned = (unsigned int*)arr;
+///  auto* pshort= (short*)arr;
+///  auto x1 = pchar[0]; // valid
+///  auto x2 = pchar[1]; // invalid
+///  auto x3 = punsigned[0]; // valid
+///  auto x4 = pshort[0];// invalid
+bool RegionStoreManager::canAccessStoredValue(QualType OrigT, QualType ThroughT,
+  uint64_t Index) const {
+  // Remove cv-qualifiers.
+  OrigT = OrigT->getCanonicalTypeUnqualified();
+  ThroughT = ThroughT->getCanonicalTypeUnqualified();
+
+  // C++20 7.2.1.11 [basic.lval] (excerpt):
+  //  A program can access the stored value of an object through:
+  //  - the same type of the object;
+  //  - a signed or unsigned type corresponding to the type of the
+  //object;
+  //  - a char, unsigned char, std::byte. (NOTE:
+  //  Otherwise, the behavior is undefined.
+  return
+  // - is same
+  (OrigT == ThroughT) ||
+  // - is another sign
+  (((OrigT == Ctx.CharTy && ThroughT == Ctx.UnsignedCharTy) ||
+(OrigT == Ctx.SignedCharTy && ThroughT == Ctx.UnsignedCharTy) ||
+(OrigT == Ctx.ShortTy && ThroughT == Ctx.UnsignedShortTy) ||
+(OrigT == Ctx.IntTy && ThroughT == Ctx.UnsignedIntTy) ||
+(OrigT == Ctx.LongTy && ThroughT == Ctx.UnsignedLongTy) ||
+(OrigT == Ctx.LongLongTy && ThroughT == Ctx.UnsignedLongLongTy) ||
+(ThroughT == Ctx.CharTy && OrigT == Ctx.UnsignedCharTy) ||
+(ThroughT == Ctx.SignedCharTy && OrigT == Ctx.UnsignedCharTy) ||
+(ThroughT == Ctx.ShortTy && OrigT == Ctx.UnsignedShortTy) ||
+(ThroughT == Ctx.IntTy && OrigT == Ctx.UnsignedIntTy) ||
+(ThroughT == Ctx.LongTy && OrigT == Ctx.UnsignedLongTy) ||
+(ThroughT == Ctx.LongLongTy && OrigT == Ctx.UnsignedLongLongTy) ||
+// - is char, uchar, std::byte
+(ThroughT == Ctx.CharTy) || (ThroughT == Ctx.UnsignedCharTy) ||
+ThroughT->isStdByteType()) &&
+   // NOTE: C++20 6.8.2(3.4) [basic.compound]:
+   //  An object of type T that is not an array element is considered to
+   //  belong to an array with one element of type T.
+   // Hence, the first element can be retrieved only. At least untill a
+   // paper P1839R0 be considered by the committee.
+   (Index == 0));
+}

[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-10-01 Thread Shafik Yaghmour via Phabricator via cfe-commits
shafik added a comment.

In D110927#3036647 , @ASDenysPetrov 
wrote:

> In D110927#3036436 , @steakhal 
> wrote:
>
>> I'm pretty sure that `int x4 = ((char*)arr)[1];` is supposed to be valid in 
>> your summary.
>> I think it's allowed by the standard to access any valid object via a 
>> `char*` - according to the strict aliasing rule.
>> @shafik WDYT?
>
> As I found we can legaly treat `char*` as the object of type `char` but not 
> as an array of objects. This is mentioned in 
> http://eel.is/c++draft/basic.compound#3.4 //For purposes of pointer 
> arithmetic ... an object of type T that is not an array element is considered 
> to belong to an array with one element of type T.// That means that we can 
> get only the first element of `char*`, otherwise it would be an UB. There is 
> also a paper to overcome this constraint 
> http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1839r0.pdf
>
> @aaron.ballman I would like you join the discussion, as we have similar one 
> in D104285 .

IIUC the object is `const int arr[42]` and the `(char *)arr` is an expression 
of pointer type and adding `1` to this is valid. The case you refer to in 
D104285  ended up being a pointer to an array 
of 2 ints and therefore accessing the third element was out of bounds.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-10-01 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov added a comment.

In D110927#3036436 , @steakhal wrote:

> I'm pretty sure that `int x4 = ((char*)arr)[1];` is supposed to be valid in 
> your summary.
> I think it's allowed by the standard to access any valid object via a `char*` 
> - according to the strict aliasing rule.
> @shafik WDYT?

As I found we can legaly treat `char*` as the object of type `char` but not as 
an array of objects. This is mentioned in 
http://eel.is/c++draft/basic.compound#3.4 //For purposes of pointer arithmetic 
... an object of type T that is not an array element is considered to belong to 
an array with one element of type T.// That means that we can get only the 
first element of `char*`, otherwise it would be an UB.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-10-01 Thread Balázs Benics via Phabricator via cfe-commits
steakhal added a subscriber: shafik.
steakhal added a comment.

I'm pretty sure that `int x4 = ((char*)arr)[1];` is supposed to be valid in 
your summary.
I think it's allowed by the standard to access any valid object via a `char*` - 
according to the strict aliasing rule.
@shafik WDYT?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

2021-10-01 Thread Denys Petrov via Phabricator via cfe-commits
ASDenysPetrov created this revision.
ASDenysPetrov added reviewers: aaron.ballman, martong, steakhal, NoQ, r.stahl.
ASDenysPetrov added a project: clang.
Herald added subscribers: manas, dkrupp, donat.nagy, Szelethus, 
mikhail.ramalho, a.sidorin, rnkovacs, szepet, baloghadamsoftware, xazax.hun.
ASDenysPetrov requested review of this revision.
Herald added a subscriber: cfe-commits.

Fixed cases in which RegionStore is able to get a stored value of a constant 
array through a pointer of inappropriate type. Adjust 
`RegionStoreManager::getBindingForElement` to the C++20 Standard.
Example:

  const int arr[42] = {1,2,3};
  int x1 = ((unsigned*)arr)[0];  // valid
  int x2 = ((short*)arr)[0]; // invalid
  int x3 = ((char*)arr)[0];  // valid
  int x4 = ((char*)arr)[1];  // invalid


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D110927

Files:
  clang/lib/StaticAnalyzer/Core/RegionStore.cpp
  clang/test/Analysis/initialization.cpp

Index: clang/test/Analysis/initialization.cpp
===
--- clang/test/Analysis/initialization.cpp
+++ clang/test/Analysis/initialization.cpp
@@ -2,6 +2,10 @@
 
 void clang_analyzer_eval(int);
 
+namespace std {
+enum class byte : unsigned char {};
+};
+
 struct S {
   int a = 3;
 };
@@ -48,6 +52,46 @@
   auto x = ptr[idx]; // expected-warning{{garbage or undefined}}
 }
 
+void glob_cast_same() {
+  auto *ptr = (int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // no-warning
+}
+
+void glob_cast_char() {
+  const auto *ptr = (char *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_uchar() {
+  auto *ptr = (unsigned char *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_byte() {
+  auto *ptr = (const std::byte *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_opposite_sign() {
+  auto *ptr = (unsigned int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_invalid1() {
+  auto *ptr = (signed char *)glob_arr2;
+  auto x = ptr[0]; // expected-warning{{garbage or undefined}}
+}
+
+void glob_cast_invalid2() {
+  using T = short *;
+  auto x = ((T)glob_arr2)[0]; // expected-warning{{garbage or undefined}}
+}
+
 const float glob_arr3[] = {
 0., 0.0235, 0.0470, 0.0706, 0.0941, 0.1176};
 float no_warn_garbage_value() {
Index: clang/lib/StaticAnalyzer/Core/RegionStore.cpp
===
--- clang/lib/StaticAnalyzer/Core/RegionStore.cpp
+++ clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -437,6 +437,8 @@
 
   RegionBindingsRef removeSubRegionBindings(RegionBindingsConstRef B,
 const SubRegion *R);
+  bool canAccessStoredValue(QualType OrigT, QualType ThroughT,
+uint64_t Index) const;
 
 public: // Part of public interface to class.
 
@@ -1625,6 +1627,56 @@
   return Result;
 }
 
+/// Returns true if the stored value can be accessed through the pointer to
+/// another type:
+///  const int arr[42] = {};
+///  auto* pchar = (char*)arr;
+///  auto* punsigned = (unsigned int*)arr;
+///  auto* pshort= (short*)arr;
+///  auto x1 = pchar[0]; // valid
+///  auto x2 = pchar[1]; // invalid
+///  auto x3 = punsigned[0]; // valid
+///  auto x4 = pshort;   // invalid
+bool RegionStoreManager::canAccessStoredValue(QualType OrigT, QualType ThroughT,
+  uint64_t Index) const {
+  // Remove cv-qualifiers.
+  OrigT = OrigT->getCanonicalTypeUnqualified();
+  ThroughT = ThroughT->getCanonicalTypeUnqualified();
+
+  // C++20 7.2.1.11 [basic.lval] (excerpt):
+  //  A program can access the stored value of an object through:
+  //  - the same type of the object;
+  //  - a signed or unsigned type corresponding to the type of the
+  //object;
+  //  - a char, unsigned char, std::byte. (NOTE:
+  //  Otherwise, the behavior is undefined.
+  return
+  // - is same
+  (OrigT == ThroughT) ||
+  // - is another sign
+  (((OrigT == Ctx.CharTy && ThroughT == Ctx.UnsignedCharTy) ||
+(OrigT == Ctx.SignedCharTy && ThroughT == Ctx.UnsignedCharTy) ||
+(OrigT == Ctx.ShortTy && ThroughT == Ctx.UnsignedShortTy) ||
+(OrigT == Ctx.IntTy && ThroughT == Ctx.UnsignedIntTy) ||
+(OrigT == Ctx.LongTy && ThroughT == Ctx.UnsignedLongTy) ||
+(OrigT == Ctx.LongLongTy && ThroughT == Ctx.UnsignedLongLongTy) ||
+(ThroughT == Ctx.CharTy && OrigT == Ctx.UnsignedCharTy) ||
+(ThroughT == Ctx.SignedCharTy && OrigT == Ctx.UnsignedCharTy) ||
+(ThroughT == Ctx.ShortTy && OrigT == Ctx.UnsignedShortTy) ||
+(ThroughT == Ctx.IntTy && OrigT ==