[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-21 Thread Balazs Benics via cfe-commits

steakhal wrote:

@NagyDonat Thank you for the excellent feedback. Great recommendations, with 
fixes. Thanks!
All applied.

https://github.com/llvm/llvm-project/pull/85211
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-21 Thread Balazs Benics via cfe-commits

https://github.com/steakhal closed 
https://github.com/llvm/llvm-project/pull/85211
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-21 Thread Balazs Benics via cfe-commits

https://github.com/steakhal updated 
https://github.com/llvm/llvm-project/pull/85211

>From bde85e0d145049a6661afba6f4585865c5630792 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Thu, 14 Mar 2024 12:38:12 +0100
Subject: [PATCH 1/4] [analyzer] Wrap SymbolicRegions by ElementRegions before
 getting a FieldRegion

Inside the ExprEngine when we process the initializers, we create a
PostInitializer program-point, which will refer to the field being
initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`.

When a constructor (of which we evaluate the initializer-list) is analyzed
in top-level context, then the `this` pointer will be represented by a
`SymbolicRegion`, (as it should be).

This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the
initialized region.

```c++
class Bear {
public:
  void brum() const;
};
class Door {
public:
  // PostInitializer would refer to "FieldRegion{SymRegion{this}}"
  // whereas in the store and everywhere else it would be:
  // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}".
  Door() : ptr(nullptr) {
ptr->brum(); // Bug
  }
private:
  Bear* ptr;
};
```

We (as CSA folks) decided to avoid the creation of FieldRegions directly of
symbolic regions in the past:
https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504

---

In this patch, I propose to also canonicalize it as in the mentioned patch,
into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}`

This would mean that FieldRegions will/should never simply wrap a
SymbolicRegion directly, but rather an ElementRegion that is sitting
in between.

This patch should have practically no observable effects, as the store
(due to the mentioned patch) was made resilient to this issue, but we use
`PostInitializer::getLocationValue()` for an alternative reporting,
where we faced this issue.

Note that in really rare cases it suppresses now dereference bugs, as
demonstrated in the test. It is because in the past we failed to
follow the region of the PostInitializer inside
the StoreSiteFinder visitor - because it was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optional PIP =
Pred->getLocationAs()) {
  const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue();
  if (FieldReg == R) {
StoreSite = Pred;
InitE = PIP->getInitializer()->getInit();
  }
}
```
And the equality check didn't pass for the regions I'm canonicalizing in this 
patch.

Given the nature of this change, we would rather upstream this patch.

CPP-4954
---
 .../Core/PathSensitive/ProgramState.h | 14 
 .../lib/StaticAnalyzer/Core/ProgramState.cpp  | 32 +++
 .../inlining/false-positive-suppression.cpp   | 17 ++
 3 files changed, 49 insertions(+), 14 deletions(-)

diff --git 
a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h 
b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
index ca75c2a756a4a5..47fc4ad88d3161 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
@@ -782,20 +782,6 @@ inline SVal ProgramState::getLValue(const ObjCIvarDecl *D, 
SVal Base) const {
   return getStateManager().StoreMgr->getLValueIvar(D, Base);
 }
 
-inline SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const {
-  return getStateManager().StoreMgr->getLValueField(D, Base);
-}
-
-inline SVal ProgramState::getLValue(const IndirectFieldDecl *D,
-SVal Base) const {
-  StoreManager  = *getStateManager().StoreMgr;
-  for (const auto *I : D->chain()) {
-Base = SM.getLValueField(cast(I), Base);
-  }
-
-  return Base;
-}
-
 inline SVal ProgramState::getLValue(QualType ElementType, SVal Idx, SVal Base) 
const{
   if (std::optional N = Idx.getAs())
 return getStateManager().StoreMgr->getLValueElement(ElementType, *N, Base);
diff --git a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp 
b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
index f12f1a5ac970dd..604728cdf1 100644
--- a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
@@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) {
   store = newStoreStore;
 }
 
+/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical
+/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why.
+static SVal wrapSymbolicRegion(const ProgramState , SVal Base) {
+  const auto *SymbolicBase =
+  dyn_cast_or_null(Base.getAsRegion());
+
+  if (!SymbolicBase)
+return Base;
+
+  StoreManager  = State.getStateManager().getStoreManager();
+  QualType ElemTy = SymbolicBase->getPointeeStaticType();
+  return loc::MemRegionVal{SM.GetElementZeroRegion(SymbolicBase, ElemTy)};
+}
+
+SVal ProgramState::getLValue(const FieldDecl *D, 

[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-18 Thread via cfe-commits


@@ -226,6 +226,20 @@ ProgramStateRef ProgramState::killBinding(Loc LV) const {
   return makeWithStore(newStore);
 }
 
+/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical
+/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why.

NagyDonat wrote:

It seems that in this comment the `FieldRegion` is a mistake and should be 
changed to `SymbolicRegion`. (But perhaps the comment should be extended to 
something like "As a canonical representation, SymbolicRegions should be 
wrapped by ElementRegions before getting a FieldRegion" to mention all three 
layers of this trickery.)

https://github.com/llvm/llvm-project/pull/85211
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-18 Thread via cfe-commits


@@ -226,6 +226,20 @@ ProgramStateRef ProgramState::killBinding(Loc LV) const {
   return makeWithStore(newStore);
 }
 
+/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical
+/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why.
+SVal ProgramState::wrapSymbolicRegion(SVal Base) const {
+  const auto *SymbolicBase =
+  dyn_cast_or_null(Base.getAsRegion());
+
+  if (!SymbolicBase)
+return Base;
+
+  StoreManager  = getStateManager().getStoreManager();
+  QualType ElemTy = SymbolicBase->getPointeeStaticType();
+  return loc::MemRegionVal{SM.GetElementZeroRegion(SymbolicBase, ElemTy)};
+}

NagyDonat wrote:

I have several minor issues with the name choices:
- `wrapSymbolicRegion` sounds like a method that expects a symbolic region as 
an argument and always wraps it; I think the effects of this method are better 
represented by something like `wrapRegionIfSymbolic`.
- For me, the variable name `SymbolicBase` was confusing because it strongly 
suggested a connection with `MemRegion::getSymbolicBase()`. ("Why are we taking 
the symbolic base here? Oh, this isn't the symbolic base region of a region, 
this is just the region "base" if it happens to be a symbolic region...")
- The name "Base" is also a bit unfortunate, because in the context of this 
method it's not the "base" of anything. I'd prefer to use something more 
generic like `SV` or `Val` -- or perhaps use a method name like 
`wrapBaseIfSymbolic` to indicate that the purpose of this method is wrapping a 
base region.

https://github.com/llvm/llvm-project/pull/85211
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-18 Thread via cfe-commits

https://github.com/NagyDonat approved this pull request.

Thanks for the update! Unfortunately I have some additional minor remarks 
(sorry for not catching them in the first round).

https://github.com/llvm/llvm-project/pull/85211
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-18 Thread via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/85211
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-18 Thread Balazs Benics via cfe-commits

https://github.com/steakhal updated 
https://github.com/llvm/llvm-project/pull/85211

>From bde85e0d145049a6661afba6f4585865c5630792 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Thu, 14 Mar 2024 12:38:12 +0100
Subject: [PATCH 1/3] [analyzer] Wrap SymbolicRegions by ElementRegions before
 getting a FieldRegion

Inside the ExprEngine when we process the initializers, we create a
PostInitializer program-point, which will refer to the field being
initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`.

When a constructor (of which we evaluate the initializer-list) is analyzed
in top-level context, then the `this` pointer will be represented by a
`SymbolicRegion`, (as it should be).

This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the
initialized region.

```c++
class Bear {
public:
  void brum() const;
};
class Door {
public:
  // PostInitializer would refer to "FieldRegion{SymRegion{this}}"
  // whereas in the store and everywhere else it would be:
  // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}".
  Door() : ptr(nullptr) {
ptr->brum(); // Bug
  }
private:
  Bear* ptr;
};
```

We (as CSA folks) decided to avoid the creation of FieldRegions directly of
symbolic regions in the past:
https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504

---

In this patch, I propose to also canonicalize it as in the mentioned patch,
into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}`

This would mean that FieldRegions will/should never simply wrap a
SymbolicRegion directly, but rather an ElementRegion that is sitting
in between.

This patch should have practically no observable effects, as the store
(due to the mentioned patch) was made resilient to this issue, but we use
`PostInitializer::getLocationValue()` for an alternative reporting,
where we faced this issue.

Note that in really rare cases it suppresses now dereference bugs, as
demonstrated in the test. It is because in the past we failed to
follow the region of the PostInitializer inside
the StoreSiteFinder visitor - because it was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optional PIP =
Pred->getLocationAs()) {
  const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue();
  if (FieldReg == R) {
StoreSite = Pred;
InitE = PIP->getInitializer()->getInit();
  }
}
```
And the equality check didn't pass for the regions I'm canonicalizing in this 
patch.

Given the nature of this change, we would rather upstream this patch.

CPP-4954
---
 .../Core/PathSensitive/ProgramState.h | 14 
 .../lib/StaticAnalyzer/Core/ProgramState.cpp  | 32 +++
 .../inlining/false-positive-suppression.cpp   | 17 ++
 3 files changed, 49 insertions(+), 14 deletions(-)

diff --git 
a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h 
b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
index ca75c2a756a4a5..47fc4ad88d3161 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
@@ -782,20 +782,6 @@ inline SVal ProgramState::getLValue(const ObjCIvarDecl *D, 
SVal Base) const {
   return getStateManager().StoreMgr->getLValueIvar(D, Base);
 }
 
-inline SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const {
-  return getStateManager().StoreMgr->getLValueField(D, Base);
-}
-
-inline SVal ProgramState::getLValue(const IndirectFieldDecl *D,
-SVal Base) const {
-  StoreManager  = *getStateManager().StoreMgr;
-  for (const auto *I : D->chain()) {
-Base = SM.getLValueField(cast(I), Base);
-  }
-
-  return Base;
-}
-
 inline SVal ProgramState::getLValue(QualType ElementType, SVal Idx, SVal Base) 
const{
   if (std::optional N = Idx.getAs())
 return getStateManager().StoreMgr->getLValueElement(ElementType, *N, Base);
diff --git a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp 
b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
index f12f1a5ac970dd..604728cdf1 100644
--- a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
@@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) {
   store = newStoreStore;
 }
 
+/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical
+/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why.
+static SVal wrapSymbolicRegion(const ProgramState , SVal Base) {
+  const auto *SymbolicBase =
+  dyn_cast_or_null(Base.getAsRegion());
+
+  if (!SymbolicBase)
+return Base;
+
+  StoreManager  = State.getStateManager().getStoreManager();
+  QualType ElemTy = SymbolicBase->getPointeeStaticType();
+  return loc::MemRegionVal{SM.GetElementZeroRegion(SymbolicBase, ElemTy)};
+}
+
+SVal ProgramState::getLValue(const FieldDecl *D, 

[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-18 Thread Balazs Benics via cfe-commits

steakhal wrote:

> I think it's good that we're moving towards establishing the invariant that 
> the parent of a FieldRegion should always be a typed value region (of a 
> suitable type) for the sake of consistency. Do I understand it correctly that 
> this is not too far away? Would it be possible to add an `assert` that 
> mandates this?

I'm not sure. I have not investigated this, and I fear, even if I could run it 
on a bunch of projects - we could still potentially miss one case where it 
wouldn't hold. IDK.

https://github.com/llvm/llvm-project/pull/85211
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-18 Thread Balazs Benics via cfe-commits


@@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) {
   store = newStoreStore;
 }
 
+/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical
+/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why.
+static SVal wrapSymbolicRegion(const ProgramState , SVal Base) {

steakhal wrote:

Makes sense. Moved into a private member function.

https://github.com/llvm/llvm-project/pull/85211
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-18 Thread Balazs Benics via cfe-commits

https://github.com/steakhal updated 
https://github.com/llvm/llvm-project/pull/85211

>From bde85e0d145049a6661afba6f4585865c5630792 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Thu, 14 Mar 2024 12:38:12 +0100
Subject: [PATCH 1/2] [analyzer] Wrap SymbolicRegions by ElementRegions before
 getting a FieldRegion

Inside the ExprEngine when we process the initializers, we create a
PostInitializer program-point, which will refer to the field being
initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`.

When a constructor (of which we evaluate the initializer-list) is analyzed
in top-level context, then the `this` pointer will be represented by a
`SymbolicRegion`, (as it should be).

This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the
initialized region.

```c++
class Bear {
public:
  void brum() const;
};
class Door {
public:
  // PostInitializer would refer to "FieldRegion{SymRegion{this}}"
  // whereas in the store and everywhere else it would be:
  // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}".
  Door() : ptr(nullptr) {
ptr->brum(); // Bug
  }
private:
  Bear* ptr;
};
```

We (as CSA folks) decided to avoid the creation of FieldRegions directly of
symbolic regions in the past:
https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504

---

In this patch, I propose to also canonicalize it as in the mentioned patch,
into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}`

This would mean that FieldRegions will/should never simply wrap a
SymbolicRegion directly, but rather an ElementRegion that is sitting
in between.

This patch should have practically no observable effects, as the store
(due to the mentioned patch) was made resilient to this issue, but we use
`PostInitializer::getLocationValue()` for an alternative reporting,
where we faced this issue.

Note that in really rare cases it suppresses now dereference bugs, as
demonstrated in the test. It is because in the past we failed to
follow the region of the PostInitializer inside
the StoreSiteFinder visitor - because it was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optional PIP =
Pred->getLocationAs()) {
  const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue();
  if (FieldReg == R) {
StoreSite = Pred;
InitE = PIP->getInitializer()->getInit();
  }
}
```
And the equality check didn't pass for the regions I'm canonicalizing in this 
patch.

Given the nature of this change, we would rather upstream this patch.

CPP-4954
---
 .../Core/PathSensitive/ProgramState.h | 14 
 .../lib/StaticAnalyzer/Core/ProgramState.cpp  | 32 +++
 .../inlining/false-positive-suppression.cpp   | 17 ++
 3 files changed, 49 insertions(+), 14 deletions(-)

diff --git 
a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h 
b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
index ca75c2a756a4a5..47fc4ad88d3161 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
@@ -782,20 +782,6 @@ inline SVal ProgramState::getLValue(const ObjCIvarDecl *D, 
SVal Base) const {
   return getStateManager().StoreMgr->getLValueIvar(D, Base);
 }
 
-inline SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const {
-  return getStateManager().StoreMgr->getLValueField(D, Base);
-}
-
-inline SVal ProgramState::getLValue(const IndirectFieldDecl *D,
-SVal Base) const {
-  StoreManager  = *getStateManager().StoreMgr;
-  for (const auto *I : D->chain()) {
-Base = SM.getLValueField(cast(I), Base);
-  }
-
-  return Base;
-}
-
 inline SVal ProgramState::getLValue(QualType ElementType, SVal Idx, SVal Base) 
const{
   if (std::optional N = Idx.getAs())
 return getStateManager().StoreMgr->getLValueElement(ElementType, *N, Base);
diff --git a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp 
b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
index f12f1a5ac970dd..604728cdf1 100644
--- a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
@@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) {
   store = newStoreStore;
 }
 
+/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical
+/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why.
+static SVal wrapSymbolicRegion(const ProgramState , SVal Base) {
+  const auto *SymbolicBase =
+  dyn_cast_or_null(Base.getAsRegion());
+
+  if (!SymbolicBase)
+return Base;
+
+  StoreManager  = State.getStateManager().getStoreManager();
+  QualType ElemTy = SymbolicBase->getPointeeStaticType();
+  return loc::MemRegionVal{SM.GetElementZeroRegion(SymbolicBase, ElemTy)};
+}
+
+SVal ProgramState::getLValue(const FieldDecl *D, 

[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-18 Thread via cfe-commits


@@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) {
   store = newStoreStore;
 }
 
+/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical
+/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why.
+static SVal wrapSymbolicRegion(const ProgramState , SVal Base) {

NagyDonat wrote:

Why is this a global static function and not a private method of 
`ProgramState`? Calling this function with `*this` as the first argument 
triggers my "Why is this trickery useful?" intuition; and if it could work as a 
method, then I would prefer that "more natural" solution.

https://github.com/llvm/llvm-project/pull/85211
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-18 Thread via cfe-commits

https://github.com/NagyDonat approved this pull request.

Looks good overall, my only inline comment is a minor stylistic question.

I think it's good that we're moving towards establishing the invariant that the 
parent of a FieldRegion should always be a typed value region (of a suitable 
type) for the sake of consistency. Do I understand it correctly that this is 
not too far away? Would it be possible to add an `assert` that mandates this?

On the other hand, I still feel that it's problematic that we're using 
`ElementRegion` to represent three different things (real element access, 
pointer arithmetic, type conversions). Even changing the name to something more 
general would be a step forward. Of course fixing this mess is a difficult 
long-term goal, and until then it's fine to work with the currently existing 
tools.

https://github.com/llvm/llvm-project/pull/85211
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-18 Thread via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/85211
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-14 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)


Changes

Inside the ExprEngine when we process the initializers, we create a 
PostInitializer program-point, which will refer to the field being initialized, 
see `FieldLoc` inside `ExprEngine::ProcessInitializer`.

When a constructor (of which we evaluate the initializer-list) is analyzed in 
top-level context, then the `this` pointer will be represented by a 
`SymbolicRegion`, (as it should be).

This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the 
initialized region.

```c++
class Bear {
public:
  void brum() const;
};
class Door {
public:
  // PostInitializer would refer to "FieldRegion{SymRegion{this}}"
  // whereas in the store and everywhere else it would be:
  // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}".
  Door() : ptr(nullptr) {
ptr-brum(); // Bug
  }
private:
  Bear* ptr;
};
```

We (as CSA folks) decided to avoid the creation of FieldRegions directly of 
symbolic regions in the past:
https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504

---

In this patch, I propose to also canonicalize it as in the mentioned patch, 
into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}`

This would mean that FieldRegions will/should never simply wrap a 
SymbolicRegion directly, but rather an ElementRegion that is sitting in between.

This patch should have practically no observable effects, as the store (due to 
the mentioned patch) was made resilient to this issue, but we use 
`PostInitializer::getLocationValue()` for an alternative reporting, where we 
faced this issue.

Note that in really rare cases it suppresses now dereference bugs, as 
demonstrated in the test. It is because in the past we failed to follow the 
region of the PostInitializer inside the StoreSiteFinder visitor - because it 
was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optionalPostInitializer PIP =
Pred-getLocationAsPostInitializer()) {
  const MemRegion *FieldReg = (const MemRegion *)PIP-getLocationValue();
  if (FieldReg == R) {
StoreSite = Pred;
InitE = PIP-getInitializer()-getInit();
  }
}
```
Notice that the equality check didn't pass for the regions I'm canonicalizing 
in this patch.

Given the nature of this change, we would rather upstream this patch.

CPP-4954

---
Full diff: https://github.com/llvm/llvm-project/pull/85211.diff


3 Files Affected:

- (modified) 
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h (-14) 
- (modified) clang/lib/StaticAnalyzer/Core/ProgramState.cpp (+32) 
- (modified) clang/test/Analysis/inlining/false-positive-suppression.cpp (+17) 


``diff
diff --git 
a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h 
b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
index ca75c2a756a4a5..47fc4ad88d3161 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
@@ -782,20 +782,6 @@ inline SVal ProgramState::getLValue(const ObjCIvarDecl *D, 
SVal Base) const {
   return getStateManager().StoreMgr->getLValueIvar(D, Base);
 }
 
-inline SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const {
-  return getStateManager().StoreMgr->getLValueField(D, Base);
-}
-
-inline SVal ProgramState::getLValue(const IndirectFieldDecl *D,
-SVal Base) const {
-  StoreManager  = *getStateManager().StoreMgr;
-  for (const auto *I : D->chain()) {
-Base = SM.getLValueField(cast(I), Base);
-  }
-
-  return Base;
-}
-
 inline SVal ProgramState::getLValue(QualType ElementType, SVal Idx, SVal Base) 
const{
   if (std::optional N = Idx.getAs())
 return getStateManager().StoreMgr->getLValueElement(ElementType, *N, Base);
diff --git a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp 
b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
index f12f1a5ac970dd..604728cdf1 100644
--- a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
@@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) {
   store = newStoreStore;
 }
 
+/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical
+/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why.
+static SVal wrapSymbolicRegion(const ProgramState , SVal Base) {
+  const auto *SymbolicBase =
+  dyn_cast_or_null(Base.getAsRegion());
+
+  if (!SymbolicBase)
+return Base;
+
+  StoreManager  = State.getStateManager().getStoreManager();
+  QualType ElemTy = SymbolicBase->getPointeeStaticType();
+  return loc::MemRegionVal{SM.GetElementZeroRegion(SymbolicBase, ElemTy)};
+}
+
+SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const {
+  Base = wrapSymbolicRegion(*this, Base);
+  return 

[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

2024-03-14 Thread Balazs Benics via cfe-commits

https://github.com/steakhal created 
https://github.com/llvm/llvm-project/pull/85211

Inside the ExprEngine when we process the initializers, we create a 
PostInitializer program-point, which will refer to the field being initialized, 
see `FieldLoc` inside `ExprEngine::ProcessInitializer`.

When a constructor (of which we evaluate the initializer-list) is analyzed in 
top-level context, then the `this` pointer will be represented by a 
`SymbolicRegion`, (as it should be).

This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the 
initialized region.

```c++
class Bear {
public:
  void brum() const;
};
class Door {
public:
  // PostInitializer would refer to "FieldRegion{SymRegion{this}}"
  // whereas in the store and everywhere else it would be:
  // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}".
  Door() : ptr(nullptr) {
ptr->brum(); // Bug
  }
private:
  Bear* ptr;
};
```

We (as CSA folks) decided to avoid the creation of FieldRegions directly of 
symbolic regions in the past:
https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504

---

In this patch, I propose to also canonicalize it as in the mentioned patch, 
into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}`

This would mean that FieldRegions will/should never simply wrap a 
SymbolicRegion directly, but rather an ElementRegion that is sitting in between.

This patch should have practically no observable effects, as the store (due to 
the mentioned patch) was made resilient to this issue, but we use 
`PostInitializer::getLocationValue()` for an alternative reporting, where we 
faced this issue.

Note that in really rare cases it suppresses now dereference bugs, as 
demonstrated in the test. It is because in the past we failed to follow the 
region of the PostInitializer inside the StoreSiteFinder visitor - because it 
was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optional PIP =
Pred->getLocationAs()) {
  const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue();
  if (FieldReg == R) {
StoreSite = Pred;
InitE = PIP->getInitializer()->getInit();
  }
}
```
Notice that the equality check didn't pass for the regions I'm canonicalizing 
in this patch.

Given the nature of this change, we would rather upstream this patch.

CPP-4954

>From bde85e0d145049a6661afba6f4585865c5630792 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Thu, 14 Mar 2024 12:38:12 +0100
Subject: [PATCH] [analyzer] Wrap SymbolicRegions by ElementRegions before
 getting a FieldRegion

Inside the ExprEngine when we process the initializers, we create a
PostInitializer program-point, which will refer to the field being
initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`.

When a constructor (of which we evaluate the initializer-list) is analyzed
in top-level context, then the `this` pointer will be represented by a
`SymbolicRegion`, (as it should be).

This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the
initialized region.

```c++
class Bear {
public:
  void brum() const;
};
class Door {
public:
  // PostInitializer would refer to "FieldRegion{SymRegion{this}}"
  // whereas in the store and everywhere else it would be:
  // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}".
  Door() : ptr(nullptr) {
ptr->brum(); // Bug
  }
private:
  Bear* ptr;
};
```

We (as CSA folks) decided to avoid the creation of FieldRegions directly of
symbolic regions in the past:
https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504

---

In this patch, I propose to also canonicalize it as in the mentioned patch,
into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}`

This would mean that FieldRegions will/should never simply wrap a
SymbolicRegion directly, but rather an ElementRegion that is sitting
in between.

This patch should have practically no observable effects, as the store
(due to the mentioned patch) was made resilient to this issue, but we use
`PostInitializer::getLocationValue()` for an alternative reporting,
where we faced this issue.

Note that in really rare cases it suppresses now dereference bugs, as
demonstrated in the test. It is because in the past we failed to
follow the region of the PostInitializer inside
the StoreSiteFinder visitor - because it was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optional PIP =
Pred->getLocationAs()) {
  const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue();
  if (FieldReg == R) {
StoreSite = Pred;
InitE = PIP->getInitializer()->getInit();
  }
}
```
And the equality check didn't pass for the regions I'm canonicalizing in this 
patch.

Given the nature of this change, we would rather upstream this patch.

CPP-4954
---