[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
steakhal wrote: @NagyDonat Thank you for the excellent feedback. Great recommendations, with fixes. Thanks! All applied. https://github.com/llvm/llvm-project/pull/85211 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
https://github.com/steakhal closed https://github.com/llvm/llvm-project/pull/85211 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
https://github.com/steakhal updated https://github.com/llvm/llvm-project/pull/85211 >From bde85e0d145049a6661afba6f4585865c5630792 Mon Sep 17 00:00:00 2001 From: Balazs Benics Date: Thu, 14 Mar 2024 12:38:12 +0100 Subject: [PATCH 1/4] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion Inside the ExprEngine when we process the initializers, we create a PostInitializer program-point, which will refer to the field being initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`. When a constructor (of which we evaluate the initializer-list) is analyzed in top-level context, then the `this` pointer will be represented by a `SymbolicRegion`, (as it should be). This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the initialized region. ```c++ class Bear { public: void brum() const; }; class Door { public: // PostInitializer would refer to "FieldRegion{SymRegion{this}}" // whereas in the store and everywhere else it would be: // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}". Door() : ptr(nullptr) { ptr->brum(); // Bug } private: Bear* ptr; }; ``` We (as CSA folks) decided to avoid the creation of FieldRegions directly of symbolic regions in the past: https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504 --- In this patch, I propose to also canonicalize it as in the mentioned patch, into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}` This would mean that FieldRegions will/should never simply wrap a SymbolicRegion directly, but rather an ElementRegion that is sitting in between. This patch should have practically no observable effects, as the store (due to the mentioned patch) was made resilient to this issue, but we use `PostInitializer::getLocationValue()` for an alternative reporting, where we faced this issue. Note that in really rare cases it suppresses now dereference bugs, as demonstrated in the test. It is because in the past we failed to follow the region of the PostInitializer inside the StoreSiteFinder visitor - because it was using this code: ```c++ // If this is a post initializer expression, initializing the region, we // should track the initializer expression. if (std::optional PIP = Pred->getLocationAs()) { const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue(); if (FieldReg == R) { StoreSite = Pred; InitE = PIP->getInitializer()->getInit(); } } ``` And the equality check didn't pass for the regions I'm canonicalizing in this patch. Given the nature of this change, we would rather upstream this patch. CPP-4954 --- .../Core/PathSensitive/ProgramState.h | 14 .../lib/StaticAnalyzer/Core/ProgramState.cpp | 32 +++ .../inlining/false-positive-suppression.cpp | 17 ++ 3 files changed, 49 insertions(+), 14 deletions(-) diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h index ca75c2a756a4a5..47fc4ad88d3161 100644 --- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h +++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h @@ -782,20 +782,6 @@ inline SVal ProgramState::getLValue(const ObjCIvarDecl *D, SVal Base) const { return getStateManager().StoreMgr->getLValueIvar(D, Base); } -inline SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const { - return getStateManager().StoreMgr->getLValueField(D, Base); -} - -inline SVal ProgramState::getLValue(const IndirectFieldDecl *D, -SVal Base) const { - StoreManager = *getStateManager().StoreMgr; - for (const auto *I : D->chain()) { -Base = SM.getLValueField(cast(I), Base); - } - - return Base; -} - inline SVal ProgramState::getLValue(QualType ElementType, SVal Idx, SVal Base) const{ if (std::optional N = Idx.getAs()) return getStateManager().StoreMgr->getLValueElement(ElementType, *N, Base); diff --git a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp index f12f1a5ac970dd..604728cdf1 100644 --- a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp +++ b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp @@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) { store = newStoreStore; } +/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical +/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why. +static SVal wrapSymbolicRegion(const ProgramState , SVal Base) { + const auto *SymbolicBase = + dyn_cast_or_null(Base.getAsRegion()); + + if (!SymbolicBase) +return Base; + + StoreManager = State.getStateManager().getStoreManager(); + QualType ElemTy = SymbolicBase->getPointeeStaticType(); + return loc::MemRegionVal{SM.GetElementZeroRegion(SymbolicBase, ElemTy)}; +} + +SVal ProgramState::getLValue(const FieldDecl *D,
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
@@ -226,6 +226,20 @@ ProgramStateRef ProgramState::killBinding(Loc LV) const { return makeWithStore(newStore); } +/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical +/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why. NagyDonat wrote: It seems that in this comment the `FieldRegion` is a mistake and should be changed to `SymbolicRegion`. (But perhaps the comment should be extended to something like "As a canonical representation, SymbolicRegions should be wrapped by ElementRegions before getting a FieldRegion" to mention all three layers of this trickery.) https://github.com/llvm/llvm-project/pull/85211 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
@@ -226,6 +226,20 @@ ProgramStateRef ProgramState::killBinding(Loc LV) const { return makeWithStore(newStore); } +/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical +/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why. +SVal ProgramState::wrapSymbolicRegion(SVal Base) const { + const auto *SymbolicBase = + dyn_cast_or_null(Base.getAsRegion()); + + if (!SymbolicBase) +return Base; + + StoreManager = getStateManager().getStoreManager(); + QualType ElemTy = SymbolicBase->getPointeeStaticType(); + return loc::MemRegionVal{SM.GetElementZeroRegion(SymbolicBase, ElemTy)}; +} NagyDonat wrote: I have several minor issues with the name choices: - `wrapSymbolicRegion` sounds like a method that expects a symbolic region as an argument and always wraps it; I think the effects of this method are better represented by something like `wrapRegionIfSymbolic`. - For me, the variable name `SymbolicBase` was confusing because it strongly suggested a connection with `MemRegion::getSymbolicBase()`. ("Why are we taking the symbolic base here? Oh, this isn't the symbolic base region of a region, this is just the region "base" if it happens to be a symbolic region...") - The name "Base" is also a bit unfortunate, because in the context of this method it's not the "base" of anything. I'd prefer to use something more generic like `SV` or `Val` -- or perhaps use a method name like `wrapBaseIfSymbolic` to indicate that the purpose of this method is wrapping a base region. https://github.com/llvm/llvm-project/pull/85211 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
https://github.com/NagyDonat approved this pull request. Thanks for the update! Unfortunately I have some additional minor remarks (sorry for not catching them in the first round). https://github.com/llvm/llvm-project/pull/85211 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
https://github.com/NagyDonat edited https://github.com/llvm/llvm-project/pull/85211 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
https://github.com/steakhal updated https://github.com/llvm/llvm-project/pull/85211 >From bde85e0d145049a6661afba6f4585865c5630792 Mon Sep 17 00:00:00 2001 From: Balazs Benics Date: Thu, 14 Mar 2024 12:38:12 +0100 Subject: [PATCH 1/3] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion Inside the ExprEngine when we process the initializers, we create a PostInitializer program-point, which will refer to the field being initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`. When a constructor (of which we evaluate the initializer-list) is analyzed in top-level context, then the `this` pointer will be represented by a `SymbolicRegion`, (as it should be). This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the initialized region. ```c++ class Bear { public: void brum() const; }; class Door { public: // PostInitializer would refer to "FieldRegion{SymRegion{this}}" // whereas in the store and everywhere else it would be: // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}". Door() : ptr(nullptr) { ptr->brum(); // Bug } private: Bear* ptr; }; ``` We (as CSA folks) decided to avoid the creation of FieldRegions directly of symbolic regions in the past: https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504 --- In this patch, I propose to also canonicalize it as in the mentioned patch, into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}` This would mean that FieldRegions will/should never simply wrap a SymbolicRegion directly, but rather an ElementRegion that is sitting in between. This patch should have practically no observable effects, as the store (due to the mentioned patch) was made resilient to this issue, but we use `PostInitializer::getLocationValue()` for an alternative reporting, where we faced this issue. Note that in really rare cases it suppresses now dereference bugs, as demonstrated in the test. It is because in the past we failed to follow the region of the PostInitializer inside the StoreSiteFinder visitor - because it was using this code: ```c++ // If this is a post initializer expression, initializing the region, we // should track the initializer expression. if (std::optional PIP = Pred->getLocationAs()) { const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue(); if (FieldReg == R) { StoreSite = Pred; InitE = PIP->getInitializer()->getInit(); } } ``` And the equality check didn't pass for the regions I'm canonicalizing in this patch. Given the nature of this change, we would rather upstream this patch. CPP-4954 --- .../Core/PathSensitive/ProgramState.h | 14 .../lib/StaticAnalyzer/Core/ProgramState.cpp | 32 +++ .../inlining/false-positive-suppression.cpp | 17 ++ 3 files changed, 49 insertions(+), 14 deletions(-) diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h index ca75c2a756a4a5..47fc4ad88d3161 100644 --- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h +++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h @@ -782,20 +782,6 @@ inline SVal ProgramState::getLValue(const ObjCIvarDecl *D, SVal Base) const { return getStateManager().StoreMgr->getLValueIvar(D, Base); } -inline SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const { - return getStateManager().StoreMgr->getLValueField(D, Base); -} - -inline SVal ProgramState::getLValue(const IndirectFieldDecl *D, -SVal Base) const { - StoreManager = *getStateManager().StoreMgr; - for (const auto *I : D->chain()) { -Base = SM.getLValueField(cast(I), Base); - } - - return Base; -} - inline SVal ProgramState::getLValue(QualType ElementType, SVal Idx, SVal Base) const{ if (std::optional N = Idx.getAs()) return getStateManager().StoreMgr->getLValueElement(ElementType, *N, Base); diff --git a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp index f12f1a5ac970dd..604728cdf1 100644 --- a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp +++ b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp @@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) { store = newStoreStore; } +/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical +/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why. +static SVal wrapSymbolicRegion(const ProgramState , SVal Base) { + const auto *SymbolicBase = + dyn_cast_or_null(Base.getAsRegion()); + + if (!SymbolicBase) +return Base; + + StoreManager = State.getStateManager().getStoreManager(); + QualType ElemTy = SymbolicBase->getPointeeStaticType(); + return loc::MemRegionVal{SM.GetElementZeroRegion(SymbolicBase, ElemTy)}; +} + +SVal ProgramState::getLValue(const FieldDecl *D,
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
steakhal wrote: > I think it's good that we're moving towards establishing the invariant that > the parent of a FieldRegion should always be a typed value region (of a > suitable type) for the sake of consistency. Do I understand it correctly that > this is not too far away? Would it be possible to add an `assert` that > mandates this? I'm not sure. I have not investigated this, and I fear, even if I could run it on a bunch of projects - we could still potentially miss one case where it wouldn't hold. IDK. https://github.com/llvm/llvm-project/pull/85211 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
@@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) { store = newStoreStore; } +/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical +/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why. +static SVal wrapSymbolicRegion(const ProgramState , SVal Base) { steakhal wrote: Makes sense. Moved into a private member function. https://github.com/llvm/llvm-project/pull/85211 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
https://github.com/steakhal updated https://github.com/llvm/llvm-project/pull/85211 >From bde85e0d145049a6661afba6f4585865c5630792 Mon Sep 17 00:00:00 2001 From: Balazs Benics Date: Thu, 14 Mar 2024 12:38:12 +0100 Subject: [PATCH 1/2] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion Inside the ExprEngine when we process the initializers, we create a PostInitializer program-point, which will refer to the field being initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`. When a constructor (of which we evaluate the initializer-list) is analyzed in top-level context, then the `this` pointer will be represented by a `SymbolicRegion`, (as it should be). This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the initialized region. ```c++ class Bear { public: void brum() const; }; class Door { public: // PostInitializer would refer to "FieldRegion{SymRegion{this}}" // whereas in the store and everywhere else it would be: // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}". Door() : ptr(nullptr) { ptr->brum(); // Bug } private: Bear* ptr; }; ``` We (as CSA folks) decided to avoid the creation of FieldRegions directly of symbolic regions in the past: https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504 --- In this patch, I propose to also canonicalize it as in the mentioned patch, into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}` This would mean that FieldRegions will/should never simply wrap a SymbolicRegion directly, but rather an ElementRegion that is sitting in between. This patch should have practically no observable effects, as the store (due to the mentioned patch) was made resilient to this issue, but we use `PostInitializer::getLocationValue()` for an alternative reporting, where we faced this issue. Note that in really rare cases it suppresses now dereference bugs, as demonstrated in the test. It is because in the past we failed to follow the region of the PostInitializer inside the StoreSiteFinder visitor - because it was using this code: ```c++ // If this is a post initializer expression, initializing the region, we // should track the initializer expression. if (std::optional PIP = Pred->getLocationAs()) { const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue(); if (FieldReg == R) { StoreSite = Pred; InitE = PIP->getInitializer()->getInit(); } } ``` And the equality check didn't pass for the regions I'm canonicalizing in this patch. Given the nature of this change, we would rather upstream this patch. CPP-4954 --- .../Core/PathSensitive/ProgramState.h | 14 .../lib/StaticAnalyzer/Core/ProgramState.cpp | 32 +++ .../inlining/false-positive-suppression.cpp | 17 ++ 3 files changed, 49 insertions(+), 14 deletions(-) diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h index ca75c2a756a4a5..47fc4ad88d3161 100644 --- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h +++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h @@ -782,20 +782,6 @@ inline SVal ProgramState::getLValue(const ObjCIvarDecl *D, SVal Base) const { return getStateManager().StoreMgr->getLValueIvar(D, Base); } -inline SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const { - return getStateManager().StoreMgr->getLValueField(D, Base); -} - -inline SVal ProgramState::getLValue(const IndirectFieldDecl *D, -SVal Base) const { - StoreManager = *getStateManager().StoreMgr; - for (const auto *I : D->chain()) { -Base = SM.getLValueField(cast(I), Base); - } - - return Base; -} - inline SVal ProgramState::getLValue(QualType ElementType, SVal Idx, SVal Base) const{ if (std::optional N = Idx.getAs()) return getStateManager().StoreMgr->getLValueElement(ElementType, *N, Base); diff --git a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp index f12f1a5ac970dd..604728cdf1 100644 --- a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp +++ b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp @@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) { store = newStoreStore; } +/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical +/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why. +static SVal wrapSymbolicRegion(const ProgramState , SVal Base) { + const auto *SymbolicBase = + dyn_cast_or_null(Base.getAsRegion()); + + if (!SymbolicBase) +return Base; + + StoreManager = State.getStateManager().getStoreManager(); + QualType ElemTy = SymbolicBase->getPointeeStaticType(); + return loc::MemRegionVal{SM.GetElementZeroRegion(SymbolicBase, ElemTy)}; +} + +SVal ProgramState::getLValue(const FieldDecl *D,
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
@@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) { store = newStoreStore; } +/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical +/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why. +static SVal wrapSymbolicRegion(const ProgramState , SVal Base) { NagyDonat wrote: Why is this a global static function and not a private method of `ProgramState`? Calling this function with `*this` as the first argument triggers my "Why is this trickery useful?" intuition; and if it could work as a method, then I would prefer that "more natural" solution. https://github.com/llvm/llvm-project/pull/85211 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
https://github.com/NagyDonat approved this pull request. Looks good overall, my only inline comment is a minor stylistic question. I think it's good that we're moving towards establishing the invariant that the parent of a FieldRegion should always be a typed value region (of a suitable type) for the sake of consistency. Do I understand it correctly that this is not too far away? Would it be possible to add an `assert` that mandates this? On the other hand, I still feel that it's problematic that we're using `ElementRegion` to represent three different things (real element access, pointer arithmetic, type conversions). Even changing the name to something more general would be a step forward. Of course fixing this mess is a difficult long-term goal, and until then it's fine to work with the currently existing tools. https://github.com/llvm/llvm-project/pull/85211 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
https://github.com/NagyDonat edited https://github.com/llvm/llvm-project/pull/85211 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
llvmbot wrote: @llvm/pr-subscribers-clang-static-analyzer-1 Author: Balazs Benics (steakhal) Changes Inside the ExprEngine when we process the initializers, we create a PostInitializer program-point, which will refer to the field being initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`. When a constructor (of which we evaluate the initializer-list) is analyzed in top-level context, then the `this` pointer will be represented by a `SymbolicRegion`, (as it should be). This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the initialized region. ```c++ class Bear { public: void brum() const; }; class Door { public: // PostInitializer would refer to "FieldRegion{SymRegion{this}}" // whereas in the store and everywhere else it would be: // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}". Door() : ptr(nullptr) { ptr-brum(); // Bug } private: Bear* ptr; }; ``` We (as CSA folks) decided to avoid the creation of FieldRegions directly of symbolic regions in the past: https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504 --- In this patch, I propose to also canonicalize it as in the mentioned patch, into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}` This would mean that FieldRegions will/should never simply wrap a SymbolicRegion directly, but rather an ElementRegion that is sitting in between. This patch should have practically no observable effects, as the store (due to the mentioned patch) was made resilient to this issue, but we use `PostInitializer::getLocationValue()` for an alternative reporting, where we faced this issue. Note that in really rare cases it suppresses now dereference bugs, as demonstrated in the test. It is because in the past we failed to follow the region of the PostInitializer inside the StoreSiteFinder visitor - because it was using this code: ```c++ // If this is a post initializer expression, initializing the region, we // should track the initializer expression. if (std::optionalPostInitializer PIP = Pred-getLocationAsPostInitializer()) { const MemRegion *FieldReg = (const MemRegion *)PIP-getLocationValue(); if (FieldReg == R) { StoreSite = Pred; InitE = PIP-getInitializer()-getInit(); } } ``` Notice that the equality check didn't pass for the regions I'm canonicalizing in this patch. Given the nature of this change, we would rather upstream this patch. CPP-4954 --- Full diff: https://github.com/llvm/llvm-project/pull/85211.diff 3 Files Affected: - (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h (-14) - (modified) clang/lib/StaticAnalyzer/Core/ProgramState.cpp (+32) - (modified) clang/test/Analysis/inlining/false-positive-suppression.cpp (+17) ``diff diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h index ca75c2a756a4a5..47fc4ad88d3161 100644 --- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h +++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h @@ -782,20 +782,6 @@ inline SVal ProgramState::getLValue(const ObjCIvarDecl *D, SVal Base) const { return getStateManager().StoreMgr->getLValueIvar(D, Base); } -inline SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const { - return getStateManager().StoreMgr->getLValueField(D, Base); -} - -inline SVal ProgramState::getLValue(const IndirectFieldDecl *D, -SVal Base) const { - StoreManager = *getStateManager().StoreMgr; - for (const auto *I : D->chain()) { -Base = SM.getLValueField(cast(I), Base); - } - - return Base; -} - inline SVal ProgramState::getLValue(QualType ElementType, SVal Idx, SVal Base) const{ if (std::optional N = Idx.getAs()) return getStateManager().StoreMgr->getLValueElement(ElementType, *N, Base); diff --git a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp index f12f1a5ac970dd..604728cdf1 100644 --- a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp +++ b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp @@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef ) { store = newStoreStore; } +/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical +/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why. +static SVal wrapSymbolicRegion(const ProgramState , SVal Base) { + const auto *SymbolicBase = + dyn_cast_or_null(Base.getAsRegion()); + + if (!SymbolicBase) +return Base; + + StoreManager = State.getStateManager().getStoreManager(); + QualType ElemTy = SymbolicBase->getPointeeStaticType(); + return loc::MemRegionVal{SM.GetElementZeroRegion(SymbolicBase, ElemTy)}; +} + +SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const { + Base = wrapSymbolicRegion(*this, Base); + return
[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)
https://github.com/steakhal created https://github.com/llvm/llvm-project/pull/85211 Inside the ExprEngine when we process the initializers, we create a PostInitializer program-point, which will refer to the field being initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`. When a constructor (of which we evaluate the initializer-list) is analyzed in top-level context, then the `this` pointer will be represented by a `SymbolicRegion`, (as it should be). This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the initialized region. ```c++ class Bear { public: void brum() const; }; class Door { public: // PostInitializer would refer to "FieldRegion{SymRegion{this}}" // whereas in the store and everywhere else it would be: // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}". Door() : ptr(nullptr) { ptr->brum(); // Bug } private: Bear* ptr; }; ``` We (as CSA folks) decided to avoid the creation of FieldRegions directly of symbolic regions in the past: https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504 --- In this patch, I propose to also canonicalize it as in the mentioned patch, into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}` This would mean that FieldRegions will/should never simply wrap a SymbolicRegion directly, but rather an ElementRegion that is sitting in between. This patch should have practically no observable effects, as the store (due to the mentioned patch) was made resilient to this issue, but we use `PostInitializer::getLocationValue()` for an alternative reporting, where we faced this issue. Note that in really rare cases it suppresses now dereference bugs, as demonstrated in the test. It is because in the past we failed to follow the region of the PostInitializer inside the StoreSiteFinder visitor - because it was using this code: ```c++ // If this is a post initializer expression, initializing the region, we // should track the initializer expression. if (std::optional PIP = Pred->getLocationAs()) { const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue(); if (FieldReg == R) { StoreSite = Pred; InitE = PIP->getInitializer()->getInit(); } } ``` Notice that the equality check didn't pass for the regions I'm canonicalizing in this patch. Given the nature of this change, we would rather upstream this patch. CPP-4954 >From bde85e0d145049a6661afba6f4585865c5630792 Mon Sep 17 00:00:00 2001 From: Balazs Benics Date: Thu, 14 Mar 2024 12:38:12 +0100 Subject: [PATCH] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion Inside the ExprEngine when we process the initializers, we create a PostInitializer program-point, which will refer to the field being initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`. When a constructor (of which we evaluate the initializer-list) is analyzed in top-level context, then the `this` pointer will be represented by a `SymbolicRegion`, (as it should be). This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the initialized region. ```c++ class Bear { public: void brum() const; }; class Door { public: // PostInitializer would refer to "FieldRegion{SymRegion{this}}" // whereas in the store and everywhere else it would be: // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}". Door() : ptr(nullptr) { ptr->brum(); // Bug } private: Bear* ptr; }; ``` We (as CSA folks) decided to avoid the creation of FieldRegions directly of symbolic regions in the past: https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504 --- In this patch, I propose to also canonicalize it as in the mentioned patch, into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}` This would mean that FieldRegions will/should never simply wrap a SymbolicRegion directly, but rather an ElementRegion that is sitting in between. This patch should have practically no observable effects, as the store (due to the mentioned patch) was made resilient to this issue, but we use `PostInitializer::getLocationValue()` for an alternative reporting, where we faced this issue. Note that in really rare cases it suppresses now dereference bugs, as demonstrated in the test. It is because in the past we failed to follow the region of the PostInitializer inside the StoreSiteFinder visitor - because it was using this code: ```c++ // If this is a post initializer expression, initializing the region, we // should track the initializer expression. if (std::optional PIP = Pred->getLocationAs()) { const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue(); if (FieldReg == R) { StoreSite = Pred; InitE = PIP->getInitializer()->getInit(); } } ``` And the equality check didn't pass for the regions I'm canonicalizing in this patch. Given the nature of this change, we would rather upstream this patch. CPP-4954 ---