[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: https://github.com/DonatNagyE closed https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Don=C3=A1t?= Nagy , =?utf-8?q?Don=C3=A1t?= Nagy , =?utf-8?q?Don=C3=A1t?= Nagy , =?utf-8?q?Don=C3=A1t?= Nagy Message-ID: In-Reply-To: https://github.com/steakhal approved this pull request. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: DonatNagyE wrote: I'll merge this change soon, when I'll start to work on the followup commit (unless there is additional feedback until then). Thanks for the reviews! https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: https://github.com/DonatNagyE updated https://github.com/llvm/llvm-project/pull/67572 >From 7bd280e2da3f2ee8fe5fd7086a4704331f21b435 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Don=C3=A1t=20Nagy?= Date: Wed, 6 Sep 2023 13:39:27 +0200 Subject: [PATCH 1/5] [analyzer][NFC] Simplifications in ArrayBoundV2 I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but before that I'm refactoring the source code to clean up some over-complicated code and an inaccurate comment. Changes in this commit: - Remove the `mutable std::unique_ptr` boilerplate, because it's no longer needed. - Remove the code duplication between the methods `reportOOB()` and `reportTaintedOOB()`. - Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the wheel" version of `std::pair` and it was used only once, as a temporary object that was immediately decomposed. (I suspect that `RegionRawOffset` in MemRegion.cpp could also be eliminated.) - Flatten the code of `computeOffset()` which had contained six nested indentation levels before this commit. - Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, }` pair in the case when it encounters a `Location` that is not an `ElementRegion`. This ensures that the `checkLocation` callback returns early when it handles a memory access where it has "nothing to do" (no subscript operation or equivalent pointer arithmetic). (Note that this is still NFC because zero is a valid index everywhere, so the old logic without this shortcut eventually reached the same conclusion.) - Correct a wrong explanation comment in `getSimplifiedOffsets()`. --- .../Checkers/ArrayBoundCheckerV2.cpp | 221 +++--- 1 file changed, 86 insertions(+), 135 deletions(-) diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp index db4a2fcea9b2cdd..c2ffcdf5e306d1f 100644 --- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp @@ -32,15 +32,13 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); @@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 : void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: @@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) DonatNagyE wrote: The testsuite passed with `assert(!ElemType->isIncompleteType());` instead of the early return. I also analyzed some open source projects (ffmpeg, postgres, xerces) with that assertion and it wasn't triggered on them. However, I don't remove the early return in this commit because I don't want to risk a revert; but I added a FIXME that it should be removed in the future (and I'll probably do it soon in a stand-alone commit). https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: @@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { + Offset = Calc(BO_Add, *Offset, *Delta); + if (!Offset) +return std::nullopt; +} DonatNagyE wrote: After some thinking I've chosen to use a different logic to filter out the trivial "no ElementRegion layer" case -- in the freshly uploaded revision it's signified by the value of the region pointer. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: https://github.com/DonatNagyE updated https://github.com/llvm/llvm-project/pull/67572 >From 7bd280e2da3f2ee8fe5fd7086a4704331f21b435 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Don=C3=A1t=20Nagy?= Date: Wed, 6 Sep 2023 13:39:27 +0200 Subject: [PATCH 1/4] [analyzer][NFC] Simplifications in ArrayBoundV2 I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but before that I'm refactoring the source code to clean up some over-complicated code and an inaccurate comment. Changes in this commit: - Remove the `mutable std::unique_ptr` boilerplate, because it's no longer needed. - Remove the code duplication between the methods `reportOOB()` and `reportTaintedOOB()`. - Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the wheel" version of `std::pair` and it was used only once, as a temporary object that was immediately decomposed. (I suspect that `RegionRawOffset` in MemRegion.cpp could also be eliminated.) - Flatten the code of `computeOffset()` which had contained six nested indentation levels before this commit. - Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, }` pair in the case when it encounters a `Location` that is not an `ElementRegion`. This ensures that the `checkLocation` callback returns early when it handles a memory access where it has "nothing to do" (no subscript operation or equivalent pointer arithmetic). (Note that this is still NFC because zero is a valid index everywhere, so the old logic without this shortcut eventually reached the same conclusion.) - Correct a wrong explanation comment in `getSimplifiedOffsets()`. --- .../Checkers/ArrayBoundCheckerV2.cpp | 221 +++--- 1 file changed, 86 insertions(+), 135 deletions(-) diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp index db4a2fcea9b2cdd..c2ffcdf5e306d1f 100644 --- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp @@ -32,15 +32,13 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); @@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 : void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { +
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: @@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { + Offset = Calc(BO_Add, *Offset, *Delta); + if (!Offset) +return std::nullopt; +} steakhal wrote: I still dislike this, but I won't object. I find `if` statements assigning to the same variable a codesmell. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: @@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) steakhal wrote: Just run the test suite with an assert there. I bet you will find a case given that we have a comment about it. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: https://github.com/steakhal edited https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Don=C3=A1t?= Nagy , =?utf-8?q?Don=C3=A1t?= Nagy Message-ID: In-Reply-To: https://github.com/steakhal approved this pull request. No blocking comments. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: DonatNagyE wrote: @steakhal Could you briefly look at this commit? I'm open to modifying it if you say that it's necessary, but if I could merge it, then I'd be able to rebase and finalize my next commit (which adds detailed diagnostic messages to ArrayBoundV2). https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: @@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { DonatNagyE wrote: Now I recall why I picked the short name -- it's the definition of the lambda where a longer name doesn't fit. However, I was just able to squeeze `EvalBinOp` in by reducing `LHS` and `RHS` to `L` and `R` (which seems to be enough in a single-line lambda). https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy , =?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: https://github.com/DonatNagyE updated https://github.com/llvm/llvm-project/pull/67572 >From 7bd280e2da3f2ee8fe5fd7086a4704331f21b435 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Don=C3=A1t=20Nagy?= Date: Wed, 6 Sep 2023 13:39:27 +0200 Subject: [PATCH 1/3] [analyzer][NFC] Simplifications in ArrayBoundV2 I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but before that I'm refactoring the source code to clean up some over-complicated code and an inaccurate comment. Changes in this commit: - Remove the `mutable std::unique_ptr` boilerplate, because it's no longer needed. - Remove the code duplication between the methods `reportOOB()` and `reportTaintedOOB()`. - Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the wheel" version of `std::pair` and it was used only once, as a temporary object that was immediately decomposed. (I suspect that `RegionRawOffset` in MemRegion.cpp could also be eliminated.) - Flatten the code of `computeOffset()` which had contained six nested indentation levels before this commit. - Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, }` pair in the case when it encounters a `Location` that is not an `ElementRegion`. This ensures that the `checkLocation` callback returns early when it handles a memory access where it has "nothing to do" (no subscript operation or equivalent pointer arithmetic). (Note that this is still NFC because zero is a valid index everywhere, so the old logic without this shortcut eventually reached the same conclusion.) - Correct a wrong explanation comment in `getSimplifiedOffsets()`. --- .../Checkers/ArrayBoundCheckerV2.cpp | 221 +++--- 1 file changed, 86 insertions(+), 135 deletions(-) diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp index db4a2fcea9b2cdd..c2ffcdf5e306d1f 100644 --- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp @@ -32,15 +32,13 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); @@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 : void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { + Offset = Calc(BO_Add, *Offset,
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: github-actions[bot] wrote: :warning: C/C++ code formatter, clang-format found issues in your code. :warning: You can test this locally with the following command: ``bash git-clang-format --diff b22917e6e2a0aec05474f58e64b7e87d1ea0a054 59bccd98022f61e6a656dca6ac3fc5622f994226 -- clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp `` View the diff from clang-format here. ``diff diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp index 0b53aa9b2051..6819cd6a6b90 100644 --- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp @@ -56,7 +56,8 @@ public: std::optional> computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { QualType T = SVB.getArrayIndexType(); - auto EvalBinOp = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { + auto EvalBinOp = [, State, T](BinaryOperatorKind Op, NonLoc LHS, +NonLoc RHS) { // We will use this utility to add and multiply values. return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); }; `` https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: @@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; DonatNagyE wrote: Also note that creating zero-valued `NonLoc` (with the appropriate type...) is approximately two lines of code, so overall this solution is less verbose than using a separate early return to filter out the situations when the initial value of `Region` is not an `ElementRegion`. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
=?utf-8?q?Donát?= Nagy Message-ID: In-Reply-To: https://github.com/DonatNagyE updated https://github.com/llvm/llvm-project/pull/67572 >From 7bd280e2da3f2ee8fe5fd7086a4704331f21b435 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Don=C3=A1t=20Nagy?= Date: Wed, 6 Sep 2023 13:39:27 +0200 Subject: [PATCH 1/2] [analyzer][NFC] Simplifications in ArrayBoundV2 I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but before that I'm refactoring the source code to clean up some over-complicated code and an inaccurate comment. Changes in this commit: - Remove the `mutable std::unique_ptr` boilerplate, because it's no longer needed. - Remove the code duplication between the methods `reportOOB()` and `reportTaintedOOB()`. - Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the wheel" version of `std::pair` and it was used only once, as a temporary object that was immediately decomposed. (I suspect that `RegionRawOffset` in MemRegion.cpp could also be eliminated.) - Flatten the code of `computeOffset()` which had contained six nested indentation levels before this commit. - Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, }` pair in the case when it encounters a `Location` that is not an `ElementRegion`. This ensures that the `checkLocation` callback returns early when it handles a memory access where it has "nothing to do" (no subscript operation or equivalent pointer arithmetic). (Note that this is still NFC because zero is a valid index everywhere, so the old logic without this shortcut eventually reached the same conclusion.) - Correct a wrong explanation comment in `getSimplifiedOffsets()`. --- .../Checkers/ArrayBoundCheckerV2.cpp | 221 +++--- 1 file changed, 86 insertions(+), 135 deletions(-) diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp index db4a2fcea9b2cdd..c2ffcdf5e306d1f 100644 --- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp @@ -32,15 +32,13 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); @@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 : void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { + Offset = Calc(BO_Add, *Offset, *Delta); + if
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
@@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { DonatNagyE wrote: Good suggestion, I'll change it. Previously the four-character name was needed because I was using this in very deeply indented code, but now I can switch to this more descriptive name. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
@@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) DonatNagyE wrote: Good question, I "inherited" this check from the older variants of the code and I can easily imagine that it's never triggered. I'd prefer to avoid this question by just adding a "Paranoia: " prefix to this comment -- but I can investigate the situation if you think that it's important. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
@@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { + Offset = Calc(BO_Add, *Offset, *Delta); + if (!Offset) +return std::nullopt; +} DonatNagyE wrote: The `!Offset` case is just for handling the initial value of the variable `Offset`; which is an intentional special case to detect situations where the `while` loop wasn't entered at all. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
@@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; DonatNagyE wrote: It's intentionally set to `std::nullopt` instead of zero because this way we won't perform the bounds checks on Location checks involving regions that don't have any `ElementRegion` layers. I think that this might provide a significant performance advantage, as it may be costly to perform simplification steps and evaluate comparisons each time a "plain" variable is accessed. I'd like to keep this solution, but I'll add a comment to explain the motivations. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
DonatNagyE wrote: After some delays (I had a short vacation, then forgot about this patch) I finally got some test results... and they revealed that it was causing lots of segfaults. @steakhal thanks for asking for the experimental confirmation, it was useful! After fixing the buggy line and pushing the update, I re-analyzed the open source projects, and this produced the intended results: the analysis did not crash and the set of the reports did not change. Based on this I think that this commit is ready to be merged now. For reference, I insert the table of the test results, but it contains no differences: | CodeChecker results | | | | --- | --- | --- | | bitcoin_v0.20.1_baseline_vs_bitcoin_v0.20.1_new | [New reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=bitcoin_v0.20.1_baseline=bitcoin_v0.20.1_new=on=New) | [Lost reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=bitcoin_v0.20.1_baseline=bitcoin_v0.20.1_new=on=Resolved) | | redis_6.2.6_baseline_vs_redis_6.2.6_new | [New reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=redis_6.2.6_baseline=redis_6.2.6_new=on=New) | [Lost reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=redis_6.2.6_baseline=redis_6.2.6_new=on=Resolved) | | protobuf_v3.13.0_baseline_vs_protobuf_v3.13.0_new | [New reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=protobuf_v3.13.0_baseline=protobuf_v3.13.0_new=on=New) | [Lost reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=protobuf_v3.13.0_baseline=protobuf_v3.13.0_new=on=Resolved) | | ffmpeg_n4.3.1_baseline_vs_ffmpeg_n4.3.1_new | [New reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=ffmpeg_n4.3.1_baseline=ffmpeg_n4.3.1_new=on=New) | [Lost reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=ffmpeg_n4.3.1_baseline=ffmpeg_n4.3.1_new=on=Resolved) | | postgres_REL_13_0_baseline_vs_postgres_REL_13_0_new | [New reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_baseline=postgres_REL_13_0_new=on=New) | [Lost reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_baseline=postgres_REL_13_0_new=on=Resolved) | | sqlite_version-3.33.0_baseline_vs_sqlite_version-3.33.0_new | [New reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_baseline=sqlite_version-3.33.0_new=on=New) | [Lost reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_baseline=sqlite_version-3.33.0_new=on=Resolved) | https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
@@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) steakhal wrote: How can this be incomplete at this point? VLAs? I'd appreciate an example in the comment. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
@@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { steakhal wrote: How about calling this EvalBinOp? https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
@@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; steakhal wrote: How about setting this to zero? https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
@@ -32,42 +32,72 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); public: void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { + Offset = Calc(BO_Add, *Offset, *Delta); + if (!Offset) +return std::nullopt; +} steakhal wrote: I feel we could simplify the control flow here. ```suggestion assert(Offset); Offset = Calc(Add, *Offset, *Delta); if (!Offset) return std::nullopt; ``` https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
https://github.com/steakhal edited https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
https://github.com/steakhal requested changes to this pull request. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
https://github.com/DonatNagyE updated https://github.com/llvm/llvm-project/pull/67572 >From 7bd280e2da3f2ee8fe5fd7086a4704331f21b435 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Don=C3=A1t=20Nagy?= Date: Wed, 6 Sep 2023 13:39:27 +0200 Subject: [PATCH] [analyzer][NFC] Simplifications in ArrayBoundV2 I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but before that I'm refactoring the source code to clean up some over-complicated code and an inaccurate comment. Changes in this commit: - Remove the `mutable std::unique_ptr` boilerplate, because it's no longer needed. - Remove the code duplication between the methods `reportOOB()` and `reportTaintedOOB()`. - Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the wheel" version of `std::pair` and it was used only once, as a temporary object that was immediately decomposed. (I suspect that `RegionRawOffset` in MemRegion.cpp could also be eliminated.) - Flatten the code of `computeOffset()` which had contained six nested indentation levels before this commit. - Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, }` pair in the case when it encounters a `Location` that is not an `ElementRegion`. This ensures that the `checkLocation` callback returns early when it handles a memory access where it has "nothing to do" (no subscript operation or equivalent pointer arithmetic). (Note that this is still NFC because zero is a valid index everywhere, so the old logic without this shortcut eventually reached the same conclusion.) - Correct a wrong explanation comment in `getSimplifiedOffsets()`. --- .../Checkers/ArrayBoundCheckerV2.cpp | 221 +++--- 1 file changed, 86 insertions(+), 135 deletions(-) diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp index db4a2fcea9b2cdd..c2ffcdf5e306d1f 100644 --- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp @@ -32,15 +32,13 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); @@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 : void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = dyn_cast_or_null(Location.getAsRegion()); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { + Offset = Calc(BO_Add, *Offset, *Delta); + if (!Offset) +return std::nullopt; +} - NonLoc
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
https://github.com/DonatNagyE updated https://github.com/llvm/llvm-project/pull/67572 >From 583b6c3bf838bf74899a1ce5ab53b3722ddb8e66 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Don=C3=A1t=20Nagy?= Date: Wed, 6 Sep 2023 13:39:27 +0200 Subject: [PATCH] [analyzer][NFC] Simplifications in ArrayBoundV2 I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but before that I'm refactoring the source code to clean up some over-complicated code and an inaccurate comment. Changes in this commit: - Remove the `mutable std::unique_ptr` boilerplate, because it's no longer needed. - Remove the code duplication between the methods `reportOOB()` and `reportTaintedOOB()`. - Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the wheel" version of `std::pair` and it was used only once, as a temporary object that was immediately decomposed. (I suspect that `RegionRawOffset` in MemRegion.cpp could also be eliminated.) - Flatten the code of `computeOffset()` which had contained six nested indentation levels before this commit. - Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, }` pair in the case when it encounters a `Location` that is not an `ElementRegion`. This ensures that the `checkLocation` callback returns early when it handles a memory access where it has "nothing to do" (no subscript operation or equivalent pointer arithmetic). (Note that this is still NFC because zero is a valid index everywhere, so the old logic without this shortcut eventually reached the same conclusion.) - Correct a wrong explanation comment in `getSimplifiedOffsets()`. --- .../Checkers/ArrayBoundCheckerV2.cpp | 221 +++--- 1 file changed, 86 insertions(+), 135 deletions(-) diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp index db4a2fcea9b2cdd..e52eef0fe94370b 100644 --- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp @@ -32,15 +32,13 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); @@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 : void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = Location.getAsRegion()->getAs(); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { + Offset = Calc(BO_Add, *Offset, *Delta); + if (!Offset) +return std::nullopt; +} - NonLoc
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
steakhal wrote: > I'm confident that this patch is NFC, but my claim is based on theoretical > reasoning (a.k.a. "I think I didn't make a mistake"). I have a background in > theoretical mathematics, so for me "I have a rough a proof in my head" is > intuitively stronger than "I verified it on some arbitrary input"; but you're > right that this change is so complex that it deserves an experimental check. > I started some analysis on open source projects and I'll give results on > Monday. Oh I see. It makes sense to me. And on second thought, it shouldn't be much of a hassle doing it anyways. I didn't mean to challenge you to be clear. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
DonatNagyE wrote: I'm confident that this patch is NFC, but my claim is based on theoretical reasoning (a.k.a. "I think I didn't make a mistake"). I have a background in theoretical mathematics, so for me "I have a rough a proof in my head" is intuitively stronger than "I verified it on some arbitrary input"; but you're right that this change is so complex that it deserves an experimental check. I started some analysis on open source projects and I'll give results on Monday. https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
steakhal wrote: I remember I looked this once. I postponed my comments because I was expecting some numbers or confirmation of that this patch is indeed NFC, thus no report or notes change. I'm not sure if this actually holds, given the changes, but let me know in any case. BTW I'm fine with slight modifications if we call it NFCI or just without NFC. If really nothing changes, consider this approved :) https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
https://github.com/gamesh411 approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/67572 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
https://github.com/DonatNagyE updated https://github.com/llvm/llvm-project/pull/67572 >From 448999ecab28ea4d9768c1a8fa5fb4bb25e4bd32 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Don=C3=A1t=20Nagy?= Date: Wed, 6 Sep 2023 13:39:27 +0200 Subject: [PATCH] [analyzer][NFC] Simplifications in ArrayBoundV2 I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but before that I'm refactoring the source code to clean up some over-complicated code and an inaccurate comment. Changes in this commit: - Remove the `mutable std::unique_ptr` boilerplate, because it's no longer needed. - Remove the code duplication between the methods `reportOOB()` and `reportTaintedOOB()`. - Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the wheel" version of `std::pair` and it was used only once, as a temporary object that was immediately decomposed. (I suspect that `RegionRawOffset` in MemRegion.cpp could also be eliminated.) - Flatten the code of `computeOffset()` which had contained six nested indentation levels before this commit. - Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, }` pair in the case when it encounters a `Location` that is not an `ElementRegion`. This ensures that the `checkLocation` callback returns early when it handles a memory access where it has "nothing to do" (no subscript operation or equivalent pointer arithmetic). (Note that this is still NFC because zero is a valid index everywhere, so the old logic without this shortcut eventually reached the same conclusion.) - Correct a wrong explanation comment in `getSimplifiedOffsets()`. --- .../Checkers/ArrayBoundCheckerV2.cpp | 221 +++--- 1 file changed, 86 insertions(+), 135 deletions(-) diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp index db4a2fcea9b2cdd..e52eef0fe94370b 100644 --- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp @@ -32,15 +32,13 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); @@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 : void checkLocation(SVal l, bool isLoad, const Stmt *S, CheckerContext ) const; }; +} // anonymous namespace -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = Location.getAsRegion()->getAs(); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { + Offset = Calc(BO_Add, *Offset, *Delta); + if (!Offset) +return std::nullopt; +} - NonLoc
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
https://github.com/DonatNagyE updated https://github.com/llvm/llvm-project/pull/67572 >From 2b6a23c71a41bc07e37b4bed2cfef8b790e84631 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Don=C3=A1t=20Nagy?= Date: Wed, 6 Sep 2023 13:39:27 +0200 Subject: [PATCH] [analyzer][NFC] Simplifications in ArrayBoundV2 I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but before that I'm refactoring the source code to clean up some over-complicated code and an inaccurate comment. Changes in this commit: - Remove the `mutable std::unique_ptr` boilerplate, because it's no longer needed. - Remove the code duplication between the methods `reportOOB()` and `reportTaintedOOB()`. - Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the wheel" version of `std::pair` and it was used only once, as a temporary object that was immediately decomposed. (I suspect that `RegionRawOffset` in MemRegion.cpp could also be eliminated.) - Flatten the code of `computeOffset()` which had contained six nested indentation levels before this commit. - Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, }` pair in the case when it encounters a `Location` that is not an `ElementRegion`. This ensures that the `checkLocation` callback returns early when it handles a memory access where it has "nothing to do" (no subscript operation or equivalent pointer arithmetic). (Note that this is still NFC because zero is a valid index everywhere, so the old logic without this shortcut eventually reached the same conclusion.) - Correct a wrong explanation comment in `getSimplifiedOffsets()`. --- .../Checkers/ArrayBoundCheckerV2.cpp | 221 +++--- 1 file changed, 86 insertions(+), 135 deletions(-) diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp index db4a2fcea9b2cdd..1d5b02a22b7aca5 100644 --- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp @@ -32,15 +32,13 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); @@ -49,25 +47,56 @@ class ArrayBoundCheckerV2 : CheckerContext ) const; }; -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = Location.getAsRegion()->getAs(); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { + Offset = Calc(BO_Add, *Offset, *Delta); + if (!Offset) +return std::nullopt; +} - NonLoc getByteOffset() const { return byteOffset; } - const SubRegion *getRegion() const { return
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
llvmbot wrote: @llvm/pr-subscribers-clang Changes I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but before that I'm refactoring the source code to clean up some over-complicated code and an inaccurate comment. Changes in this commit: - Remove the `mutable std::unique_ptrBugType` boilerplate, because it's no longer needed. - Remove the code duplication between the methods `reportOOB()` and `reportTaintedOOB()`. - Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the wheel" version of `std::pair` and it was used only once, as a temporary object that was immediately decomposed. (I suspect that `RegionRawOffset` in MemRegion.cpp could also be eliminated.) - Flatten the code of `computeOffset()` which had contained six nested indentation levels before this commit. - Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, zero array index}` pair in the case when it encounters a `Location` that is not an `ElementRegion`. This ensures that the `checkLocation` callback returns early when it handles a memory access where it has "nothing to do" (no subscript operation or equivalent pointer arithmetic). (Note that this is still NFC because zero is a valid index everywhere, so the old logic without this shortcut eventually reached the same conclusion.) - Correct a wrong explanation comment in `getSimplifiedOffsets()`. --- Full diff: https://github.com/llvm/llvm-project/pull/67572.diff 1 Files Affected: - (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp (+86-135) ``diff diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp index db4a2fcea9b2cdd..8c26649621fe431 100644 --- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp @@ -32,15 +32,13 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); @@ -49,25 +47,56 @@ class ArrayBoundCheckerV2 : CheckerContext ) const; }; -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) { +// We will use this utility to add and multiply values. +return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs(); + }; -public: - RegionRawOffsetV2(const SubRegion *base, NonLoc offset) - : baseRegion(base), byteOffset(offset) { assert(base); } + const auto *Region = Location.getAsRegion()->getAs(); + std::optional Offset = std::nullopt; + + while (const auto *ERegion = dyn_cast_or_null(Region)) { +const auto Index = ERegion->getIndex().getAs(); +if (!Index) + return std::nullopt; + +QualType ElemType = ERegion->getElementType(); +// If the element is an incomplete type, go no further. +if (ElemType->isIncompleteType()) + return std::nullopt; + +// Calculate Delta = Index * sizeof(ElemType). +NonLoc Size = SVB.makeArrayIndex( +SVB.getContext().getTypeSizeInChars(ElemType).getQuantity()); +auto Delta = Calc(BO_Mul, *Index, Size); +if (!Delta) + return std::nullopt; + +// Perform Offset += Delta, handling the initial nullopt as 0. +if (!Offset) { + Offset = Delta; +} else { + Offset = Calc(BO_Add, *Offset, *Delta); + if (!Offset) +return std::nullopt; +} - NonLoc getByteOffset() const { return byteOffset; } - const SubRegion *getRegion() const { return baseRegion; } +// Continute the offset calculations with the SuperRegion. +Region = ERegion->getSuperRegion()->getAs(); + } - static std::optional -
[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)
https://github.com/DonatNagyE created https://github.com/llvm/llvm-project/pull/67572 I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but before that I'm refactoring the source code to clean up some over-complicated code and an inaccurate comment. Changes in this commit: - Remove the `mutable std::unique_ptr` boilerplate, because it's no longer needed. - Remove the code duplication between the methods `reportOOB()` and `reportTaintedOOB()`. - Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the wheel" version of `std::pair` and it was used only once, as a temporary object that was immediately decomposed. (I suspect that `RegionRawOffset` in MemRegion.cpp could also be eliminated.) - Flatten the code of `computeOffset()` which had contained six nested indentation levels before this commit. - Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, }` pair in the case when it encounters a `Location` that is not an `ElementRegion`. This ensures that the `checkLocation` callback returns early when it handles a memory access where it has "nothing to do" (no subscript operation or equivalent pointer arithmetic). (Note that this is still NFC because zero is a valid index everywhere, so the old logic without this shortcut eventually reached the same conclusion.) - Correct a wrong explanation comment in `getSimplifiedOffsets()`. >From 47c4b5c639d146f9f3469ed3fb3ac998a0ded483 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Don=C3=A1t=20Nagy?= Date: Wed, 6 Sep 2023 13:39:27 +0200 Subject: [PATCH] [analyzer][NFC] Simplifications in ArrayBoundV2 I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but before that I'm refactoring the source code to clean up some over-complicated code and an inaccurate comment. Changes in this commit: - Remove the `mutable std::unique_ptr` boilerplate, because it's no longer needed. - Remove the code duplication between the methods `reportOOB()` and `reportTaintedOOB()`. - Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the wheel" version of `std::pair` and it was used only once, as a temporary object that was immediately decomposed. (I suspect that `RegionRawOffset` in MemRegion.cpp could also be eliminated.) - Flatten the code of `computeOffset()` which had contained six nested indentation levels before this commit. - Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, }` pair in the case when it encounters a `Location` that is not an `ElementRegion`. This ensures that the `checkLocation` callback returns early when it handles a memory access where it has "nothing to do" (no subscript operation or equivalent pointer arithmetic). (Note that this is still NFC because zero is a valid index everywhere, so the old logic without this shortcut eventually reached the same conclusion.) - Correct a wrong explanation comment in `getSimplifiedOffsets()`. --- .../Checkers/ArrayBoundCheckerV2.cpp | 221 +++--- 1 file changed, 86 insertions(+), 135 deletions(-) diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp index db4a2fcea9b2cdd..8c26649621fe431 100644 --- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp @@ -32,15 +32,13 @@ using namespace taint; namespace { class ArrayBoundCheckerV2 : public Checker { - mutable std::unique_ptr BT; - mutable std::unique_ptr TaintBT; + BugType BT{this, "Out-of-bound access"}; + BugType TaintBT{this, "Out-of-bound access", categories::TaintedData}; - enum OOB_Kind { OOB_Precedes, OOB_Excedes }; + enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint }; - void reportOOB(CheckerContext , ProgramStateRef errorState, - OOB_Kind kind) const; - void reportTaintOOB(CheckerContext , ProgramStateRef errorState, - SVal TaintedSVal) const; + void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind, + SVal TaintedSVal = UnknownVal()) const; static bool isFromCtypeMacro(const Stmt *S, ASTContext ); @@ -49,25 +47,56 @@ class ArrayBoundCheckerV2 : CheckerContext ) const; }; -// FIXME: Eventually replace RegionRawOffset with this class. -class RegionRawOffsetV2 { -private: - const SubRegion *baseRegion; - NonLoc byteOffset; +/// For a given Location that can be represented as a symbolic expression +/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block +/// Arr and the distance of Location from the beginning of Arr (expressed in a +/// NonLoc that specifies the number of CharUnits). Returns nullopt when these +/// cannot be determined. +std::optional> +computeOffset(ProgramStateRef State, SValBuilder , SVal Location) { + QualType T = SVB.getArrayIndexType(); + auto Calc = [, State,