[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-24 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/DonatNagyE closed 
https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-22 Thread Balazs Benics via cfe-commits
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal approved this pull request.


https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-20 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


DonatNagyE wrote:

I'll merge this change soon, when I'll start to work on the followup commit 
(unless there is additional feedback until then). Thanks for the reviews!

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-20 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/DonatNagyE updated 
https://github.com/llvm/llvm-project/pull/67572

>From 7bd280e2da3f2ee8fe5fd7086a4704331f21b435 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Wed, 6 Sep 2023 13:39:27 +0200
Subject: [PATCH 1/5] [analyzer][NFC] Simplifications in ArrayBoundV2

I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2`
but before that I'm refactoring the source code to clean up some
over-complicated code and an inaccurate comment.

Changes in this commit:
- Remove the `mutable std::unique_ptr` boilerplate, because
  it's no longer needed.
- Remove the code duplication between the methods `reportOOB()` and
  `reportTaintedOOB()`.
- Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent
  the wheel" version of `std::pair` and it was used only once, as a
  temporary object that was immediately decomposed. (I suspect that
  `RegionRawOffset` in MemRegion.cpp could also be eliminated.)
- Flatten the code of `computeOffset()` which had contained six nested
  indentation levels before this commit.
- Ensure that `computeOffset()` returns `std::nullopt` instead of a
  `{Region, }` pair in the case when it encounters a
  `Location` that is not an `ElementRegion`. This ensures that the
  `checkLocation` callback returns early when it handles a memory access
  where it has "nothing to do" (no subscript operation or equivalent
  pointer arithmetic). (Note that this is still NFC because zero is a
  valid index everywhere, so the old logic without this shortcut
  eventually reached the same conclusion.)
- Correct a wrong explanation comment in `getSimplifiedOffsets()`.
---
 .../Checkers/ArrayBoundCheckerV2.cpp  | 221 +++---
 1 file changed, 86 insertions(+), 135 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index db4a2fcea9b2cdd..c2ffcdf5e306d1f 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -32,15 +32,13 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
@@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 :
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+ 

[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-20 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())

DonatNagyE wrote:

The testsuite passed with `assert(!ElemType->isIncompleteType());` instead of 
the early return. I also analyzed some open source projects (ffmpeg, postgres, 
xerces) with that assertion and it wasn't triggered on them.

However, I don't remove the early return in this commit because I don't want to 
risk a revert; but I added a FIXME that it should be removed in the future (and 
I'll probably do it soon in a stand-alone commit).

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-20 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  Offset = Calc(BO_Add, *Offset, *Delta);
+  if (!Offset)
+return std::nullopt;
+}

DonatNagyE wrote:

After some thinking I've chosen to use a different logic to filter out the 
trivial "no ElementRegion layer" case -- in the freshly uploaded revision it's 
signified by the value of the region pointer.

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-20 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/DonatNagyE updated 
https://github.com/llvm/llvm-project/pull/67572

>From 7bd280e2da3f2ee8fe5fd7086a4704331f21b435 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Wed, 6 Sep 2023 13:39:27 +0200
Subject: [PATCH 1/4] [analyzer][NFC] Simplifications in ArrayBoundV2

I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2`
but before that I'm refactoring the source code to clean up some
over-complicated code and an inaccurate comment.

Changes in this commit:
- Remove the `mutable std::unique_ptr` boilerplate, because
  it's no longer needed.
- Remove the code duplication between the methods `reportOOB()` and
  `reportTaintedOOB()`.
- Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent
  the wheel" version of `std::pair` and it was used only once, as a
  temporary object that was immediately decomposed. (I suspect that
  `RegionRawOffset` in MemRegion.cpp could also be eliminated.)
- Flatten the code of `computeOffset()` which had contained six nested
  indentation levels before this commit.
- Ensure that `computeOffset()` returns `std::nullopt` instead of a
  `{Region, }` pair in the case when it encounters a
  `Location` that is not an `ElementRegion`. This ensures that the
  `checkLocation` callback returns early when it handles a memory access
  where it has "nothing to do" (no subscript operation or equivalent
  pointer arithmetic). (Note that this is still NFC because zero is a
  valid index everywhere, so the old logic without this shortcut
  eventually reached the same conclusion.)
- Correct a wrong explanation comment in `getSimplifiedOffsets()`.
---
 .../Checkers/ArrayBoundCheckerV2.cpp  | 221 +++---
 1 file changed, 86 insertions(+), 135 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index db4a2fcea9b2cdd..c2ffcdf5e306d1f 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -32,15 +32,13 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
@@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 :
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  

[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-19 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  Offset = Calc(BO_Add, *Offset, *Delta);
+  if (!Offset)
+return std::nullopt;
+}

steakhal wrote:

I still dislike this, but I won't object.
I find `if` statements assigning to the same variable a codesmell.

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-19 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())

steakhal wrote:

Just run the test suite with an assert there. I bet you will find a case given 
that we have a comment about it.

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-19 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-19 Thread Balazs Benics via cfe-commits
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal approved this pull request.

No blocking comments.

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-19 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


DonatNagyE wrote:

@steakhal Could you briefly look at this commit? I'm open to modifying it if 
you say that it's necessary, but if I could merge it, then I'd be able to 
rebase and finalize my next commit (which adds detailed diagnostic messages to 
ArrayBoundV2).

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-16 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {

DonatNagyE wrote:

Now I recall why I picked the short name -- it's the definition of the lambda 
where a longer name doesn't fit. However, I was just able to squeeze 
`EvalBinOp` in by reducing `LHS` and `RHS` to `L` and `R` (which seems to be 
enough in a single-line lambda). 

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-16 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/DonatNagyE updated 
https://github.com/llvm/llvm-project/pull/67572

>From 7bd280e2da3f2ee8fe5fd7086a4704331f21b435 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Wed, 6 Sep 2023 13:39:27 +0200
Subject: [PATCH 1/3] [analyzer][NFC] Simplifications in ArrayBoundV2

I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2`
but before that I'm refactoring the source code to clean up some
over-complicated code and an inaccurate comment.

Changes in this commit:
- Remove the `mutable std::unique_ptr` boilerplate, because
  it's no longer needed.
- Remove the code duplication between the methods `reportOOB()` and
  `reportTaintedOOB()`.
- Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent
  the wheel" version of `std::pair` and it was used only once, as a
  temporary object that was immediately decomposed. (I suspect that
  `RegionRawOffset` in MemRegion.cpp could also be eliminated.)
- Flatten the code of `computeOffset()` which had contained six nested
  indentation levels before this commit.
- Ensure that `computeOffset()` returns `std::nullopt` instead of a
  `{Region, }` pair in the case when it encounters a
  `Location` that is not an `ElementRegion`. This ensures that the
  `checkLocation` callback returns early when it handles a memory access
  where it has "nothing to do" (no subscript operation or equivalent
  pointer arithmetic). (Note that this is still NFC because zero is a
  valid index everywhere, so the old logic without this shortcut
  eventually reached the same conclusion.)
- Correct a wrong explanation comment in `getSimplifiedOffsets()`.
---
 .../Checkers/ArrayBoundCheckerV2.cpp  | 221 +++---
 1 file changed, 86 insertions(+), 135 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index db4a2fcea9b2cdd..c2ffcdf5e306d1f 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -32,15 +32,13 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
@@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 :
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  Offset = Calc(BO_Add, *Offset, 

[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-16 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


``bash
git-clang-format --diff b22917e6e2a0aec05474f58e64b7e87d1ea0a054 
59bccd98022f61e6a656dca6ac3fc5622f994226 -- 
clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
``





View the diff from clang-format here.


``diff
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index 0b53aa9b2051..6819cd6a6b90 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -56,7 +56,8 @@ public:
 std::optional>
 computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
   QualType T = SVB.getArrayIndexType();
-  auto EvalBinOp = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc 
RHS) {
+  auto EvalBinOp = [, State, T](BinaryOperatorKind Op, NonLoc LHS,
+NonLoc RHS) {
 // We will use this utility to add and multiply values.
 return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
   };

``




https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-16 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;

DonatNagyE wrote:

Also note that creating zero-valued `NonLoc` (with the appropriate type...) is 
approximately two lines of code, so overall this solution is less verbose than 
using a separate early return to filter out the situations when the initial 
value of `Region` is not an `ElementRegion`. 

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-16 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/DonatNagyE updated 
https://github.com/llvm/llvm-project/pull/67572

>From 7bd280e2da3f2ee8fe5fd7086a4704331f21b435 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Wed, 6 Sep 2023 13:39:27 +0200
Subject: [PATCH 1/2] [analyzer][NFC] Simplifications in ArrayBoundV2

I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2`
but before that I'm refactoring the source code to clean up some
over-complicated code and an inaccurate comment.

Changes in this commit:
- Remove the `mutable std::unique_ptr` boilerplate, because
  it's no longer needed.
- Remove the code duplication between the methods `reportOOB()` and
  `reportTaintedOOB()`.
- Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent
  the wheel" version of `std::pair` and it was used only once, as a
  temporary object that was immediately decomposed. (I suspect that
  `RegionRawOffset` in MemRegion.cpp could also be eliminated.)
- Flatten the code of `computeOffset()` which had contained six nested
  indentation levels before this commit.
- Ensure that `computeOffset()` returns `std::nullopt` instead of a
  `{Region, }` pair in the case when it encounters a
  `Location` that is not an `ElementRegion`. This ensures that the
  `checkLocation` callback returns early when it handles a memory access
  where it has "nothing to do" (no subscript operation or equivalent
  pointer arithmetic). (Note that this is still NFC because zero is a
  valid index everywhere, so the old logic without this shortcut
  eventually reached the same conclusion.)
- Correct a wrong explanation comment in `getSimplifiedOffsets()`.
---
 .../Checkers/ArrayBoundCheckerV2.cpp  | 221 +++---
 1 file changed, 86 insertions(+), 135 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index db4a2fcea9b2cdd..c2ffcdf5e306d1f 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -32,15 +32,13 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
@@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 :
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  Offset = Calc(BO_Add, *Offset, *Delta);
+  if 

[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-16 Thread via cfe-commits


@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {

DonatNagyE wrote:

Good suggestion, I'll change it. Previously the four-character name was needed 
because I was using this in very deeply indented code, but now I can switch to 
this more descriptive name.

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-16 Thread via cfe-commits


@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())

DonatNagyE wrote:

Good question, I "inherited" this check from the older variants of the code and 
I can easily imagine that it's never triggered. I'd prefer to avoid this 
question by just adding a "Paranoia: " prefix to this comment -- but I can 
investigate the situation if you think that it's important.

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-16 Thread via cfe-commits


@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  Offset = Calc(BO_Add, *Offset, *Delta);
+  if (!Offset)
+return std::nullopt;
+}

DonatNagyE wrote:

The `!Offset` case is just for handling the initial value of the variable 
`Offset`; which is an intentional special case to detect situations where the 
`while` loop wasn't entered at all. 

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-16 Thread via cfe-commits


@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;

DonatNagyE wrote:

It's intentionally set to `std::nullopt` instead of zero because this way we 
won't perform the bounds checks on Location checks involving regions that don't 
have any `ElementRegion` layers.

I think that this might provide a significant performance advantage, as it may 
be costly to perform simplification steps and evaluate comparisons each time a 
"plain" variable is accessed.

I'd like to keep this solution, but I'll add a comment to explain the 
motivations.

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-16 Thread via cfe-commits

DonatNagyE wrote:

After some delays (I had a short vacation, then forgot about this patch) I 
finally got some test results... and they revealed that it was causing lots of 
segfaults. @steakhal thanks for asking for the experimental confirmation, it 
was useful!

After fixing the buggy line and pushing the update, I re-analyzed the open 
source projects, and this produced the intended results: the analysis did not 
crash and the set of the reports did not change. Based on this I think that 
this commit is ready to be merged now.

For reference, I insert the table of the test results, but it contains no 
differences:
| CodeChecker results | | |
| --- | --- | --- |
| bitcoin_v0.20.1_baseline_vs_bitcoin_v0.20.1_new | [New 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=bitcoin_v0.20.1_baseline=bitcoin_v0.20.1_new=on=New)
 | [Lost 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=bitcoin_v0.20.1_baseline=bitcoin_v0.20.1_new=on=Resolved)
 |
| redis_6.2.6_baseline_vs_redis_6.2.6_new | [New 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=redis_6.2.6_baseline=redis_6.2.6_new=on=New)
 | [Lost 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=redis_6.2.6_baseline=redis_6.2.6_new=on=Resolved)
 |
| protobuf_v3.13.0_baseline_vs_protobuf_v3.13.0_new | [New 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=protobuf_v3.13.0_baseline=protobuf_v3.13.0_new=on=New)
 | [Lost 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=protobuf_v3.13.0_baseline=protobuf_v3.13.0_new=on=Resolved)
 |
| ffmpeg_n4.3.1_baseline_vs_ffmpeg_n4.3.1_new | [New 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=ffmpeg_n4.3.1_baseline=ffmpeg_n4.3.1_new=on=New)
 | [Lost 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=ffmpeg_n4.3.1_baseline=ffmpeg_n4.3.1_new=on=Resolved)
 |
| postgres_REL_13_0_baseline_vs_postgres_REL_13_0_new | [New 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_baseline=postgres_REL_13_0_new=on=New)
 | [Lost 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_baseline=postgres_REL_13_0_new=on=Resolved)
 |
| sqlite_version-3.33.0_baseline_vs_sqlite_version-3.33.0_new | [New 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_baseline=sqlite_version-3.33.0_new=on=New)
 | [Lost 
reports](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_baseline=sqlite_version-3.33.0_new=on=Resolved)
 |

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-15 Thread Balazs Benics via cfe-commits


@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())

steakhal wrote:

How can this be incomplete at this point?
VLAs? I'd appreciate an example in the comment.

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-15 Thread Balazs Benics via cfe-commits


@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {

steakhal wrote:

How about calling this EvalBinOp?

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-15 Thread Balazs Benics via cfe-commits


@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;

steakhal wrote:

How about setting this to zero?

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-15 Thread Balazs Benics via cfe-commits


@@ -32,42 +32,72 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
 public:
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  Offset = Calc(BO_Add, *Offset, *Delta);
+  if (!Offset)
+return std::nullopt;
+}

steakhal wrote:

I feel we could simplify the control flow here.
```suggestion
  assert(Offset);
  Offset = Calc(Add, *Offset, *Delta);
  if (!Offset)
return std::nullopt;
```

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-15 Thread Balazs Benics via cfe-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-15 Thread Balazs Benics via cfe-commits

https://github.com/steakhal requested changes to this pull request.


https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-13 Thread via cfe-commits

https://github.com/DonatNagyE updated 
https://github.com/llvm/llvm-project/pull/67572

>From 7bd280e2da3f2ee8fe5fd7086a4704331f21b435 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Wed, 6 Sep 2023 13:39:27 +0200
Subject: [PATCH] [analyzer][NFC] Simplifications in ArrayBoundV2

I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2`
but before that I'm refactoring the source code to clean up some
over-complicated code and an inaccurate comment.

Changes in this commit:
- Remove the `mutable std::unique_ptr` boilerplate, because
  it's no longer needed.
- Remove the code duplication between the methods `reportOOB()` and
  `reportTaintedOOB()`.
- Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent
  the wheel" version of `std::pair` and it was used only once, as a
  temporary object that was immediately decomposed. (I suspect that
  `RegionRawOffset` in MemRegion.cpp could also be eliminated.)
- Flatten the code of `computeOffset()` which had contained six nested
  indentation levels before this commit.
- Ensure that `computeOffset()` returns `std::nullopt` instead of a
  `{Region, }` pair in the case when it encounters a
  `Location` that is not an `ElementRegion`. This ensures that the
  `checkLocation` callback returns early when it handles a memory access
  where it has "nothing to do" (no subscript operation or equivalent
  pointer arithmetic). (Note that this is still NFC because zero is a
  valid index everywhere, so the old logic without this shortcut
  eventually reached the same conclusion.)
- Correct a wrong explanation comment in `getSimplifiedOffsets()`.
---
 .../Checkers/ArrayBoundCheckerV2.cpp  | 221 +++---
 1 file changed, 86 insertions(+), 135 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index db4a2fcea9b2cdd..c2ffcdf5e306d1f 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -32,15 +32,13 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
@@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 :
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = dyn_cast_or_null(Location.getAsRegion());
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  Offset = Calc(BO_Add, *Offset, *Delta);
+  if (!Offset)
+return std::nullopt;
+}
 
-  NonLoc 

[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-10-13 Thread via cfe-commits

https://github.com/DonatNagyE updated 
https://github.com/llvm/llvm-project/pull/67572

>From 583b6c3bf838bf74899a1ce5ab53b3722ddb8e66 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Wed, 6 Sep 2023 13:39:27 +0200
Subject: [PATCH] [analyzer][NFC] Simplifications in ArrayBoundV2

I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2`
but before that I'm refactoring the source code to clean up some
over-complicated code and an inaccurate comment.

Changes in this commit:
- Remove the `mutable std::unique_ptr` boilerplate, because
  it's no longer needed.
- Remove the code duplication between the methods `reportOOB()` and
  `reportTaintedOOB()`.
- Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent
  the wheel" version of `std::pair` and it was used only once, as a
  temporary object that was immediately decomposed. (I suspect that
  `RegionRawOffset` in MemRegion.cpp could also be eliminated.)
- Flatten the code of `computeOffset()` which had contained six nested
  indentation levels before this commit.
- Ensure that `computeOffset()` returns `std::nullopt` instead of a
  `{Region, }` pair in the case when it encounters a
  `Location` that is not an `ElementRegion`. This ensures that the
  `checkLocation` callback returns early when it handles a memory access
  where it has "nothing to do" (no subscript operation or equivalent
  pointer arithmetic). (Note that this is still NFC because zero is a
  valid index everywhere, so the old logic without this shortcut
  eventually reached the same conclusion.)
- Correct a wrong explanation comment in `getSimplifiedOffsets()`.
---
 .../Checkers/ArrayBoundCheckerV2.cpp  | 221 +++---
 1 file changed, 86 insertions(+), 135 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index db4a2fcea9b2cdd..e52eef0fe94370b 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -32,15 +32,13 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
@@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 :
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = Location.getAsRegion()->getAs();
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  Offset = Calc(BO_Add, *Offset, *Delta);
+  if (!Offset)
+return std::nullopt;
+}
 
-  NonLoc 

[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-09-29 Thread Balazs Benics via cfe-commits

steakhal wrote:

> I'm confident that this patch is NFC, but my claim is based on theoretical 
> reasoning (a.k.a. "I think I didn't make a mistake"). I have a background in 
> theoretical mathematics, so for me "I have a rough a proof in my head" is 
> intuitively stronger than "I verified it on some arbitrary input"; but you're 
> right that this change is so complex that it deserves an experimental check. 
> I started some analysis on open source projects and I'll give results on 
> Monday.

Oh I see. It makes sense to me. And on second thought, it shouldn't be much of 
a hassle doing it anyways.
I didn't mean to challenge you to be clear.

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-09-29 Thread via cfe-commits

DonatNagyE wrote:

I'm confident that this patch is NFC, but my claim is based on theoretical 
reasoning (a.k.a. "I think I didn't make a mistake"). I have a background in 
theoretical mathematics, so for me "I have a rough a proof in my head" is 
intuitively stronger than "I verified it on some arbitrary input"; but you're 
right that this change is so complex that it deserves an experimental check. I 
started some analysis on open source projects and I'll give results on Monday.

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-09-29 Thread Balazs Benics via cfe-commits

steakhal wrote:

I remember I looked this once. I postponed my comments because I was expecting 
some numbers or confirmation of that this patch is indeed NFC, thus no report 
or notes change. I'm not sure if this actually holds, given the changes, but 
let me know in any case.

BTW I'm fine with slight modifications if we call it NFCI or just without NFC.

If really nothing changes, consider this approved :)

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-09-29 Thread Endre Fülöp via cfe-commits

https://github.com/gamesh411 approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/67572
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-09-28 Thread via cfe-commits

https://github.com/DonatNagyE updated 
https://github.com/llvm/llvm-project/pull/67572

>From 448999ecab28ea4d9768c1a8fa5fb4bb25e4bd32 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Wed, 6 Sep 2023 13:39:27 +0200
Subject: [PATCH] [analyzer][NFC] Simplifications in ArrayBoundV2

I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2`
but before that I'm refactoring the source code to clean up some
over-complicated code and an inaccurate comment.

Changes in this commit:
- Remove the `mutable std::unique_ptr` boilerplate, because
  it's no longer needed.
- Remove the code duplication between the methods `reportOOB()` and
  `reportTaintedOOB()`.
- Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent
  the wheel" version of `std::pair` and it was used only once, as a
  temporary object that was immediately decomposed. (I suspect that
  `RegionRawOffset` in MemRegion.cpp could also be eliminated.)
- Flatten the code of `computeOffset()` which had contained six nested
  indentation levels before this commit.
- Ensure that `computeOffset()` returns `std::nullopt` instead of a
  `{Region, }` pair in the case when it encounters a
  `Location` that is not an `ElementRegion`. This ensures that the
  `checkLocation` callback returns early when it handles a memory access
  where it has "nothing to do" (no subscript operation or equivalent
  pointer arithmetic). (Note that this is still NFC because zero is a
  valid index everywhere, so the old logic without this shortcut
  eventually reached the same conclusion.)
- Correct a wrong explanation comment in `getSimplifiedOffsets()`.
---
 .../Checkers/ArrayBoundCheckerV2.cpp  | 221 +++---
 1 file changed, 86 insertions(+), 135 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index db4a2fcea9b2cdd..e52eef0fe94370b 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -32,15 +32,13 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
@@ -48,26 +46,58 @@ class ArrayBoundCheckerV2 :
   void checkLocation(SVal l, bool isLoad, const Stmt *S,
  CheckerContext ) const;
 };
+} // anonymous namespace
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = Location.getAsRegion()->getAs();
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  Offset = Calc(BO_Add, *Offset, *Delta);
+  if (!Offset)
+return std::nullopt;
+}
 
-  NonLoc 

[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-09-28 Thread via cfe-commits

https://github.com/DonatNagyE updated 
https://github.com/llvm/llvm-project/pull/67572

>From 2b6a23c71a41bc07e37b4bed2cfef8b790e84631 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Wed, 6 Sep 2023 13:39:27 +0200
Subject: [PATCH] [analyzer][NFC] Simplifications in ArrayBoundV2

I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2`
but before that I'm refactoring the source code to clean up some
over-complicated code and an inaccurate comment.

Changes in this commit:
- Remove the `mutable std::unique_ptr` boilerplate, because
  it's no longer needed.
- Remove the code duplication between the methods `reportOOB()` and
  `reportTaintedOOB()`.
- Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent
  the wheel" version of `std::pair` and it was used only once, as a
  temporary object that was immediately decomposed. (I suspect that
  `RegionRawOffset` in MemRegion.cpp could also be eliminated.)
- Flatten the code of `computeOffset()` which had contained six nested
  indentation levels before this commit.
- Ensure that `computeOffset()` returns `std::nullopt` instead of a
  `{Region, }` pair in the case when it encounters a
  `Location` that is not an `ElementRegion`. This ensures that the
  `checkLocation` callback returns early when it handles a memory access
  where it has "nothing to do" (no subscript operation or equivalent
  pointer arithmetic). (Note that this is still NFC because zero is a
  valid index everywhere, so the old logic without this shortcut
  eventually reached the same conclusion.)
- Correct a wrong explanation comment in `getSimplifiedOffsets()`.
---
 .../Checkers/ArrayBoundCheckerV2.cpp  | 221 +++---
 1 file changed, 86 insertions(+), 135 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index db4a2fcea9b2cdd..1d5b02a22b7aca5 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -32,15 +32,13 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
@@ -49,25 +47,56 @@ class ArrayBoundCheckerV2 :
  CheckerContext ) const;
 };
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = Location.getAsRegion()->getAs();
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  Offset = Calc(BO_Add, *Offset, *Delta);
+  if (!Offset)
+return std::nullopt;
+}
 
-  NonLoc getByteOffset() const { return byteOffset; }
-  const SubRegion *getRegion() const { return 

[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-09-27 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang


Changes

I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but 
before that I'm refactoring the source code to clean up some over-complicated 
code and an inaccurate comment.

Changes in this commit:
- Remove the `mutable std::unique_ptrBugType` boilerplate, because it's 
no longer needed.
- Remove the code duplication between the methods `reportOOB()` and 
`reportTaintedOOB()`.
- Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the 
wheel" version of `std::pair` and it was used only once, as a temporary object 
that was immediately decomposed. (I suspect that `RegionRawOffset` in 
MemRegion.cpp could also be eliminated.)
- Flatten the code of `computeOffset()` which had contained six nested 
indentation levels before this commit.
- Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, 
zero array index}` pair in the case when it encounters a `Location` 
that is not an `ElementRegion`. This ensures that the `checkLocation` callback 
returns early when it handles a memory access where it has "nothing to do" (no 
subscript operation or equivalent pointer arithmetic). (Note that this is still 
NFC because zero is a valid index everywhere, so the old logic without this 
shortcut eventually reached the same conclusion.)
- Correct a wrong explanation comment in `getSimplifiedOffsets()`.

---
Full diff: https://github.com/llvm/llvm-project/pull/67572.diff


1 Files Affected:

- (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
(+86-135) 


``diff
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index db4a2fcea9b2cdd..8c26649621fe431 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -32,15 +32,13 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
@@ -49,25 +47,56 @@ class ArrayBoundCheckerV2 :
  CheckerContext ) const;
 };
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State, T](BinaryOperatorKind Op, NonLoc LHS, NonLoc RHS) {
+// We will use this utility to add and multiply values.
+return SVB.evalBinOpNN(State, Op, LHS, RHS, T).getAs();
+  };
 
-public:
-  RegionRawOffsetV2(const SubRegion *base, NonLoc offset)
-  : baseRegion(base), byteOffset(offset) { assert(base); }
+  const auto *Region = Location.getAsRegion()->getAs();
+  std::optional Offset = std::nullopt;
+
+  while (const auto *ERegion = dyn_cast_or_null(Region)) {
+const auto Index = ERegion->getIndex().getAs();
+if (!Index)
+  return std::nullopt;
+
+QualType ElemType = ERegion->getElementType();
+// If the element is an incomplete type, go no further.
+if (ElemType->isIncompleteType())
+  return std::nullopt;
+
+// Calculate Delta = Index * sizeof(ElemType).
+NonLoc Size = SVB.makeArrayIndex(
+SVB.getContext().getTypeSizeInChars(ElemType).getQuantity());
+auto Delta = Calc(BO_Mul, *Index, Size);
+if (!Delta)
+  return std::nullopt;
+
+// Perform Offset += Delta, handling the initial nullopt as 0.
+if (!Offset) {
+  Offset = Delta;
+} else {
+  Offset = Calc(BO_Add, *Offset, *Delta);
+  if (!Offset)
+return std::nullopt;
+}
 
-  NonLoc getByteOffset() const { return byteOffset; }
-  const SubRegion *getRegion() const { return baseRegion; }
+// Continute the offset calculations with the SuperRegion.
+Region = ERegion->getSuperRegion()->getAs();
+  }
 
-  static std::optional
-  

[clang] [analyzer][NFC] Simplifications in ArrayBoundV2 (PR #67572)

2023-09-27 Thread via cfe-commits

https://github.com/DonatNagyE created 
https://github.com/llvm/llvm-project/pull/67572

I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2` but 
before that I'm refactoring the source code to clean up some over-complicated 
code and an inaccurate comment.

Changes in this commit:
- Remove the `mutable std::unique_ptr` boilerplate, because it's no 
longer needed.
- Remove the code duplication between the methods `reportOOB()` and 
`reportTaintedOOB()`.
- Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent the 
wheel" version of `std::pair` and it was used only once, as a temporary object 
that was immediately decomposed. (I suspect that `RegionRawOffset` in 
MemRegion.cpp could also be eliminated.)
- Flatten the code of `computeOffset()` which had contained six nested 
indentation levels before this commit.
- Ensure that `computeOffset()` returns `std::nullopt` instead of a `{Region, 
}` pair in the case when it encounters a `Location` that is 
not an `ElementRegion`. This ensures that the `checkLocation` callback returns 
early when it handles a memory access where it has "nothing to do" (no 
subscript operation or equivalent pointer arithmetic). (Note that this is still 
NFC because zero is a valid index everywhere, so the old logic without this 
shortcut eventually reached the same conclusion.)
- Correct a wrong explanation comment in `getSimplifiedOffsets()`.

>From 47c4b5c639d146f9f3469ed3fb3ac998a0ded483 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Wed, 6 Sep 2023 13:39:27 +0200
Subject: [PATCH] [analyzer][NFC] Simplifications in ArrayBoundV2

I'm planning to improve diagnostics generation in `ArrayBoundCheckerV2`
but before that I'm refactoring the source code to clean up some
over-complicated code and an inaccurate comment.

Changes in this commit:
- Remove the `mutable std::unique_ptr` boilerplate, because
  it's no longer needed.
- Remove the code duplication between the methods `reportOOB()` and
  `reportTaintedOOB()`.
- Eliminate the class `RegionRawOffsetV2` because it's just a "reinvent
  the wheel" version of `std::pair` and it was used only once, as a
  temporary object that was immediately decomposed. (I suspect that
  `RegionRawOffset` in MemRegion.cpp could also be eliminated.)
- Flatten the code of `computeOffset()` which had contained six nested
  indentation levels before this commit.
- Ensure that `computeOffset()` returns `std::nullopt` instead of a
  `{Region, }` pair in the case when it encounters a
  `Location` that is not an `ElementRegion`. This ensures that the
  `checkLocation` callback returns early when it handles a memory access
  where it has "nothing to do" (no subscript operation or equivalent
  pointer arithmetic). (Note that this is still NFC because zero is a
  valid index everywhere, so the old logic without this shortcut
  eventually reached the same conclusion.)
- Correct a wrong explanation comment in `getSimplifiedOffsets()`.
---
 .../Checkers/ArrayBoundCheckerV2.cpp  | 221 +++---
 1 file changed, 86 insertions(+), 135 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp 
b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index db4a2fcea9b2cdd..8c26649621fe431 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -32,15 +32,13 @@ using namespace taint;
 namespace {
 class ArrayBoundCheckerV2 :
 public Checker {
-  mutable std::unique_ptr BT;
-  mutable std::unique_ptr TaintBT;
+  BugType BT{this, "Out-of-bound access"};
+  BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Excedes };
+  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-  void reportOOB(CheckerContext , ProgramStateRef errorState,
- OOB_Kind kind) const;
-  void reportTaintOOB(CheckerContext , ProgramStateRef errorState,
-  SVal TaintedSVal) const;
+  void reportOOB(CheckerContext , ProgramStateRef ErrorState, OOB_Kind Kind,
+ SVal TaintedSVal = UnknownVal()) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext );
 
@@ -49,25 +47,56 @@ class ArrayBoundCheckerV2 :
  CheckerContext ) const;
 };
 
-// FIXME: Eventually replace RegionRawOffset with this class.
-class RegionRawOffsetV2 {
-private:
-  const SubRegion *baseRegion;
-  NonLoc byteOffset;
+/// For a given Location that can be represented as a symbolic expression
+/// Arr[Idx] (or perhaps Arr[Idx1][Idx2] etc.), return the parent memory block
+/// Arr and the distance of Location from the beginning of Arr (expressed in a
+/// NonLoc that specifies the number of CharUnits). Returns nullopt when these
+/// cannot be determined.
+std::optional>
+computeOffset(ProgramStateRef State, SValBuilder , SVal Location) {
+  QualType T = SVB.getArrayIndexType();
+  auto Calc = [, State,