[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-22 Thread via cfe-commits

https://github.com/martinboehme closed 
https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-22 Thread via cfe-commits

martinboehme wrote:

Abandoning in favor of #72985.

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-22 Thread via cfe-commits

martinboehme wrote:

> +1 for the simpler approach. I have a high-level question about the stability 
> of the locations, its effect on convergence.

Answered in the inline comment.

> But I am happy with the `ExprToVal` part of the simple patch.

SG! Then let's continue the discussion on #72985?

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-22 Thread via cfe-commits

https://github.com/martinboehme edited 
https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-22 Thread via cfe-commits


@@ -311,7 +318,10 @@ computeBlockInputState(const CFGBlock &Block, 
AnalysisContext &AC) {
 }
   }
 
-  JoinedStateBuilder Builder(AC);
+  // When performing the join, only retain state for those expressions that are
+  // consumed by this block. This avoids performing joins and potentially
+  // extending the flow condition for expressions that we won't need anyway.
+  JoinedStateBuilder Builder(AC, AC.CFCtx.getExprConsumedByBlock(&Block));

martinboehme wrote:

> We have getStableStorageLocation that will first look up an expression in 
> ExprToLoc and it will try to reuse the some location. In case we clear 
> ExprToLoc along the back edge, in the second iteration through processing B2 
> we would create new locations for the expressions.

Confusingly, there are two different maps called `ExprToLoc`:

*  
[`Environment::ExprToLoc`](https://github.com/llvm/llvm-project/blob/03f05a4e72891237264ab48adf62034aaac8bd46/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h#L661)

  This is the one from which this PR (and the simpler version in #72985) 
removes entries.

*  
[`DataflowAnalysisContext::ExprToLoc`](https://github.com/llvm/llvm-project/blob/03f05a4e72891237264ab48adf62034aaac8bd46/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h#L218)

  This is the one used by 
[`DataflowAnalysisContext::getStableStorageLocation()`](https://github.com/llvm/llvm-project/blob/03f05a4e72891237264ab48adf62034aaac8bd46/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp#L86).
 This PR does not remove any entries from this map.

So the answer is that stable storage locations are not affected by this PR. If 
we have a storage location that is cleared from `Environment::ExprToLoc` as a 
result of this PR, then if we call `Environment::createStorageLocation(const 
Expr&)` on the next iteration, that will call through to 
`DataflowAnalysisContext::getStableStorageLocation()`, which will return the 
same storage location as it did on the previous iteration.

In the PR description, I merely used `ExprToLoc` as shorthand for 
`Environment::ExprToLoc` since that is the one that I'm concerned with most 
often -- but I realize this is potentially confusing.

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-21 Thread Gábor Horváth via cfe-commits

Xazax-hun wrote:

+1 for the simpler approach. I think I have a high-level question about the 
stability of the locations, its effect on convergence. But I am happy with the 
`ExprToVal` part of the simple patch.

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-21 Thread Gábor Horváth via cfe-commits

https://github.com/Xazax-hun edited 
https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-21 Thread Gábor Horváth via cfe-commits


@@ -311,7 +318,10 @@ computeBlockInputState(const CFGBlock &Block, 
AnalysisContext &AC) {
 }
   }
 
-  JoinedStateBuilder Builder(AC);
+  // When performing the join, only retain state for those expressions that are
+  // consumed by this block. This avoids performing joins and potentially
+  // extending the flow condition for expressions that we won't need anyway.
+  JoinedStateBuilder Builder(AC, AC.CFCtx.getExprConsumedByBlock(&Block));

Xazax-hun wrote:

Thanks! I think my main concerns are around convergence and `ExprToLoc`. 
Consider a simple loop like:
```
B1 ---> B2
^   |
 \__/
```

We have getStableStorageLocation that will first look up an expression in 
`ExprToLoc` and it will try to reuse the some location. In case we clear 
`ExprToLoc` along the back edge, in the second iteration through processing 
`B2` we would create new locations for the expressions.

Is this the intended effect here? 


https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-21 Thread via cfe-commits

martinboehme wrote:

After some internal discussion with @ymand, I think I'd like to retract this PR 
in favor of https://github.com/llvm/llvm-project/pull/72985.

I will still keep this PR open for the time being however, until we've resolved 
the discussion.

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-21 Thread via cfe-commits

martinboehme wrote:

Update: Here's a draft implementation of the "simpler, bolder approach" 
referenced above:

https://github.com/llvm/llvm-project/pull/72985

The code is a _lot_ simpler -- it's mainly deleting code -- and I think I'm 
convinced at this point that this is the preferable thing to do. Would 
appreciate input.

If others agree, there's not a whole lot of work remaining on this draft -- 
main thing is to see how this affects SAT solver timeouts on real-world 
codebases (I expect the effect to be similar to this current patch) and some 
simplification in the code (converting two function templates to non-template 
functions because they now only get called with one specific set of argument 
types).

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-21 Thread via cfe-commits

https://github.com/martinboehme edited 
https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-21 Thread via cfe-commits

https://github.com/martinboehme edited 
https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-21 Thread via cfe-commits


@@ -311,7 +318,10 @@ computeBlockInputState(const CFGBlock &Block, 
AnalysisContext &AC) {
 }
   }
 
-  JoinedStateBuilder Builder(AC);
+  // When performing the join, only retain state for those expressions that are
+  // consumed by this block. This avoids performing joins and potentially
+  // extending the flow condition for expressions that we won't need anyway.
+  JoinedStateBuilder Builder(AC, AC.CFCtx.getExprConsumedByBlock(&Block));

martinboehme wrote:

Did want to reply to this code-level comment, as it's somewhat relevant to the 
higher-level discussion.

> What about expressions that are consumed by a successor of this block? Is it 
> OK to drop those?

Just to clarify what you're asking, let's assume we have a CFG with three 
blocks:

```
B1 -> B2 -> B3
```

Let's say we're currently processing B2. We're dropping all expressions from 
the environment that aren't consumed by B2.

I think you're asking: What about expressions that are consumed by B3? 
Presumably, we shouldn't drop those, as B3 will need them?

I think the answer is that this situation can't happen. If there's an edge 
between two expressions, I believe that there's also an edge between the blocks 
containing those expressions.

I have to admit, I'm not 100% sure whether this is true for the case of 
short-circuiting logical operators. We've already 
[noted](https://discourse.llvm.org/t/cfg-structure-for-short-circuiting-logical-operators/70775)
 before that the CFG structure for logical operators is a bit strange, so we 
might also anticipate more strangeness here. In any case, however, the 
framework already has specific code for logical operators to extract their 
operands from the environment for the block in which those operands were 
computed (rather than the current environment); see also my comment above.

All of this may become moot if we decide to go with the alternative 
implementation I proposed above, but I did want to respond to this point as it 
seemed relevant.

> Also, depending on the answer to these questions, this almost sounds like 
> "live expressions analysis". In case this is related, we might want to use 
> terminology from that as opposed to "consumed".

I think we won't need to go that far (fortunately) -- see above.

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-21 Thread via cfe-commits

martinboehme wrote:

I'm intentionally not responding to comments in the code for now -- wanted to 
wrap up this high-level discussion first.

> Overall seemed good (mostly just piping), but I think we need more 
> explanation (on the review thread and somewhere appropriate in the code) of 
> what exactly determines whether an expression is "needed".
> 
> I was wondering, when reading the code, why _any_ expression is needed once 
> we've finished processing the block. My recollection is that ternary 
> expressions are responsible for this, but in that case shouldn't we be 
> looking at them directly? If not, it seems worth explaining.

Yes, it's the ternaries that cause us to need to access an expression after 
we've finished processing the block. Here's a simple example:

https://godbolt.org/z/Ko7K86GKW

Essentially, this comes up whenever we have control flow within a 
*full-expression*. I believe the only cases where this can happen is for the 
ternary / conditional operator and for the short-circuiting logical operators.

The short-circuiting logical operators already have special treatment in the 
dataflow framework to retrieve their operands from the environment for the 
block in which those operands are computed (see also the comments added in this 
patch). So this leaves only the conditional operator as the other case.

The reason I went with a more conservative approach in this patch are:

*  I wasn't sure whether there was maybe some other case I wasn't thinking of.
* The approach of checking whether an expression is "consumed" by a block 
seemed more generic than singling out the conditional operator.

But maybe the right thing to do instead would be to be more bold:

*  Treat the operands of the conditional operator in the same way that we 
already treat the operands of the short-circuiting logical operators (see 
above), i.e. retrieve them from the environment for the block in which those 
operands are computed.
*  Clear out `ExprToLoc` and `ExprToVal` entirely when starting a new block.

This would be less complex to implement than what I've got in this patch, and 
it would require less computation. The main question mark I see is whether 
there's any cases I've overlooked where control flow happens within a full 
expression. But if we can't think of them, then presumably they're rare, and 
they should be easy to fix if we discover them?

So I think I'm leaning towards implementing this simpler, but bolder approach. 
Would appreciate input.

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-20 Thread Gábor Horváth via cfe-commits


@@ -52,23 +52,39 @@ class ControlFlowContext {
 return StmtToBlock;
   }
 
+  /// Returns the expressions consumed by `Block`. These are all children of
+  /// statements in `Block` as well as children of a possible terminator.
+  const llvm::DenseSet *
+  getExprConsumedByBlock(const CFGBlock *Block) const {
+auto It = ExprConsumedByBlock.find(Block);
+if (It == ExprConsumedByBlock.end())
+  return nullptr;
+return &It->second;
+  }
+
   /// Returns whether `B` is reachable from the entry block.
   bool isBlockReachable(const CFGBlock &B) const {
 return BlockReachable[B.getBlockID()];
   }
 
 private:
-  ControlFlowContext(const Decl &D, std::unique_ptr Cfg,
- llvm::DenseMap 
StmtToBlock,
- llvm::BitVector BlockReachable)
+  ControlFlowContext(
+  const Decl &D, std::unique_ptr Cfg,
+  llvm::DenseMap StmtToBlock,
+  llvm::DenseMap>
+  ExprConsumedByBlock,
+  llvm::BitVector BlockReachable)
   : ContainingDecl(D), Cfg(std::move(Cfg)),
 StmtToBlock(std::move(StmtToBlock)),
+ExprConsumedByBlock(std::move(ExprConsumedByBlock)),
 BlockReachable(std::move(BlockReachable)) {}
 
   /// The `Decl` containing the statement used to construct the CFG.
   const Decl &ContainingDecl;
   std::unique_ptr Cfg;
   llvm::DenseMap StmtToBlock;
+  const llvm::DenseMap>

Xazax-hun wrote:

I do not have strong feelings, but some people dislike const data members due 
to potential interactions with move ctors. 

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-20 Thread Gábor Horváth via cfe-commits


@@ -311,7 +318,10 @@ computeBlockInputState(const CFGBlock &Block, 
AnalysisContext &AC) {
 }
   }
 
-  JoinedStateBuilder Builder(AC);
+  // When performing the join, only retain state for those expressions that are
+  // consumed by this block. This avoids performing joins and potentially
+  // extending the flow condition for expressions that we won't need anyway.
+  JoinedStateBuilder Builder(AC, AC.CFCtx.getExprConsumedByBlock(&Block));

Xazax-hun wrote:

What about expressions that are consumed by a successor of this block?  Is it 
OK to drop those?
Also, depending on the answer to these questions, this almost sounds like "live 
expressions analysis".
In case this is related, we might want to use terminology from that as opposed 
to "consumed". 

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-20 Thread Yitzhak Mandelbaum via cfe-commits


@@ -24,25 +24,35 @@
 namespace clang {
 namespace dataflow {
 
-/// Returns a map from statements to basic blocks that contain them.
-static llvm::DenseMap
-buildStmtToBasicBlockMap(const CFG &Cfg) {
-  llvm::DenseMap StmtToBlock;
+/// Builds maps:
+/// - From statements to basic blocks that contain them.
+/// - From blocks to the expressions they consume.

ymand wrote:

Is the term "consume" defined somewhere?

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-20 Thread Yitzhak Mandelbaum via cfe-commits


@@ -256,8 +259,12 @@ class JoinedStateBuilder {
   // to enable building analyses like computation of dominators that
   // initialize the state of each basic block differently.
   return {AC.Analysis.typeErasedInitialElement(), AC.InitEnv.fork()};
+
 if (All.size() == 1)
-  return Owned.empty() ? All.front()->fork() : std::move(Owned.front());
+  // Join the environment with itself so that we can discard unwanted

ymand wrote:

This change may obviate the value of JoinedStateBuilder altogether, since a key 
point was to save on copies. Once we always build a fresh one, I'm not sure the 
complexity is doing much anymore. Please double check this.

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-20 Thread Yitzhak Mandelbaum via cfe-commits

https://github.com/ymand requested changes to this pull request.

Overall seemed good (mostly just piping), but I think we need more explanation 
(on the review thread and somewhere appropriate in the code) of what exactly 
determines whether an expression is "needed".

I was wondering, when reading the code, why *any* expression is needed once 
we've finished processing the block. My recollection is that ternary 
expressions are responsible for this, but in that case shouldn't we be looking 
at them directly? If not, it seems worth explaining.

https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-20 Thread Yitzhak Mandelbaum via cfe-commits

https://github.com/ymand edited https://github.com/llvm/llvm-project/pull/72850
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-20 Thread via cfe-commits

llvmbot wrote:



@llvm/pr-subscribers-clang-analysis

@llvm/pr-subscribers-clang

Author: None (martinboehme)


Changes

This has two major benefits:

*  We avoid performing joins on boolean expressions and hence extending the flow
   condition in cases where this is not needed. Simpler flow conditions should
   reduce the amount of work we do in the SAT solver.

*  Debugging becomes easier when flow conditions are simpler and ExprToVal
   doesn’t contain any extraneous entries.

In tests on an internal codebase, we see a reduction in SAT solver timeouts of
over 40% and a reduction in "reached maximum iterations" errors of over 60%.

Benchmark results on Crubit's `pointer_nullability_analysis_benchmark show a
slight runtime increase for simple benchmarks, offset by runtime reductions
(some of them substantial) for more complex benchmarks:

```
name  old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer 29.5µs ± 2%  32.0µs ± 3%   +8.42%  (p=0.000 
n=46+47)
BM_PointerAnalysisIntLoop  100µs ± 3%   109µs ± 3%   +9.40%  (p=0.000 
n=54+60)
BM_PointerAnalysisPointerLoop  376µs ± 4%   253µs ± 3%  -32.51%  (p=0.000 
n=48+55)
BM_PointerAnalysisBranch   116µs ± 3%   126µs ± 3%   +9.23%  (p=0.000 
n=60+59)
BM_PointerAnalysisLoopAndBranch776µs ± 3%   425µs ± 4%  -45.20%  (p=0.000 
n=59+59)
BM_PointerAnalysisTwoLoops 184µs ± 2%   200µs ± 3%   +8.48%  (p=0.000 
n=59+58)
BM_PointerAnalysisJoinFilePath17.3ms ± 3%   9.4ms ± 2%  -45.48%  (p=0.000 
n=60+60)
BM_PointerAnalysisCallInLoop  14.7ms ± 2%  11.0ms ± 3%  -25.02%  (p=0.000 
n=57+56)
```

Running the pointer nullability check on several real-world files shows no
significant change in runtime.


---

Patch is 21.99 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/72850.diff


7 Files Affected:

- (modified) clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h 
(+19-3) 
- (modified) clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h 
(+6-2) 
- (modified) clang/lib/Analysis/FlowSensitive/ControlFlowContext.cpp (+24-11) 
- (modified) clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp (+16-6) 
- (modified) clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp 
(+14-4) 
- (modified) clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp 
(+36) 
- (modified) 
clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp 
(+135-23) 


``diff
diff --git a/clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h 
b/clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h
index 768387a121b920a..3a5c5235838f630 100644
--- a/clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h
+++ b/clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h
@@ -52,23 +52,39 @@ class ControlFlowContext {
 return StmtToBlock;
   }
 
+  /// Returns the expressions consumed by `Block`. These are all children of
+  /// statements in `Block` as well as children of a possible terminator.
+  const llvm::DenseSet *
+  getExprConsumedByBlock(const CFGBlock *Block) const {
+auto It = ExprConsumedByBlock.find(Block);
+if (It == ExprConsumedByBlock.end())
+  return nullptr;
+return &It->second;
+  }
+
   /// Returns whether `B` is reachable from the entry block.
   bool isBlockReachable(const CFGBlock &B) const {
 return BlockReachable[B.getBlockID()];
   }
 
 private:
-  ControlFlowContext(const Decl &D, std::unique_ptr Cfg,
- llvm::DenseMap 
StmtToBlock,
- llvm::BitVector BlockReachable)
+  ControlFlowContext(
+  const Decl &D, std::unique_ptr Cfg,
+  llvm::DenseMap StmtToBlock,
+  llvm::DenseMap>
+  ExprConsumedByBlock,
+  llvm::BitVector BlockReachable)
   : ContainingDecl(D), Cfg(std::move(Cfg)),
 StmtToBlock(std::move(StmtToBlock)),
+ExprConsumedByBlock(std::move(ExprConsumedByBlock)),
 BlockReachable(std::move(BlockReachable)) {}
 
   /// The `Decl` containing the statement used to construct the CFG.
   const Decl &ContainingDecl;
   std::unique_ptr Cfg;
   llvm::DenseMap StmtToBlock;
+  const llvm::DenseMap>
+  ExprConsumedByBlock;
   llvm::BitVector BlockReachable;
 };
 
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h 
b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
index 963197b728f4273..a2f97122567fbe7 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -211,12 +211,16 @@ class Environment {
   /// Joins two environments by taking the intersection of storage locations 
and
   /// values that are stored in them. Distinct values that are assigned to the
   /// same storage locations in `EnvA` and `EnvB` are merged using `Model`.
+  /// If `ExprsToKeep` is non-null, only retains state for expressions 
contained
+  //

[clang] [clang][dataflow] Discard unneeded `ExprToLoc` and `ExprToVal` entries. (PR #72850)

2023-11-20 Thread via cfe-commits

https://github.com/martinboehme created 
https://github.com/llvm/llvm-project/pull/72850

This has two major benefits:

*  We avoid performing joins on boolean expressions and hence extending the flow
   condition in cases where this is not needed. Simpler flow conditions should
   reduce the amount of work we do in the SAT solver.

*  Debugging becomes easier when flow conditions are simpler and ExprToVal
   doesn’t contain any extraneous entries.

In tests on an internal codebase, we see a reduction in SAT solver timeouts of
over 40% and a reduction in "reached maximum iterations" errors of over 60%.

Benchmark results on Crubit's `pointer_nullability_analysis_benchmark show a
slight runtime increase for simple benchmarks, offset by runtime reductions
(some of them substantial) for more complex benchmarks:

```
name  old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer 29.5µs ± 2%  32.0µs ± 3%   +8.42%  (p=0.000 
n=46+47)
BM_PointerAnalysisIntLoop  100µs ± 3%   109µs ± 3%   +9.40%  (p=0.000 
n=54+60)
BM_PointerAnalysisPointerLoop  376µs ± 4%   253µs ± 3%  -32.51%  (p=0.000 
n=48+55)
BM_PointerAnalysisBranch   116µs ± 3%   126µs ± 3%   +9.23%  (p=0.000 
n=60+59)
BM_PointerAnalysisLoopAndBranch776µs ± 3%   425µs ± 4%  -45.20%  (p=0.000 
n=59+59)
BM_PointerAnalysisTwoLoops 184µs ± 2%   200µs ± 3%   +8.48%  (p=0.000 
n=59+58)
BM_PointerAnalysisJoinFilePath17.3ms ± 3%   9.4ms ± 2%  -45.48%  (p=0.000 
n=60+60)
BM_PointerAnalysisCallInLoop  14.7ms ± 2%  11.0ms ± 3%  -25.02%  (p=0.000 
n=57+56)
```

Running the pointer nullability check on several real-world files shows no
significant change in runtime.


>From d5c011c6f922e98da44b1ffa573c3630e3532b3d Mon Sep 17 00:00:00 2001
From: Martin Braenne 
Date: Mon, 20 Nov 2023 10:56:28 +
Subject: [PATCH] [clang][dataflow] Discard unneeded `ExprToLoc` and
 `ExprToVal` entries.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This has two major benefits:

*  We avoid performing joins on boolean expressions and hence extending the flow
   condition in cases where this is not needed. Simpler flow conditions should
   reduce the amount of work we do in the SAT solver.

*  Debugging becomes easier when flow conditions are simpler and ExprToVal
   doesn’t contain any extraneous entries.

In tests on an internal codebase, we see a reduction in SAT solver timeouts of
over 40% and a reduction in "reached maximum iterations" errors of over 60%.

Benchmark results on Crubit's `pointer_nullability_analysis_benchmark show a
slight runtime increase for simple benchmarks, offset by runtime reductions
(some of them substantial) for more complex benchmarks:

```
name  old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer 29.5µs ± 2%  32.0µs ± 3%   +8.42%  (p=0.000 
n=46+47)
BM_PointerAnalysisIntLoop  100µs ± 3%   109µs ± 3%   +9.40%  (p=0.000 
n=54+60)
BM_PointerAnalysisPointerLoop  376µs ± 4%   253µs ± 3%  -32.51%  (p=0.000 
n=48+55)
BM_PointerAnalysisBranch   116µs ± 3%   126µs ± 3%   +9.23%  (p=0.000 
n=60+59)
BM_PointerAnalysisLoopAndBranch776µs ± 3%   425µs ± 4%  -45.20%  (p=0.000 
n=59+59)
BM_PointerAnalysisTwoLoops 184µs ± 2%   200µs ± 3%   +8.48%  (p=0.000 
n=59+58)
BM_PointerAnalysisJoinFilePath17.3ms ± 3%   9.4ms ± 2%  -45.48%  (p=0.000 
n=60+60)
BM_PointerAnalysisCallInLoop  14.7ms ± 2%  11.0ms ± 3%  -25.02%  (p=0.000 
n=57+56)
```

Running the pointer nullability check on several real-world files shows no
significant change in runtime.
---
 .../FlowSensitive/ControlFlowContext.h|  22 ++-
 .../FlowSensitive/DataflowEnvironment.h   |   8 +-
 .../FlowSensitive/ControlFlowContext.cpp  |  35 ++--
 .../FlowSensitive/DataflowEnvironment.cpp |  22 ++-
 .../TypeErasedDataflowAnalysis.cpp|  18 +-
 .../FlowSensitive/DataflowEnvironmentTest.cpp |  36 
 .../TypeErasedDataflowAnalysisTest.cpp| 158 +++---
 7 files changed, 250 insertions(+), 49 deletions(-)

diff --git a/clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h 
b/clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h
index 768387a121b920a..3a5c5235838f630 100644
--- a/clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h
+++ b/clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h
@@ -52,23 +52,39 @@ class ControlFlowContext {
 return StmtToBlock;
   }
 
+  /// Returns the expressions consumed by `Block`. These are all children of
+  /// statements in `Block` as well as children of a possible terminator.
+  const llvm::DenseSet *
+  getExprConsumedByBlock(const CFGBlock *Block) const {
+auto It = ExprConsumedByBlock.find(Block);
+if (It == ExprConsumedByBlock.end())
+  return nullptr;
+return &It->second;
+  }
+
   /// Returns whether `B` is reachable from the entry block.
   bool isBlockReachable(const CFGBlo