[PATCH] D78933: [analyzer] RangeConstraintManager optimizations in comparison expressions

Gábor Horváth via Phabricator via cfe-commits Wed, 06 May 2020 08:05:42 -0700

xazax.hun added a comment.

In D78933#2022684 <https://reviews.llvm.org/D78933#2022684>, @ASDenysPetrov 
wrote:


> I have never run them before. How I can do this? What project to use as a 
> sample?


Unfortunately, we do not really have a well-defined set of benchmarks. But as a 
first step, you could run the analyzer on LLVM before and after applying your 
patch and measure the run-time. Or you could start with a smaller project (you 
can pick any well-known project from GitHub or other similar sites) as checking 
LLVM can be really time-consuming (especially in debug mode). To run the 
analyzer on a project I recommend using scan-build or CodeChecker. If you feel 
like, you can also experiment with the CSA-testbench 
(https://github.com/Xazax-hun/csa-testbench), but it requires quite a lot of 
effort to set up upfront.

>> Also, it would be useful to see how statistics like coverage patterns, 
>> number of exploded nodes etc changes.
> 
> You can check F11839257: before.html <https://reviews.llvm.org/F11839257> and 
> F11839255: after.html <https://reviews.llvm.org/F11839255> exploded graphs, 
> to see the number of nodes.

This is only for a small artificial example. We would like to learn more about 
how the node count changes for a large real-world project like LLVM. We do not 
need to see the actual exploded graph for those big projects as they are really 
hard to interpret, we just want to know the numbers.

>> See the output of `-analyzer-stats` option for details.
> 
> I didn't find this option in `clang --help`. I've never use it. Please, 
> explain how to use it.

I forgot to mention, not all of the analyzer flags have a corresponding flag in 
the driver. You can check all the analyzer related flags in the frontend (as 
opposed to the driver) using the following command: `clang -cc1 -help | grep 
"analyzer"`
You can either add `-analyzer-stats` to your cc1 invocation or you can use the 
`-Xclang` to pass the flag through the driver this way: `-Xclang 
-analyzer-stats`. This will modify the output of the analyzer to include 
additional statistics about the analysis.



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:588
+  // AnyX2 means that two expressions marked as `Any` are met in code,
+  // and there is a special column for that, for example:
+  // if (x >= y)
----------------
ASDenysPetrov wrote:
> xazax.hun wrote:
> > I have really hard time processing how to interpret `AnyX2`.
> > 
> > For example in the code below:
> > ```
> >    if (x >= y)
> >      if (x != y)
> >        if (x <= y)
> >          return false
> > ```
> > 
> > ```
> >    if (x >= y)
> >      if (x == y)
> >        if (x <= y)
> >          return true
> > ```
> > 
> > We would get different results for `<=`. So I do not really get how I 
> > should read the `AnyX2` column.
> I'm sorry for the obscure explanation. I was really thinking about how to 
> explain it more clear.
> Let's consider this code:
> ```
> if (x >= y)
>   if (x != y)
>     if (x <= y)
> ```
> If 'x >= y' is //true//, then following `x <= y` is  `Any` ( can be //true// 
> or //false// ). //See table in the summary.//
> If 'x != y' is //true//, then following `x <= y` is  `Any` ( can be //true// 
> or //false// ).
> If 'x >= y && x != y' is //true//, then following `x <= y` is `False` only 
> (it means that we previously met two Any states, what I've called as 
> **AnyX2**).
> 
> 
> 
> 
Could you tell me how the reasoning would be different for 
```
if (x >= y)
  if (x == y)
    if (x <= y)
```
?


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:622
+  auto IndexOP = IndexFromOp(OP);
+  auto LHS = SSE->getLHS();
+  auto RHS = SSE->getRHS();
----------------
ASDenysPetrov wrote:
> xazax.hun wrote:
> > Could `LHS` and `RHS` be other expressions? Does it make sense to continue 
> > executing this function if one of them is not a simple symbol?
> I think it makes sense, because `(x+3*y) > z` and `z < (x+3*y)` perfectly 
> fits in the logic.
Makes sense,  thanks. Maybe it would be worth to rephrase the comment to note 
that `x` and `y` can also stand for subexpressions, not only for actual symbols.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:629-630
+  auto &One = BV.getValue(1, T);
+  const RangeSet TrueRangeSet(F, One, One);
+  const RangeSet FalseRangeSet(F, Zero, Zero);
+  int AnyStates = 0;
----------------
ASDenysPetrov wrote:
> xazax.hun wrote:
> > ASDenysPetrov wrote:
> > > Folk, is this a good idea to explicitly create bool ranges as a return 
> > > value?
> > > As for me, comparisons like `>`, `<`, etc. can only produce bool-based 
> > > ranges, otherwise it would be weird.
> > I think modeling booleans with ranges is OK. This is what the analyzer is 
> > doing at the moment. But I guess the question is about whether boolean 
> > results of comparisons is a good way to store the relationships between the 
> > symbols. I am not convinced about that see my inline comment below.
> Once I met a case when logic operator `<` retruns me int-based ranges {0,0} 
> on the //false// branch and {1, 429496725} (instead of bool-based {1,1}) on 
> the //true// branch. It confused me, so the question arised.
I see. I do not see any problem with that immediately, but @NoQ is much more 
well-versed to answer this question.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:635
+
+    // Let's find an expression e.g. (x < y).
+    OP = OpFromIndex(i);
----------------
ASDenysPetrov wrote:
> xazax.hun wrote:
> > I am not really familiar with how constraints are represented but I vaguely 
> > recall the analyzer sometimes normalizing some expressions like conversion 
> > `A == B` to `A - B == 0`. I am just wondering if this API to look this 
> > expression up is not the right abstraction as it might be better to handle 
> > such normalizations in a unified, central way.
> > 
> > Also, note that this method does not handle transitivity. I wonder if 
> > maintaining set of ranges is the right representation for this information 
> > at all. The ordering between the symbols will define a lattice. 
> > Representing that lattice directly instead of using ranges might be more 
> > efficient.
> >conversion A == B to A - B == 0. 
> All my samples `x == y` worked as `x == y`. I havn't seen such 
> transformations.
> >Representing that lattice directly instead of using ranges might be more 
> >efficient.
> How do you imagine that lattice? Could you detail it?
A lattice is basically a DAG, where the nodes are the values (symbols, or 
subexpressions), and the edges are ordering. So it defines an ordering. E.g., A 
is smaller than B, if and only if there is a path from A to B. I am not sure 
whether this is the right representation, but it would take care of 
transitivity. 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78933/new/

https://reviews.llvm.org/D78933



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D78933: [analyzer] RangeConstraintManager optimizations in comparison expressions

Reply via email to