[PATCH] D116368: [clang][dataflow] Add transfer function for VarDecl statements

Gábor Horváth via Phabricator via cfe-commits Thu, 30 Dec 2021 09:53:36 -0800

xazax.hun accepted this revision.
xazax.hun added a comment.
This revision is now accepted and ready to land.


I have one nit inline, and some topics that should probably be deferred to 
follow-up PRs. Overall, looks good to me, thanks!



================
Comment at: 
clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h:76
+  std::vector<std::unique_ptr<StorageLocation>> Locs;
+  std::vector<std::unique_ptr<Value>> Vals;
+  llvm::DenseMap<const VarDecl *, StorageLocation *> VarDeclToLoc;
----------------
sgatev wrote:
> xazax.hun wrote:
> > Just curious: would it make sense to deduplicate these values?
> I think no because even if the values are equal we'd like to track them 
> separately as they represent different identities which is important for 
> inference. For example:
> 
> ```
> std::optional<int> a = foo();
> std::optional<int> b = bar();
> std::optional<int> c = a;
> // at this point we can model `a`, `b`, and `c` with the same value, however
> if (a.has_value()) {
>   // here we gain knowledge only about the values of `a` and `c`
>   a.value(); // safe
>   c.value(); // safe
>   b.value(); // unsafe 
> }
> ```
> 
> In our analysis we model the above by assigning the same value to the storage 
> locations for `a` and `c` and assigning a different value to the storage 
> location for `b`, even if the two values are equal.
I see. This makes sense, thanks!


================
Comment at: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp:32
+template <typename K, typename V>
+bool denseMapsAreEqual(const llvm::DenseMap<K, V> &Map1,
+                       const llvm::DenseMap<K, V> &Map2) {
----------------
sgatev wrote:
> xazax.hun wrote:
> > Shouldn't we add `operator==` instead?
> I'd be happy to do that. Do we need reviews from other folks for it? Would it 
> make sense to move the code to the other location in a follow-up PR, to limit 
> the scope of the change?
Actually, I think DenseMaps should already have `operator==` so we should be 
able to remove this code.


================
Comment at: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp:46
+template <typename K, typename V>
+llvm::DenseMap<K, V> intersectDenseMaps(const llvm::DenseMap<K, V> &Map1,
+                                        const llvm::DenseMap<K, V> &Map2) {
----------------
sgatev wrote:
> xazax.hun wrote:
> > I wonder if these algorithms should rather be somewhere in the support 
> > library.
> I'd be happy to do that. Do we need reviews from other folks for it? Would it 
> make sense to move the code to the other location in a follow-up PR, to limit 
> the scope of the change?
Yep, I'm fine with moving this in a follow-up PR.


================
Comment at: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp:88
+    for (const FieldDecl *Field : Type->getAsRecordDecl()->fields()) {
+      FieldLocs.insert({Field, &createStorageLocation(Field->getType())});
+    }
----------------
sgatev wrote:
> xazax.hun wrote:
> > Could this end up creating an overly large state? There might be objects 
> > with quite a lot fields but each function would only access a small subset 
> > of those. Alternatively, we could attempt to create the representations for 
> > fields on demand (this is the approach what the clang static analyzer is 
> > using). 
> That's a great point, however I don't think initialization on demand plays 
> well with the dataflow algorithm. For example:
> 
> ```
> struct S {
>   int Val;
> };
> 
> void target(bool b) {
>   // basic block 1:
>   S Foo;
>   int Bar;
>   if (b) {
>     // basic block 2:
>     Bar = Foo.Val;
>   } else {
>     // basic block 3:
>     Bar = Foo.Val;
>   }
>   // basic block 4:
>   ...
> }
> ```
> In basic block 4 we should be able to infer that the value that is assigned 
> to the storage location for `Bar` is unambiguous. However, since `Foo.Value` 
> isn't created in basic block 1, this means that it's not inherited by basic 
> block 2 and basic block 3. Each of these blocks will end up creating distinct 
> values to assign to the storage location for `Foo.Value` and so in basic 
> block 4 the value for `Bar` will end up being ambiguous.
> 
> Alternatively, we can do a pass over all statements in all basic blocks 
> before running the analysis to identify which fields are used in the 
> function. We can use this information here to initialize only parts that are 
> being used.
> 
> What do you think?
I am not convinced about this example. I think it should not matter where and 
when we create the location for Foo.Val, it always should be equivalent to the 
others, i.e., they should have the same identity regardless their creation. 
Basically, I think the identity of such a location should be determined by the 
base object (Foo in this case), and the access path to the subobject. 

This would be a non-trivial change, so I'm ok with deferring this discussion to 
some follow-up PRs. But I'd love to see some alternatives explored because 
eagerly evaluating all fields used to bite me in the past in other static 
analysis tools performance-wise.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116368/new/

https://reviews.llvm.org/D116368

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D116368: [clang][dataflow] Add transfer function for VarDecl statements

Reply via email to