[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.

2017-09-26 Thread Phabricator via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL314218: [analyzer] Keep track of design discusions as part 
of analyzer documentation. (authored by dergachev).

Changed prior to commit:
  https://reviews.llvm.org/D36737?vs=116520=116724#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D36737

Files:
  cfe/trunk/docs/analyzer/DesignDiscussions/InitializerLists.rst

Index: cfe/trunk/docs/analyzer/DesignDiscussions/InitializerLists.rst
===
--- cfe/trunk/docs/analyzer/DesignDiscussions/InitializerLists.rst
+++ cfe/trunk/docs/analyzer/DesignDiscussions/InitializerLists.rst
@@ -0,0 +1,321 @@
+This discussion took place in https://reviews.llvm.org/D35216
+"Escape symbols when creating std::initializer_list".
+
+It touches problems of modelling C++ standard library constructs in general,
+including modelling implementation-defined fields within C++ standard library
+objects, in particular constructing objects into pointers held by such fields,
+and separation of responsibilities between analyzer's core and checkers.
+
+**Artem:**
+
+I've seen a few false positives that appear because we construct
+C++11 std::initializer_list objects with brace initializers, and such
+construction is not properly modeled. For instance, if a new object is
+constructed on the heap only to be put into a brace-initialized STL container,
+the object is reported to be leaked.
+
+Approach (0): This can be trivially fixed by this patch, which causes pointers
+passed into initializer list expressions to immediately escape.
+
+This fix is overly conservative though. So i did a bit of investigation as to
+how model std::initializer_list better.
+
+According to the standard, std::initializer_list is an object that has
+methods begin(), end(), and size(), where begin() returns a pointer to continous
+array of size() objects of type T, and end() is equal to begin() plus size().
+The standard does hint that it should be possible to implement
+std::initializer_list as a pair of pointers, or as a pointer and a size
+integer, however specific fields that the object would contain are an
+implementation detail.
+
+Ideally, we should be able to model the initializer list's methods precisely.
+Or, at least, it should be possible to explain to the analyzer that the list
+somehow "takes hold" of the values put into it. Initializer lists can also be
+copied, which is a separate story that i'm not trying to address here.
+
+The obvious approach to modeling std::initializer_list in a checker would be to
+construct a SymbolMetadata for the memory region of the initializer list object,
+which would be of type T* and represent begin(), so we'd trivially model begin()
+as a function that returns this symbol. The array pointed to by that symbol
+would be bindLoc()ed to contain the list's contents (probably as a CompoundVal
+to produce less bindings in the store). Extent of this array would represent
+size() and would be equal to the length of the list as written.
+
+So this sounds good, however apparently it does nothing to address our false
+positives: when the list escapes, our RegionStoreManager is not magically
+guessing that the metadata symbol attached to it, together with its contents,
+should also escape. In fact, it's impossible to trigger a pointer escape from
+within the checker.
+
+Approach (1): If only we enabled ProgramState::bindLoc(..., notifyChanges=true)
+to cause pointer escapes (not only region changes) (which sounds like the right
+thing to do anyway) such checker would be able to solve the false positives by
+triggering escapes when binding list elements to the list. However, it'd be as
+conservative as the current patch's solution. Ideally, we do not want escapes to
+happen so early. Instead, we'd prefer them to be delayed until the list itself
+escapes.
+
+So i believe that escaping metadata symbols whenever their base regions escape
+would be the right thing to do. Currently we didn't think about that because we
+had neither pointer-type metadatas nor non-pointer escapes.
+
+Approach (2): We could teach the Store to scan itself for bindings to
+metadata-symbolic-based regions during scanReachableSymbols() whenever a region
+turns out to be reachable. This requires no work on checker side, but it sounds
+performance-heavy.
+
+Approach (3): We could let checkers maintain the set of active metadata symbols
+in the program state (ideally somewhere in the Store, which sounds weird but
+causes the smallest amount of layering violations), so that the core knew what
+to escape. This puts a stress on the checkers, but with a smart data map it
+wouldn't be a problem.
+
+Approach (4): We could allow checkers to trigger pointer escapes in arbitrary
+moments. If we allow doing this within checkPointerEscape callback itself, we
+would be able to express facts like "when this region escapes, that metadata
+symbol attached to it should also 

[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.

2017-09-25 Thread Anna Zaks via Phabricator via cfe-commits
zaks.anna accepted this revision.
zaks.anna added a comment.
This revision is now accepted and ready to land.

Thanks!


https://reviews.llvm.org/D36737



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.

2017-09-25 Thread Artem Dergachev via Phabricator via cfe-commits
NoQ updated this revision to Diff 116520.
NoQ added a comment.

Update to use `.rst` formatting.


https://reviews.llvm.org/D36737

Files:
  docs/analyzer/DesignDiscussions/InitializerLists.rst

Index: docs/analyzer/DesignDiscussions/InitializerLists.rst
===
--- /dev/null
+++ docs/analyzer/DesignDiscussions/InitializerLists.rst
@@ -0,0 +1,321 @@
+This discussion took place in https://reviews.llvm.org/D35216
+"Escape symbols when creating std::initializer_list".
+
+It touches problems of modelling C++ standard library constructs in general,
+including modelling implementation-defined fields within C++ standard library
+objects, in particular constructing objects into pointers held by such fields,
+and separation of responsibilities between analyzer's core and checkers.
+
+**Artem:**
+
+I've seen a few false positives that appear because we construct
+C++11 std::initializer_list objects with brace initializers, and such
+construction is not properly modeled. For instance, if a new object is
+constructed on the heap only to be put into a brace-initialized STL container,
+the object is reported to be leaked.
+
+Approach (0): This can be trivially fixed by this patch, which causes pointers
+passed into initializer list expressions to immediately escape.
+
+This fix is overly conservative though. So i did a bit of investigation as to
+how model std::initializer_list better.
+
+According to the standard, std::initializer_list is an object that has
+methods begin(), end(), and size(), where begin() returns a pointer to continous
+array of size() objects of type T, and end() is equal to begin() plus size().
+The standard does hint that it should be possible to implement
+std::initializer_list as a pair of pointers, or as a pointer and a size
+integer, however specific fields that the object would contain are an
+implementation detail.
+
+Ideally, we should be able to model the initializer list's methods precisely.
+Or, at least, it should be possible to explain to the analyzer that the list
+somehow "takes hold" of the values put into it. Initializer lists can also be
+copied, which is a separate story that i'm not trying to address here.
+
+The obvious approach to modeling std::initializer_list in a checker would be to
+construct a SymbolMetadata for the memory region of the initializer list object,
+which would be of type T* and represent begin(), so we'd trivially model begin()
+as a function that returns this symbol. The array pointed to by that symbol
+would be bindLoc()ed to contain the list's contents (probably as a CompoundVal
+to produce less bindings in the store). Extent of this array would represent
+size() and would be equal to the length of the list as written.
+
+So this sounds good, however apparently it does nothing to address our false
+positives: when the list escapes, our RegionStoreManager is not magically
+guessing that the metadata symbol attached to it, together with its contents,
+should also escape. In fact, it's impossible to trigger a pointer escape from
+within the checker.
+
+Approach (1): If only we enabled ProgramState::bindLoc(..., notifyChanges=true)
+to cause pointer escapes (not only region changes) (which sounds like the right
+thing to do anyway) such checker would be able to solve the false positives by
+triggering escapes when binding list elements to the list. However, it'd be as
+conservative as the current patch's solution. Ideally, we do not want escapes to
+happen so early. Instead, we'd prefer them to be delayed until the list itself
+escapes.
+
+So i believe that escaping metadata symbols whenever their base regions escape
+would be the right thing to do. Currently we didn't think about that because we
+had neither pointer-type metadatas nor non-pointer escapes.
+
+Approach (2): We could teach the Store to scan itself for bindings to
+metadata-symbolic-based regions during scanReachableSymbols() whenever a region
+turns out to be reachable. This requires no work on checker side, but it sounds
+performance-heavy.
+
+Approach (3): We could let checkers maintain the set of active metadata symbols
+in the program state (ideally somewhere in the Store, which sounds weird but
+causes the smallest amount of layering violations), so that the core knew what
+to escape. This puts a stress on the checkers, but with a smart data map it
+wouldn't be a problem.
+
+Approach (4): We could allow checkers to trigger pointer escapes in arbitrary
+moments. If we allow doing this within checkPointerEscape callback itself, we
+would be able to express facts like "when this region escapes, that metadata
+symbol attached to it should also escape". This sounds like an ultimate freedom,
+with maximum stress on the checkers - still not too much stress when we have
+smart data maps.
+
+I'm personally liking the approach (2) - it should be possible to avoid
+performance overhead, and clarity seems nice.
+
+**Gabor:**
+
+At this 

[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.

2017-08-27 Thread Anna Zaks via Phabricator via cfe-commits
zaks.anna added a comment.

I think we should have these is .rst format as this is what the rest of the 
documentation predominantly uses. (Note, the formatting can be very basic, it's 
the format that I care about.)

Otherwise, great to see this addition!


https://reviews.llvm.org/D36737



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.

2017-08-16 Thread Devin Coughlin via Phabricator via cfe-commits
dcoughlin added a comment.

Thanks for doing this!


https://reviews.llvm.org/D36737



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.

2017-08-15 Thread Gábor Horváth via Phabricator via cfe-commits
xazax.hun added a comment.

Good idea!

I think some of your extremely helpful responses from the mailing list would 
worth archiving in a form of documentation as well.
Also, as far as I can recall, you are also maintaining a pdf which is a very 
good reference. In some form, I would love to see that information upstream too.


https://reviews.llvm.org/D36737



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.

2017-08-15 Thread Artem Dergachev via Phabricator via cfe-commits
NoQ created this revision.

Because i tend to raise important design questions in weird places like tiny 
phabricator patches, we had a thought that it's better to archive them in 
`docs/analyzer` for future reference. We should be able to pick them up 
whenever time comes to solve these problems.

I initialize a new folder in `docs/analyzer` for this sort of stuff with the 
discussion in https://reviews.llvm.org/D35216; we also have a few more 
discussions that might look good in such archive.


https://reviews.llvm.org/D36737

Files:
  docs/analyzer/DesignDiscussions/InitializerLists.txt

Index: docs/analyzer/DesignDiscussions/InitializerLists.txt
===
--- /dev/null
+++ docs/analyzer/DesignDiscussions/InitializerLists.txt
@@ -0,0 +1,325 @@
+This discussion took place in https://reviews.llvm.org/D35216
+"Escape symbols when creating std::initializer_list".
+
+It touches problems of modelling C++ standard library constructs in general,
+including modelling implementation-defined fields within C++ standard library
+objects, in particular constructing objects into pointers held by such fields,
+and separation of responsibilities between analyzer's core and checkers.
+
+
+
+== Artem: ==
+
+I've seen a few false positives that appear because we construct C++11
+std::initializer_list objects with brace initializers, and such construction is
+not properly modeled. For instance, if a new object is constructed on the heap
+only to be put into a brace-initialized STL container, the object is reported to
+be leaked.
+
+Approach (0): This can be trivially fixed by this patch, which causes pointers
+passed into initializer list expressions to immediately escape.
+
+This fix is overly conservative though. So i did a bit of investigation as to
+how model std::initializer_list better.
+
+According to the standard, std::initializer_list is an object that has
+methods begin(), end(), and size(), where begin() returns a pointer to continous
+array of size() objects of type T, and end() is equal to begin() plus size().
+The standard does hint that it should be possible to implement
+std::initializer_list as a pair of pointers, or as a pointer and a size
+integer, however specific fields that the object would contain are an
+implementation detail.
+
+Ideally, we should be able to model the initializer list's methods precisely.
+Or, at least, it should be possible to explain to the analyzer that the list
+somehow "takes hold" of the values put into it. Initializer lists can also be
+copied, which is a separate story that i'm not trying to address here.
+
+The obvious approach to modeling std::initializer_list in a checker would be to
+construct a SymbolMetadata for the memory region of the initializer list object,
+which would be of type T* and represent begin(), so we'd trivially model begin()
+as a function that returns this symbol. The array pointed to by that symbol
+would be bindLoc()ed to contain the list's contents (probably as a CompoundVal
+to produce less bindings in the store). Extent of this array would represent
+size() and would be equal to the length of the list as written.
+
+So this sounds good, however apparently it does nothing to address our false
+positives: when the list escapes, our RegionStoreManager is not magically
+guessing that the metadata symbol attached to it, together with its contents,
+should also escape. In fact, it's impossible to trigger a pointer escape from
+within the checker.
+
+Approach (1): If only we enabled ProgramState::bindLoc(..., notifyChanges=true)
+to cause pointer escapes (not only region changes) (which sounds like the right
+thing to do anyway) such checker would be able to solve the false positives by
+triggering escapes when binding list elements to the list. However, it'd be as
+conservative as the current patch's solution. Ideally, we do not want escapes to
+happen so early. Instead, we'd prefer them to be delayed until the list itself
+escapes.
+
+So i believe that escaping metadata symbols whenever their base regions escape
+would be the right thing to do. Currently we didn't think about that because we
+had neither pointer-type metadatas nor non-pointer escapes.
+
+Approach (2): We could teach the Store to scan itself for bindings to
+metadata-symbolic-based regions during scanReachableSymbols() whenever a region
+turns out to be reachable. This requires no work on checker side, but it sounds
+performance-heavy.
+
+Approach (3): We could let checkers maintain the set of active metadata symbols
+in the program state (ideally somewhere in the Store, which sounds weird but
+causes the smallest amount of layering violations), so that the core knew what
+to escape. This puts a stress on the checkers, but with a smart data map it
+wouldn't be a problem.
+
+Approach (4): We could allow checkers to trigger pointer escapes in