[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.
This revision was automatically updated to reflect the committed changes. Closed by commit rL314218: [analyzer] Keep track of design discusions as part of analyzer documentation. (authored by dergachev). Changed prior to commit: https://reviews.llvm.org/D36737?vs=116520=116724#toc Repository: rL LLVM https://reviews.llvm.org/D36737 Files: cfe/trunk/docs/analyzer/DesignDiscussions/InitializerLists.rst Index: cfe/trunk/docs/analyzer/DesignDiscussions/InitializerLists.rst === --- cfe/trunk/docs/analyzer/DesignDiscussions/InitializerLists.rst +++ cfe/trunk/docs/analyzer/DesignDiscussions/InitializerLists.rst @@ -0,0 +1,321 @@ +This discussion took place in https://reviews.llvm.org/D35216 +"Escape symbols when creating std::initializer_list". + +It touches problems of modelling C++ standard library constructs in general, +including modelling implementation-defined fields within C++ standard library +objects, in particular constructing objects into pointers held by such fields, +and separation of responsibilities between analyzer's core and checkers. + +**Artem:** + +I've seen a few false positives that appear because we construct +C++11 std::initializer_list objects with brace initializers, and such +construction is not properly modeled. For instance, if a new object is +constructed on the heap only to be put into a brace-initialized STL container, +the object is reported to be leaked. + +Approach (0): This can be trivially fixed by this patch, which causes pointers +passed into initializer list expressions to immediately escape. + +This fix is overly conservative though. So i did a bit of investigation as to +how model std::initializer_list better. + +According to the standard, std::initializer_list is an object that has +methods begin(), end(), and size(), where begin() returns a pointer to continous +array of size() objects of type T, and end() is equal to begin() plus size(). +The standard does hint that it should be possible to implement +std::initializer_list as a pair of pointers, or as a pointer and a size +integer, however specific fields that the object would contain are an +implementation detail. + +Ideally, we should be able to model the initializer list's methods precisely. +Or, at least, it should be possible to explain to the analyzer that the list +somehow "takes hold" of the values put into it. Initializer lists can also be +copied, which is a separate story that i'm not trying to address here. + +The obvious approach to modeling std::initializer_list in a checker would be to +construct a SymbolMetadata for the memory region of the initializer list object, +which would be of type T* and represent begin(), so we'd trivially model begin() +as a function that returns this symbol. The array pointed to by that symbol +would be bindLoc()ed to contain the list's contents (probably as a CompoundVal +to produce less bindings in the store). Extent of this array would represent +size() and would be equal to the length of the list as written. + +So this sounds good, however apparently it does nothing to address our false +positives: when the list escapes, our RegionStoreManager is not magically +guessing that the metadata symbol attached to it, together with its contents, +should also escape. In fact, it's impossible to trigger a pointer escape from +within the checker. + +Approach (1): If only we enabled ProgramState::bindLoc(..., notifyChanges=true) +to cause pointer escapes (not only region changes) (which sounds like the right +thing to do anyway) such checker would be able to solve the false positives by +triggering escapes when binding list elements to the list. However, it'd be as +conservative as the current patch's solution. Ideally, we do not want escapes to +happen so early. Instead, we'd prefer them to be delayed until the list itself +escapes. + +So i believe that escaping metadata symbols whenever their base regions escape +would be the right thing to do. Currently we didn't think about that because we +had neither pointer-type metadatas nor non-pointer escapes. + +Approach (2): We could teach the Store to scan itself for bindings to +metadata-symbolic-based regions during scanReachableSymbols() whenever a region +turns out to be reachable. This requires no work on checker side, but it sounds +performance-heavy. + +Approach (3): We could let checkers maintain the set of active metadata symbols +in the program state (ideally somewhere in the Store, which sounds weird but +causes the smallest amount of layering violations), so that the core knew what +to escape. This puts a stress on the checkers, but with a smart data map it +wouldn't be a problem. + +Approach (4): We could allow checkers to trigger pointer escapes in arbitrary +moments. If we allow doing this within checkPointerEscape callback itself, we +would be able to express facts like "when this region escapes, that metadata +symbol attached to it should also
[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.
zaks.anna accepted this revision. zaks.anna added a comment. This revision is now accepted and ready to land. Thanks! https://reviews.llvm.org/D36737 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.
NoQ updated this revision to Diff 116520. NoQ added a comment. Update to use `.rst` formatting. https://reviews.llvm.org/D36737 Files: docs/analyzer/DesignDiscussions/InitializerLists.rst Index: docs/analyzer/DesignDiscussions/InitializerLists.rst === --- /dev/null +++ docs/analyzer/DesignDiscussions/InitializerLists.rst @@ -0,0 +1,321 @@ +This discussion took place in https://reviews.llvm.org/D35216 +"Escape symbols when creating std::initializer_list". + +It touches problems of modelling C++ standard library constructs in general, +including modelling implementation-defined fields within C++ standard library +objects, in particular constructing objects into pointers held by such fields, +and separation of responsibilities between analyzer's core and checkers. + +**Artem:** + +I've seen a few false positives that appear because we construct +C++11 std::initializer_list objects with brace initializers, and such +construction is not properly modeled. For instance, if a new object is +constructed on the heap only to be put into a brace-initialized STL container, +the object is reported to be leaked. + +Approach (0): This can be trivially fixed by this patch, which causes pointers +passed into initializer list expressions to immediately escape. + +This fix is overly conservative though. So i did a bit of investigation as to +how model std::initializer_list better. + +According to the standard, std::initializer_list is an object that has +methods begin(), end(), and size(), where begin() returns a pointer to continous +array of size() objects of type T, and end() is equal to begin() plus size(). +The standard does hint that it should be possible to implement +std::initializer_list as a pair of pointers, or as a pointer and a size +integer, however specific fields that the object would contain are an +implementation detail. + +Ideally, we should be able to model the initializer list's methods precisely. +Or, at least, it should be possible to explain to the analyzer that the list +somehow "takes hold" of the values put into it. Initializer lists can also be +copied, which is a separate story that i'm not trying to address here. + +The obvious approach to modeling std::initializer_list in a checker would be to +construct a SymbolMetadata for the memory region of the initializer list object, +which would be of type T* and represent begin(), so we'd trivially model begin() +as a function that returns this symbol. The array pointed to by that symbol +would be bindLoc()ed to contain the list's contents (probably as a CompoundVal +to produce less bindings in the store). Extent of this array would represent +size() and would be equal to the length of the list as written. + +So this sounds good, however apparently it does nothing to address our false +positives: when the list escapes, our RegionStoreManager is not magically +guessing that the metadata symbol attached to it, together with its contents, +should also escape. In fact, it's impossible to trigger a pointer escape from +within the checker. + +Approach (1): If only we enabled ProgramState::bindLoc(..., notifyChanges=true) +to cause pointer escapes (not only region changes) (which sounds like the right +thing to do anyway) such checker would be able to solve the false positives by +triggering escapes when binding list elements to the list. However, it'd be as +conservative as the current patch's solution. Ideally, we do not want escapes to +happen so early. Instead, we'd prefer them to be delayed until the list itself +escapes. + +So i believe that escaping metadata symbols whenever their base regions escape +would be the right thing to do. Currently we didn't think about that because we +had neither pointer-type metadatas nor non-pointer escapes. + +Approach (2): We could teach the Store to scan itself for bindings to +metadata-symbolic-based regions during scanReachableSymbols() whenever a region +turns out to be reachable. This requires no work on checker side, but it sounds +performance-heavy. + +Approach (3): We could let checkers maintain the set of active metadata symbols +in the program state (ideally somewhere in the Store, which sounds weird but +causes the smallest amount of layering violations), so that the core knew what +to escape. This puts a stress on the checkers, but with a smart data map it +wouldn't be a problem. + +Approach (4): We could allow checkers to trigger pointer escapes in arbitrary +moments. If we allow doing this within checkPointerEscape callback itself, we +would be able to express facts like "when this region escapes, that metadata +symbol attached to it should also escape". This sounds like an ultimate freedom, +with maximum stress on the checkers - still not too much stress when we have +smart data maps. + +I'm personally liking the approach (2) - it should be possible to avoid +performance overhead, and clarity seems nice. + +**Gabor:** + +At this
[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.
zaks.anna added a comment. I think we should have these is .rst format as this is what the rest of the documentation predominantly uses. (Note, the formatting can be very basic, it's the format that I care about.) Otherwise, great to see this addition! https://reviews.llvm.org/D36737 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.
dcoughlin added a comment. Thanks for doing this! https://reviews.llvm.org/D36737 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.
xazax.hun added a comment. Good idea! I think some of your extremely helpful responses from the mailing list would worth archiving in a form of documentation as well. Also, as far as I can recall, you are also maintaining a pdf which is a very good reference. In some form, I would love to see that information upstream too. https://reviews.llvm.org/D36737 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36737: [analyzer] Store design discussions in docs/analyzer for future use.
NoQ created this revision. Because i tend to raise important design questions in weird places like tiny phabricator patches, we had a thought that it's better to archive them in `docs/analyzer` for future reference. We should be able to pick them up whenever time comes to solve these problems. I initialize a new folder in `docs/analyzer` for this sort of stuff with the discussion in https://reviews.llvm.org/D35216; we also have a few more discussions that might look good in such archive. https://reviews.llvm.org/D36737 Files: docs/analyzer/DesignDiscussions/InitializerLists.txt Index: docs/analyzer/DesignDiscussions/InitializerLists.txt === --- /dev/null +++ docs/analyzer/DesignDiscussions/InitializerLists.txt @@ -0,0 +1,325 @@ +This discussion took place in https://reviews.llvm.org/D35216 +"Escape symbols when creating std::initializer_list". + +It touches problems of modelling C++ standard library constructs in general, +including modelling implementation-defined fields within C++ standard library +objects, in particular constructing objects into pointers held by such fields, +and separation of responsibilities between analyzer's core and checkers. + + + +== Artem: == + +I've seen a few false positives that appear because we construct C++11 +std::initializer_list objects with brace initializers, and such construction is +not properly modeled. For instance, if a new object is constructed on the heap +only to be put into a brace-initialized STL container, the object is reported to +be leaked. + +Approach (0): This can be trivially fixed by this patch, which causes pointers +passed into initializer list expressions to immediately escape. + +This fix is overly conservative though. So i did a bit of investigation as to +how model std::initializer_list better. + +According to the standard, std::initializer_list is an object that has +methods begin(), end(), and size(), where begin() returns a pointer to continous +array of size() objects of type T, and end() is equal to begin() plus size(). +The standard does hint that it should be possible to implement +std::initializer_list as a pair of pointers, or as a pointer and a size +integer, however specific fields that the object would contain are an +implementation detail. + +Ideally, we should be able to model the initializer list's methods precisely. +Or, at least, it should be possible to explain to the analyzer that the list +somehow "takes hold" of the values put into it. Initializer lists can also be +copied, which is a separate story that i'm not trying to address here. + +The obvious approach to modeling std::initializer_list in a checker would be to +construct a SymbolMetadata for the memory region of the initializer list object, +which would be of type T* and represent begin(), so we'd trivially model begin() +as a function that returns this symbol. The array pointed to by that symbol +would be bindLoc()ed to contain the list's contents (probably as a CompoundVal +to produce less bindings in the store). Extent of this array would represent +size() and would be equal to the length of the list as written. + +So this sounds good, however apparently it does nothing to address our false +positives: when the list escapes, our RegionStoreManager is not magically +guessing that the metadata symbol attached to it, together with its contents, +should also escape. In fact, it's impossible to trigger a pointer escape from +within the checker. + +Approach (1): If only we enabled ProgramState::bindLoc(..., notifyChanges=true) +to cause pointer escapes (not only region changes) (which sounds like the right +thing to do anyway) such checker would be able to solve the false positives by +triggering escapes when binding list elements to the list. However, it'd be as +conservative as the current patch's solution. Ideally, we do not want escapes to +happen so early. Instead, we'd prefer them to be delayed until the list itself +escapes. + +So i believe that escaping metadata symbols whenever their base regions escape +would be the right thing to do. Currently we didn't think about that because we +had neither pointer-type metadatas nor non-pointer escapes. + +Approach (2): We could teach the Store to scan itself for bindings to +metadata-symbolic-based regions during scanReachableSymbols() whenever a region +turns out to be reachable. This requires no work on checker side, but it sounds +performance-heavy. + +Approach (3): We could let checkers maintain the set of active metadata symbols +in the program state (ideally somewhere in the Store, which sounds weird but +causes the smallest amount of layering violations), so that the core knew what +to escape. This puts a stress on the checkers, but with a smart data map it +wouldn't be a problem. + +Approach (4): We could allow checkers to trigger pointer escapes in