Szelethus created this revision.
Szelethus added reviewers: NoQ, george.karpenkov, xazax.hun, MTC.
Herald added subscribers: cfe-commits, jfb, mikhail.ramalho, a.sidorin, rnkovacs, szepet, whisperity.
Added some extra tasks to the open projects. These are the ideas of @NoQ and @george.karpenkov; I just converted them to HTML.

Repository:
  rC Clang

https://reviews.llvm.org/D53024

Files:
  www/analyzer/open_projects.html
Index: www/analyzer/open_projects.html =================================================================== --- www/analyzer/open_projects.html +++ www/analyzer/open_projects.html @@ -25,6 +25,86 @@ <ul> <li>Core Analyzer Infrastructure <ul> + <li>Implement a dataflow framework. + <p><!-- TODO: Explain. --> + <i> (Difficulty: Hard) </i></p> + </li> + + <li>Handle aggregate construction. + <p>Aggregates are objects that can be brace-initialized without calling a + constructor (i.e., no <code><a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructExpr.html"> + CXXConstructExpr</a></code> in the AST), but potentially calling + constructors for their fields and (since C++17) base classes - and these + constructors of sub-objects need to know what object (field in the + aggregate) they are constructing. Moreover, if the aggregate contains + references, lifetime extension needs to be modeled. Aggregates can be + nested, so <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ConstructionContext.html"> + ConstructionContext</a></code> can potentially cover an unlimited number of + statements. One can start untangling this problem by trying to replace the + hacky <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ParentMap.html"> + ParentMap</a></code> lookup in the <a href="https://clang.llvm.org/doxygen/ExprEngineCXX_8cpp_source.html#l00430"> + <code>CXXConstructExpr::CK_NonVirtualBase</code> branch of + <code>ExprEngine::VisitCXXConstructExpr()</code></a> with some actual + support for the feature. + <i> (Difficulty: Medium) </i></p> + </li> + + <li>Fix CFG for GNU "binary conditional" operator <code>?:</code>. + <p>CFG for GNU "binary conditional" operator <code>?:</code> is broken in + C++. Its condition-and-LHS needs to be evaluated only once. + <i>(Difficulty: Easy)</i></p> + </li> + + <li>Handle unions. + <p>Currently in the analyzer, the value of a union is always regarded as + unknown.
There has been some discussion about this on the <a href="http://lists.llvm.org/pipermail/cfe-dev/2017-March/052864.html"> + mailing list already</a>, but it is still an untouched area. + <i> (Difficulty: Medium) </i></p> + </li> + + <li>Enhance the modeling of the standard library. + <p>There is a huge amount of checker work for teaching the Static Analyzer + about the C++ standard library. It is very easy to explain to the static + analyzer that calling <code>.length()</code> on an empty <code>std::string + </code> will yield 0, and vice versa, but supporting all such facts is a huge + amount of work. One good thing to start with here would be to notice that + inlining methods of C++ "containers" is currently outright forbidden in + order to suppress a lot of false alarms due to weird <code>assume()</code>s + made within inlined methods. There’s a hypothesis that these suppressions + should have been instead implemented as bug report visitors, which would + still suppress false positives, but would not prevent us from inlining the + methods, and therefore would not cause other false positives. Verifying this + hypothesis would be a wonderful accomplishment. Prior understanding of + the "inlined defensive checks" problem is a prerequisite for this project. + <i>(Difficulty: Medium)</i></p> + </li> + + <li>Reimplement the representation for various symbolic values. + <p><code><a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1LocAsInteger.html"> + LocAsInteger</a></code> is annoying, but alternatives are vague. Casts in + the opposite direction - integers to pointers - are completely unsupported.
+ Pointer-to-pointer casts are a mess; modeling them with <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1ElementRegion.html"> + ElementRegion </a></code> is a disaster and we are suffering a lot from this + hack, but coming up with a significantly better solution is very hard, as + there are a lot of corner cases to cover, and it’s hard to maintain a balance + between the richness of our representation of symbolic values and our ability to + understand when two different values in fact represent the same thing. + <i>(Difficulty: Hard)</i></p> + </li> + + <li>Provide better alternatives to inlining. + <p>Sometimes instead of inlining, a much simpler behavior would be more + efficient. For instance, if the function is pure, then a single bit of + information “this function is pure” would already be much better than + conservative evaluation, and sometimes good enough to make inlining not + worth the effort. Gathering such snippets of information - “partial + summaries” - automatically, from the simpler to the more complex + summaries, and re-using them later, probably across translation units, might + improve our analysis quite a lot, while being something that can be worked + on incrementally and doesn’t require checkers to react immediately. + <i>(Difficulty: Hard)</i></p> + </li> + <li>Explicitly model standard library functions with <tt>BodyFarm</tt>. <p><tt><a href="http://clang.llvm.org/doxygen/classclang_1_1BodyFarm.html">BodyFarm</a></tt> allows the analyzer to explicitly model functions whose definitions are @@ -57,28 +137,32 @@ <p>There is an existing implementation of this, but it's not complete and is disabled in the analyzer. <i>(Difficulty: Medium; current contact: Alex McCarthy)</i></p> + </li> <li>Enhance CFG to model exception-handling properly. <p>Currently exceptions are treated as "black holes", and exception-handling control structures are poorly modeled (to be conservative).
This could be much improved for both C++ and Objective-C exceptions. <i>(Difficulty: Medium)</i></p> + </li> <li>Enhance CFG to model C++ <code>new</code> more precisely. <p>The current representation of <code>new</code> does not provide an easy way for the analyzer to model the call to a memory allocation function (<code>operator new</code>), then initialize the result with a constructor call. The problem is discussed at length in <a href="http://llvm.org/bugs/show_bug.cgi?id=12014">PR12014</a>. <i>(Difficulty: Easy; current contact: Karthik Bhat)</i></p> + </li> <li>Enhance CFG to model C++ <code>delete</code> more precisely. <p>Similarly, the representation of <code>delete</code> does not include the call to the destructor, followed by the call to the deallocation function (<code>operator delete</code>). One particular issue (<tt>noreturn</tt> destructors) is discussed in <a href="http://llvm.org/bugs/show_bug.cgi?id=15599">PR15599</a> <i>(Difficulty: Easy; current contact: Karthik Bhat)</i></p> + </li> <li>Implement a BitwiseConstraintManager to handle <a href="http://llvm.org/bugs/show_bug.cgi?id=3098">PR3098</a>. <p>Constraints on the bits of an integer are not easily representable as @@ -92,16 +176,45 @@ dynamic type based on what operations the code is performing. Casts are a rich source of type information that the analyzer currently ignores. They are tricky to get right, but might have very useful consequences. - <i>(Difficulty: Medium)</i></p> + <i>(Difficulty: Medium)</i></p> + </li> <li>Design and implement alpha-renaming. <p>Implement unifying two symbolic values along a path after they are determined to be equal via comparison. This would allow us to reduce the number of false positives and would be a building step to more advanced analyses, such as summary-based interprocedural and cross-translation-unit analysis. 
<i>(Difficulty: Hard)</i></p> - </li> + </li> + + <li>Make the values in <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1Environment.html"> + Environment</a></code> immutable. + <p>Values in the <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1Environment.html"> + Environment</a></code> should never change. Once the value of an expression is + computed, it remains the same until the expression goes out of scope, so the + operation of substituting a value in <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1Environment.html"> + Environment</a></code> should never have been supported. Unfortunately, + adding such an assertion crashes an awful lot of tests, because <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1Environment.html"> + Environment</a></code> is used for a lot of weird hacks even while modeling + simple operations like <code>+=</code>. + <i>(Difficulty: Medium)</i></p> + </li> + + <li>Enforce the invariant that glvalue expressions and pointer-type + prvalue expressions have <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1Loc.html"> + Loc</a></code> values, and other prvalue expressions have <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1NonLoc.html"> + NonLoc</a></code> values. + <p><i>(Difficulty: Medium)</i></p> + </li> + + <li>Refactor <code><a href="https://clang.llvm.org/doxygen/namespaceclang_1_1ento_1_1bugreporter.html#a532da89ff4c3d8214a04da342a553dc5"> + trackExpressionValue</a></code>. + <p>The process of tracking values is very unclear. One should understand the + basic idea behind the solution and then throw away the existing code + and implement a clean solution. + <i>(Difficulty: Medium)</i></p> + </li> </ul> </li> @@ -116,6 +229,51 @@ </ul> </li> + <li>Move checkers out of alpha. + <p>Because LLVM doesn't have branches, unfinished checkers first land in + alpha, and are only moved out once they are production-ready.
However, over + the years many checkers got stuck in alpha, and their development has + stalled.</p> + <p>Checkers that presumably can't be made to work should be removed from the + trunk and the rest should ideally be finished.</p> + <ul> + <li><code>alpha.security.ArrayBound</code> and <code> + alpha.security.ArrayBoundV2</code> + <p>The array bounds checker is a much-wanted feature, but it doesn’t sound like + we can fix all of its false positives without full-featured loop widening + support. At the same time, array bound violations based on <b>tainted</b> + index values sound like a much more feasible task. + <i>(Difficulty: Medium)</i></p> + </li> + + <li><code>alpha.cplusplus.MisusedMovedObject</code> + <p>Something needs to be done about false positives caused by classes that + can in fact be used after being moved from, i.e., the empty space that + remains after the move has well-defined contents. This is not the case for + STL objects, but it is the case for many user-defined objects. + <i>(Difficulty: Easy)</i></p> + </li> + + <li><code>alpha.unix.Stream</code> + <p>One of the more annoying parts in this is handling state splits for error + return values. A “Schrödinger state” technique was first implemented in + the PthreadLockChecker (where a mutex was destroyed and not destroyed at the + same time, until the return value of pthread_mutex_destroy was checked by a + branch in the code). + <i>(Difficulty: Easy)</i></p> + </li> + + <li>Many alpha checks can be turned into opt-in lint-like checks. + <p>Path-sensitive lint checks are interesting, they can’t be implemented + in clang-tidy, and there’s clearly an interest in them, but we don’t + have enough maintenance power to respond to bugs and false positives. If + anybody with a maintainer attitude takes responsibility for this part, + it may have a much better chance to grow and thrive.
+ <i>(Difficulty: Easy)</i></p> + </li> + </ul> + </li> + <li>Other Infrastructure <ul> <li>Rewrite <tt>scan-build</tt> (in Python). @@ -162,7 +320,12 @@ </li> <li>Implement a BitwiseMaskingChecker to handle <a href="http://llvm.org/bugs/show_bug.cgi?id=16615">PR16615</a>. - <p>Symbolic expressions of the form <code>$sym & CONSTANT</code> can range from 0 to <code>CONSTANT-</code>1 if CONSTANT is <code>2^n-1</code>, e.g. 0xFF (0b11111111), 0x7F (0b01111111), 0x3 (0b0011), 0xFFFF, etc. Even without handling general bitwise operations on symbols, we can at least bound the value of the resulting expression. Bonus points for handling masks followed by shifts, e.g. <code>($sym & 0b1100) >> 2</code>. + <p>Symbolic expressions of the form <code>$sym & CONSTANT</code> can + range from 0 to <code>CONSTANT-</code>1 if CONSTANT is <code>2^n-1</code>, + e.g. 0xFF (0b11111111), 0x7F (0b01111111), 0x3 (0b0011), 0xFFFF, etc. Even + without handling general bitwise operations on symbols, we can at least + bound the value of the resulting expression. Bonus points for handling masks + followed by shifts, e.g. <code>($sym & 0b1100) >> 2</code>. <i>(Difficulty: Easy)</i></p> </li>
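
As a quick sanity check of the BitwiseMaskingChecker item in the last hunk, here is a minimal C++ sketch that brute-forces the range of a masked (and optionally shifted) value. This is not analyzer code: isLowBitMask and maxMaskedShifted are hypothetical helper names made up for this illustration, and the search covers only 16-bit inputs. Under those assumptions it suggests that for a mask of the form 2^n-1 the masked value stays between 0 and the mask itself, and that a mask followed by a shift is bounded in the same way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an analyzer API): true if mask has the form
// 2^n - 1, i.e. its set bits are contiguous and start at bit 0
// (0x3, 0x7F, 0xFF, 0xFFFF, ...).
constexpr bool isLowBitMask(std::uint32_t mask) {
  return (mask & (mask + 1u)) == 0;
}

// Hypothetical helper: the largest value of (sym & mask) >> shift over all
// 16-bit values of sym, found by brute force rather than by reasoning.
inline std::uint32_t maxMaskedShifted(std::uint32_t mask, unsigned shift) {
  assert(shift < 32 && "shift must be smaller than the bit width");
  std::uint32_t best = 0;
  for (std::uint32_t sym = 0; sym <= 0xFFFFu; ++sym) {
    const std::uint32_t value = (sym & mask) >> shift;
    if (value > best)
      best = value;
  }
  return best;
}
```

Calling maxMaskedShifted(0xFF, 0) or maxMaskedShifted(0xC, 2) from a small test program gives the upper bound a checker would want to assume for `$sym & 0xFF` and `($sym & 0b1100) >> 2` respectively; the lower bound is 0 in both cases, since sym = 0 is always a possible input.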