NoQ added a comment.

Thx!~

In D58368#1404747 <https://reviews.llvm.org/D58368#1404747>, @Charusso wrote:

> This is a cool approach, but it is difficult to use this API in other 
> checkers. If you do not out-chain D58367 <https://reviews.llvm.org/D58367> I 
> would like to see something like that here:
>
>   SmallString<64> Str;
>   llvm::raw_svector_ostream OS(Str);
>   OS << "Deallocating object passed through parameter '" << PVD->getName()
>      << '\'';
>  
>   C.addNote(C.getState()->set<ReleasedParameter>(true), OS.str());
>


The idea to //allow// passing a raw std::string instead of a lambda that 
generates a string sounds like a great improvement, i'll do this.

I don't think i want to //disallow// lambdas entirely, because

- Some dynamic logic is definitely often necessary - eg., the MallocChecker 
example above. Essentially, in most cases we don't know what message should we 
produce (or even whether we should produce it at all) until the bug report is 
actually emitted. This specific checker is very lucky in that regard: every 
transition produced by this checker on the execution path does deserve a 
message, and this message doesn't depend on the bug report in any manner, but 
only on the event that caused it to appear.
- I want to make the API flexible enough to avoid performance problems. 
Generating the string is usually cheap, but transitions are made much more 
often than warnings are issued (simply because most code doesn't cause us to 
produce a bug), so producing the message every time we make a transition is a 
much higher multiplier for the cost of generating the string than producing the 
message every time we make a transition-that-leads-to-a-bug.

----

Adding a comfy wrapper around `addTransition` that constructs the tag 
automatically is also a great idea, i'll do this, but i'd like to delay this a 
little bit because, well, `addTransition` API is generally a large source of 
pain for checker developers and could use a rework in order to make it more 
understandable.

In particular, the implementation you propose does a lot more than adding a 
note: it also adds, well, a transition. Namely, the reader would most likely 
expect the following code

  State1 = State->set<Trait1>(Value1);
  C.addNote(State1, "Message 1");
  State2 = State1->set<Trait2>(Value2);
  C.addNote(State2, "Message 2");

to make two updates to the state and display a note for each of them:

  /-------------\
  | Predecessor |
  | State       |
  \-------------/
        ||
        \/
  /-------------\
  | Node1       |
  | State1      |
  | "Message 1" |
  \-------------/
        ||
        \/
  /-------------\
  | Node2       |
  | State2      |
  | "Message 2" |
  \-------------/

The actual behavior, however, would be to split the execution path in two 
parallel paths, analyze them separately later, and mark each of them with the 
respective message:

           /-------------\
           | Predecessor |
           | State       |
           \-------------/
             ||       ||
             \/       \/
  /-------------\   /-------------\
  | Node1       |   | Node2       |
  | State1      |   | State2      |
  | "Message 1" |   | "Message 2" |
  \-------------/   \-------------/

We already have this problem with `generateNonFatalErrorNode()`: even though it 
explicitly states that it will generate a node, it's very unobvious that 
generating two error nodes not only allows you to report two bugs, but also 
causes an accidental state split. Additionally, such accidental state splits 
are hard to notice because most of the time it doesn't cause any visible 
differences apart from a 2x slowdown of the remaining analysis on that 
execution path. In case of `addNote()` it should be a bit better because you'd 
be able to notice that one of the state updates isn't taking place, leading to 
false positives, but in case of `generateNonFatalErrorNode()` there are often 
no state updates being made (which is bad on its own, but for a separate 
reason).

I do have one idea on how to make this less confusing, but it's a bigger piece 
of work.

In D58367#1404722 <https://reviews.llvm.org/D58367#1404722>, @Charusso wrote:

> we do not care that much about the core `BugReporterVisitors` as they do 
> their job enough well.


If only that was so :)

In D58367#1404722 <https://reviews.llvm.org/D58367#1404722>, @Charusso wrote:

> So with that, I would out-chain this patch from the MIG-patches because it is 
> a lot more serious problem-set. I also would like to ask you to take it more 
> serious, as all your mentioned benefits will rely on this simple and cool 
> approach, so it has to be flexible as possible.


I've been thinking for about a month about this until now, and i'm actually 
very surprised that i see no obvious flexibility issues here. Stateful visitors 
(eg., the ones that only highlight the *last* interesting event) can be easily 
implemented by keeping via lambdas as a private state data in the bug reporter. 
Mutually recursive systems of multiple visitors that add each other dynamically 
during the visit (such as the `trackExpressionValue` that's infamous for that 
exact reason) should be (and most likely could be) untangled anyway, and once 
they're untangled (eg., by keeping just one instance of the visitor while 
dynamically updating its tracking information), the flexibility issue 
disappears; i'm almost happy that it would no longer be possible to entangle 
the code that way. Dunno, this is weird - usually i quickly come up with 
examples of why something wouldn't work and decide to implement it anyway, but 
this time i'm surprisingly secure about implementing it.

In D58367#1404722 <https://reviews.llvm.org/D58367#1404722>, @Charusso wrote:

> I am not into the internal stuff that much, so I cannot argue about the lack 
> of `CheckerContext` functions, but I think this is a huge problem and breaks 
> the flexibility a lot.


The issue here is that the checker doesn't always have an ability to mutate the 
graph by adding nodes. Now it becomes more of a problem because the checker may 
want to emit notes more often than it wants to generate nodes. This problem can 
be easily mitigated by passing a different sort of context (not 
`CheckerContext` but something else) into these callbacks that will store 
lambdas in a different manner. It slightly sucks that we'd have to come up with 
a duplicate API for that, but that's not something we need to solve immediately 
- i believe that we can always add these later.

Another way to describe this limitation is that "lambda visitors" are only 
capable of adding notes to nodes that were produced by the checker itself. 
Old-style visitors, on the other hand, can drop notes in arbitrary places - 
only once per node, but it's still something. Lambdas, of course, also produce 
at most one message per node, but for lambdas it's not a problem because the 
checker can make as many nodes as it wants. Whereas an arbitrary visitor cannot 
retroactively provide itself with more nodes if they weren't constructed 
artificially ahead of time. I think this is not a big flexibility issue on its 
own because it should be possible to make lambdas from multiple sources 
cooperate with each other by sharing a certain amount of arbitrary state 
information within the BugReport object.

In D58367#1404722 <https://reviews.llvm.org/D58367#1404722>, @Charusso wrote:

> `getNoteTag()`: you could differentiate a setter with the true getter what 
> you only use in `TagVisitor::VisitNode()` (where you truly hook your 
> `Callback` to get extra information later on):
>
>   const NoteTag *setNoteTag(StringRef Message) {
>     return Eng.getNoteTags().setNoteTag(Message);
>   }
>   


Mmm, i don't think i understand this part. Like, `getNoteTag()` is not really a 
getter. It's a factory method on an allocator that causes it to produce an 
object. What is the semantics of a setter that would correspond to it?

In D58367#1404722 <https://reviews.llvm.org/D58367#1404722>, @Charusso wrote:

> - Documentation: if we are about the rewrite the entire bug-reporting 
> business, it worth some long doxygen comments.


Yup, totally.

Like, we're not really rewriting anything yet, just trying out a new approach. 
I'll definitely not jump into rewriting everything until the new approach gets 
tested a lot and turns out to actually work.

One important thing to document is the lifetime of these lambdas (equal to 
ExprEngine's lifetime, which is a fairly popular lifetime to have in the 
Analyzer). Captures would need to live that long and can probably be cleaned up 
at top-level `checkBeginFunction()` if they are memory-managed manually.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58367/new/

https://reviews.llvm.org/D58367



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to