On Wed, 11 Jan 2023 21:51:45 GMT, Maurizio Cimadamore <[email protected]>
wrote:
> So, in this example though you are calling an instance method before the
> object is initialized, which would seem to me like a leak
D'oh, you're right. But if you made `returnMe()` static or private then the
argument would still hold.
> And, if the method is static, same story - you are passing ref3 somewhere
> else, and ref3 potentially contains this.
Not true... static methods are safe to "invoke" because they can't be
overridden. When the analyzer "invokes" the method, it will see that all it
does with its parameter is return it.
In other words, any method (or constructor) in the current compilation unit
that can't be overridden is never considered "somewhere else".
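To make that concrete, here is a minimal sketch (my own example, not code from the PR; `doStuff()` is a hypothetical overridable method) of the distinction: a non-overridable method the analyzer can "invoke" and see through, versus an overridable instance method invoked on the not-yet-initialized object:

public class Example {

    public Example() {
        Example ref3 = returnMe(this);  // treated as safe: returnMe() is static and in
                                        // the same compilation unit, so the analyzer
                                        // "invokes" it and sees that all it does with
                                        // its parameter is return it
        doStuff();                      // potential leak: an overridable instance method
                                        // is invoked on the uninitialized object
    }

    private static Example returnMe(Example ref) {
        return ref;                     // the argument never goes anywhere else
    }

    protected void doStuff() {          // a subclass could override this and observe
    }                                   // the half-constructed instance
}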
> While I know that this is not perfect, and bound to generate false positives,
> I get the spirit of my question is, really: is there a (much) simpler scheme
> we can get away with, which has bounded complexity, and which has the
> property we care about (which seems to be no false negative). I'm less
> worried about contrived cases emitting false positives, as it's an optional
> warning that can be shut down - but I'd like perhaps to move the discussion
> from trying to detect precisely if the leak happens to try to detect if
> potentially a leak can happen, and see if there's some simpler analysis that
> can be used to get there (e.g. one which doesn't require flooding). It's
> possible that you have already considered all these options and the analysis
> you have here is the best trade off between complexity and precision - but
> I'd like to have a better understanding of what the trade offs are, and, more
> importantly, what happens to real code when we tweak the analysis this or
> that way (as this is a
> problem where I feel it's easy to get in the land of diminishing returns).
I can only report from my own experience. I thought about this a bit and tried
a few different things, and what I ended up with was the best trade-off I could
come up with. Of course there was a good bit of intuition and SWAG'ing in that,
and also a lot of thought experiments.
Of course if you have a clever idea for how to do this in a simpler way that
achieves basically the same result, I'm all ears :)
But I don't really have an analysis of all the trade-offs. Really, at a certain
point I stumbled on what I thought was a fairly straightforward way to achieve
a low false negative rate and a very low false positive rate, without a lot of
work. And frankly at that point I stopped caring about the question you're
asking, although I agree it's certainly interesting from an academic point of
view (I'm a practical person).
A lot of my initial ideas were too simple for my taste, because I could easily
find real-world false negatives.
An example of "too simple": at first I was not trying to track the outer 'this'.
But I found a lot of constructors out there that instantiate nested classes and
then do things with them. Without tracking the outer 'this', these would all be
missed.
To take a random example of that:
public class FileChooserDemo extends JPanel implements ActionListener {
    ...
    public FileChooserDemo() {
        ...
        // create a radio listener to listen to option changes
        OptionListener optionListener = new OptionListener();

        // Create options
        openRadioButton = new JRadioButton("Open");
        openRadioButton.setSelected(true);
        openRadioButton.addActionListener(optionListener);   // <- LEAK HERE
        ...
    }

    private class OptionListener implements ActionListener {
        ...
    }
}
That particular leak is (probably) innocuous, but it still qualifies as a leak
and should be reported. Note that failing to track an outer 'this' causes false
negatives, which are a bigger problem than false positives.
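For anyone not seeing why the outer 'this' matters there, here is a stripped-down
sketch of the same pattern (hypothetical names, not code from the JDK): constructing
an inner class instance inside the constructor implicitly captures the enclosing
instance, so handing that instance to outside code leaks 'this':

public class Outer {

    public Outer(Registry registry) {
        Helper helper = new Helper();   // shorthand for this.new Helper(), so the new
                                        // object carries a hidden reference to 'this'
        registry.register(helper);      // 'this' escapes via helper's outer reference
    }

    private class Helper {              // inner (non-static) class: every instance
    }                                   // holds a reference to its enclosing Outer

    public interface Registry {         // stand-in for code outside this compilation unit
        void register(Object target);
    }
}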
On the flip side, I started out trying to explicitly track field references,
but that seemed like more complexity than needed, at least for phase 1, so that
was left out.
The "flooding" aspect didn't really worry me because the reference set is
"append only" and there is small, finite set of possible references at each
scope, so it can't really get that out of hand. Testing indeed shows it's not a
problem. By the way recursion also "floods" until convergence, just like
looping.
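As a rough illustration of why that converges, here is a toy sketch (nothing like
the real javac code) of flooding a loop body: the set of references that might
alias 'this' only ever grows, and it is drawn from a finite domain, so repeated
passes must reach a fixed point:

import java.util.EnumSet;
import java.util.Set;

public class FloodSketch {

    // Hypothetical reference domain: 'this' plus a couple of locals.
    enum Ref { THIS, VAR_X, VAR_Y }

    public static void main(String[] args) {
        Set<Ref> refs = EnumSet.of(Ref.THIS);   // references known to alias 'this' so far
        Set<Ref> before;
        do {
            before = EnumSet.copyOf(refs);
            scanLoopBodyOnce(refs);             // one pass over the (simulated) loop body
        } while (!refs.equals(before));         // stop once a pass adds nothing new
        System.out.println("converged on " + refs);
    }

    // Stand-in for one scan of the loop body: assignments propagate aliasing.
    static void scanLoopBodyOnce(Set<Ref> refs) {
        if (refs.contains(Ref.THIS))  refs.add(Ref.VAR_X);   // x = this;
        if (refs.contains(Ref.VAR_X)) refs.add(Ref.VAR_Y);   // y = x;
    }
}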
I would have been pleased to find "a much simpler scheme". But the more I
thought about those options, the more easy misses I came up with. And after a
certain point, it actually became easier to simply "execute" the code by
scanning the AST while carrying around a `Set<Ref>` to track possible
references. It's really that simple. Instead of trying to be "smart", we just
let the code tell us what happens.
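To give a flavor of what "carrying around a `Set<Ref>`" means (a toy model with
made-up types, not the actual implementation), the walk boils down to propagating
possible aliases of 'this' through assignments and flagging any call the analyzer
can't see into that receives one:

import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ScanSketch {

    enum Ref { THIS, LOCAL_A }                               // hypothetical reference domain

    record Assign(Ref target, Ref source) {}                 // e.g.  a = this;
    record Call(String name, Ref argument, boolean overridable) {}  // e.g.  register(a);

    public static void main(String[] args) {
        // Tiny stand-in for a constructor body's AST.
        List<Object> body = List.of(
                new Assign(Ref.LOCAL_A, Ref.THIS),           // A a = this;
                new Call("register", Ref.LOCAL_A, true));    // someObject.register(a);

        Set<Ref> mightBeThis = EnumSet.of(Ref.THIS);
        for (Object stmt : body) {
            if (stmt instanceof Assign a && mightBeThis.contains(a.source())) {
                mightBeThis.add(a.target());                 // propagate possible aliases
            } else if (stmt instanceof Call c
                    && c.overridable()                       // can't see what it does with it
                    && mightBeThis.contains(c.argument())) {
                System.out.println("possible 'this' escape via " + c.name() + "()");
            }
        }
    }
}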
Apologies if this is not a very good answer to your question.
In the big picture, the false positive rate is traded off against code
complexity, and we want the best possible trade-off, right?
But how are you defining "complexity"?
If you really mean performance, then I don't see a problem... the JDK build
times are essentially unchanged.
If you mean lines of code or whatever, then it doesn't seem inordinate. And the
algorithm is more or less just an AST scan, like lots of other examples in the
compiler code. So I don't see a problem there either.
Yes, there may be a much simpler way that's just as good, but if I could have
thought of it I would have already. Perhaps you or someone else has better
insight.
-------------
PR: https://git.openjdk.org/jdk/pull/11874