Re: Null checking in Beam

Jan Lukavský Tue, 16 Mar 2021 02:14:06 -0700

I don't know the details of the checkerframework, but there seems acontradiction between what code is (currently) generated and what wetherefore release and what actually the checkerframework states [1]:


@UnknownKeyFor:

Used internally by the type system; should never be written by aprogrammer.

If this annotation is generated for overwritten methods, then I'd say,that it means we place a great burden to our users - either not usingautogenerated methods, or erase all the generated annotationsafterwards. Either way, that is not "polite" from Beam.

What we should judge is not only a formal purity of code, but whatstands on the other side is how usable APIs we provide. We should nothead for 100% pure code and sacrificing use comfort. Every check thatleaks to user code is at a price and we should not ignore that. No freelunch.


From my point of view:

a) if a check does not modify the bytecode, it is fine and we can useit - we are absolutely free to use any tooling we agree on, if our userscannot be affected anyhow

b) if a tool needs to be leaked to user, it should be as small leakageas possible

c) if a check significantly affects compile performance, it should bepossible to opt-out


I think that our current setup violates all these three points.

Moving the check to different CI is a possibility (a)), it would thenrequire opt-in flag to be able to run the check locally. It would alsostop the leakage (if we would release code without this check).

If we want to keep some annotations for user's benefit (which might befine), it should be really limited to the bare minimum (e.g. only@Nullable for method arguments and return values, possibly more, I don'tknow if and how we can configure that). Definitely not @UnknownKeyFor,that is simply internal to the checker. We should then have opt-out flagfor local development before committing changes.

Jan

[1]https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html


On 3/16/21 8:33 AM, Reuven Lax wrote:

On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com<mailto:re...@google.com>> wrote:




    On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <k...@apache.org
    <mailto:k...@apache.org>> wrote:

        I will be blunt about my opinions about the general issue:

        - NullPointerExceptions (and similar) are a solved problem.
           * They have been since 2003 at the latest [1] (this is when
        the types were hacked into Java - the foundation dates back to
        the 70s or earlier)


    Huh - Fahndrich tried to hire me once to work on a project called
    Singularity. Small world. Also note that Sanjay Ghemawat is listed
    in the citations!

Umm, I was confusing names. Fahndrich is actually a former coworker atGoogle :)

* Checkerframework is a _pluggable_ system where that
nullness type system is a "hello, world" level demo, since
2008 at the latest [2].
* Our users should know this and judge us accordingly.

- Checkerframework should be thought of and described as type
checking, because it is. It is not heuristic nor approximate.
- If your code was unclear about whether something could be
null, it was probably unclear to a person reading it also, and
very likely to have actual bugs.
- APIs that accept a lot of nullable parameters are, generally
speaking, bad APIs. They are hard to use correctly, less
readable, and very likely to cause actual bugs. You are
forcing your users to deal with accidental complexity you left
behind.
* Corollary to the above two points: Almost all the time,
the changes to clearify nullness make your code better, more
readable, easier for users or editors.
- It is true that there is a learning curve to programming in
this way.

I agree with the above in a closed system. However as mentioned,
some of the APIs we use suffer from this.

In a previous life, I saw up close an effort to add such analysis
to a large codebase. Two separate tools called Prefix and Prefast
were used (the difference between the two is actually quite
interesting, but not relevant here). However in order to make this
analysis useful, there was a massive effort to properly annotate
the entire codebase, including all libraries used. This isn't a
perfect example - this was a C++ codebase which is much harder to
analyze, and these tools identified far more than simply nullness
errors (resource leaks, array indices, proper string null
termination, exception behavior, etc.). However the closer we can
get to properly annotating the transitive closure of our
dependencies, the better this framework will work.

- There are certainly common patterns in Java that do not work
very well, and suppression is sometimes the best option.
* Example: JUnit's @Setup and @Test conventions do not work
very well and it is not worth the effort to make them work.
This is actually because if it were "normal code" it would be
bad code. There are complex inter-method dependencies enforced
only by convention. This matters: sometimes a JUnit test class
is called from another class, used as "normal code". This does
go wrong in practice. Plain old JUnit test cases frequently go
wrong as well.

And here is my opinion when it comes to Beam:

- "Community over code" is not an excuse for negligent
practices that cause easily avoidable risk to our users. I
will be very disappointed if we choose that.
- I think having tooling that helps newcomers write better
code by default is better for the community too. Just like
having automatic formatting is better. Less to haggle about in
review, etc.
- A simple search reveals about 170 bugs that we know of [4].
- So far in almost every module I have fixed I discovered
actual new null errors. Many examples at [5].
- It is extremely easy to suppress the type checking. Almost
all of our classes have it suppressed already (I did this
work, to allow existing errors while protecting new code).
- Including the annotations in the shipped jars is an
important feature. Without this, users cannot write null-safe
code themselves.
* Reuven highlighted this: when methods are not annotated,
we have to use/implement workarounds. Actually Guava does use
checkerframework annotations [6] and the problem is that it
requires its *input* to already be non-null so actually you
cannot even use it to convert nullable values to non-nullable
values.
* Beam has its own [7] that is annotated, actually for yet
another reason: when Guava's checkNotNull fails, it throws NPE
when it should throw IllegalArgumentException. Guava's
checkNotNull should not be used for input validation!
- It is unfortunate that IntelliJ inserts a bunch of
annotations in user code. I wonder if there is something we
can do about that. At the Java level, if they are not on the
classpath they should be ignored and not affect users.
Coincidentally, the JDK has had NullPointerExceptions in this
area :-) [8].

I understand the pain of longer compile times slowing people
down. That is actually a problem to be solved which does not
require lowering our standards of quality. How about we try
moving it to a separate CI job and see how it goes?

In my experience stories like Reuven's are much more
frustrating in a separate CI job because you find out quite
late that your code has flaws. Like when spotless fails, but
much more work to fix, and would have been prevented long ago
if it were integrated into the compile.

I agree with this. I prefer to be able to detect on my computer
that there are failures, and not have to wait for submission. The
complaint was that some people are experiencing trouble on their
local machine however, so it seems reasonable to add an opt-out
flag (though I would prefer opt out to opt in).

Kenn

[1]

https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf

<https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf>
[2]

https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf

<https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf>
[3]

https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275

<https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275>
[4]

https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)

<https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)>
[5] https://github.com/apache/beam/pull/12284
<https://github.com/apache/beam/pull/12284> and
https://github.com/apache/beam/pull/12162
<https://github.com/apache/beam/pull/12162> and
[6]

https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878

<https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878>
[7]

https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java

<https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java>
[8]
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
<https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174>

On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com
<mailto:re...@google.com>> wrote:

I have some deeper concerns with the null checks. The fact
that many libraries we use (including guava) don't always
annotate their methods forces a lot of workarounds. As a
very simple example, the return value from
Preconditions.checkNotNull clearly can never be null, yet
the nullability checks don't know this. This and other
similar cases require constantly adding extra unnecessary
null checks in the code just to make the checker happy.
There have been other cases where I haven't been able to
figure out a way to make the checker happy (often these
seem to involve using lambdas), and after 10-15 minutes of
investigation have given up and disabled the check.

Now you might say that it's worth the extra pain and
ugliness of writing "useless" code to ensure that we have
null-safe code. However I think this ignores a
sociological aspect of software development. I have a
higher tolerance than many for this sort of pain, and I'm
willing to spend some time figuring out how to rewrite my
code such that it makes the checker happy (even though
often it forced me to write much more awkward code).
However even I have often found myself giving up and just
disabling the check. Many others will have less tolerance
than me, and will simply disable the checks. At that point
we'll have a codebase littered with
@SuppressWarnings("nullness"), which doesn't really get us
where we want to be. I've seen similar struggles in other
codebases, and generally having a static checker with too
many false positives often ends up being worse than having
no checker.

On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía
<ieme...@gmail.com <mailto:ieme...@gmail.com>> wrote:

Even if I like the strictness for Null checking, I
also think that
this is adding too much extra time for builds (that I
noticed locally
when enabled) and also I agree with Jan that the
annotations are
really an undesired side effect. For reference when
you try to auto
complete some method signatures on IntelliJ on
downstream projects
with C-A-v it generates some extra Checkers
annotations like @NonNull
and others even if the user isn't using them which is
not desirable.

On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver
<kcwea...@google.com <mailto:kcwea...@google.com>> wrote:
>>
>> Big +1 for moving this to separate CI job. I really
don't like what annotations are currently added to the
code we ship. Tools like Idea add these annotations to
code they generate when overriding classes and that's
very annoying. Users should not be exposed to internal
tools like nullability checking.
>
>
> I was only planning on moving this to a separate CI
job. The job would still be release blocking, so the
same annotations would still be required.
>
> I'm not sure which annotations you are concerned
about. There are two annotations involved with
nullness checking, @SuppressWarnings and @Nullable.
@SuppressWarnings has retention policy SOURCE, so it
shouldn't be exposed to users at all. @Nullable is not
just for internal tooling, it also provides useful
information about our APIs to users. The user should
not have to guess whether a method argument etc. can
be null or not, and for better or worse, these
annotations are the standard way of expressing that in
Java.

Re: Null checking in Beam

Reply via email to