Re: Null checking in Beam

Reuven Lax Tue, 16 Mar 2021 00:33:47 -0700

On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <[email protected]> wrote:


>
>
> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <[email protected]> wrote:
>
>> I will be blunt about my opinions about the general issue:
>>
>> - NullPointerExceptions (and similar) are a solved problem.
>>    * They have been since 2003 at the latest [1] (this is when the types
>> were hacked into Java - the foundation dates back to the 70s or earlier)
>>
>
> Huh - Fahndrich tried to hire me once to work on a project called
> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the
> citations!
>

Umm, I was confusing names. Fahndrich is actually a former coworker at
Google :)


>
>
>>    * Checkerframework is a _pluggable_ system where that nullness type
>> system is a "hello, world" level demo, since 2008 at the latest [2].
>>    * Our users should know this and judge us accordingly.
>>
>> - Checkerframework should be thought of and described as type checking,
>> because it is. It is not heuristic nor approximate.
>> - If your code was unclear about whether something could be null, it was
>> probably unclear to a person reading it also, and very likely to have
>> actual bugs.
>> - APIs that accept a lot of nullable parameters are, generally speaking,
>> bad APIs. They are hard to use correctly, less readable, and very likely to
>> cause actual bugs. You are forcing your users to deal with accidental
>> complexity you left behind.
>>   * Corollary to the above two points: Almost all the time, the changes
>> to clearify nullness make your code better, more readable, easier for users
>> or editors.
>> - It is true that there is a learning curve to programming in this way.
>>
>
> I agree with the above in a closed system. However as mentioned, some of
> the APIs we use suffer from this.
>
> In a previous life, I saw up close an effort to add such analysis to a
> large codebase. Two separate tools called Prefix and Prefast were used (the
> difference between the two is actually quite interesting, but not relevant
> here). However in order to make this analysis useful, there was a massive
> effort to properly annotate the entire codebase, including all libraries
> used. This isn't a perfect example - this was a C++ codebase which is much
> harder to analyze, and these tools identified far more than simply nullness
> errors (resource leaks, array indices, proper string null termination,
> exception behavior, etc.). However the closer we can get to properly
> annotating the transitive closure of our dependencies, the better this
> framework will work.
>
>
>
>> - There are certainly common patterns in Java that do not work very well,
>> and suppression is sometimes the best option.
>>    * Example: JUnit's @Setup and @Test conventions do not work very well
>> and it is not worth the effort to make them work. This is actually because
>> if it were "normal code" it would be bad code. There are complex
>> inter-method dependencies enforced only by convention. This matters:
>> sometimes a JUnit test class is called from another class, used as "normal
>> code". This does go wrong in practice. Plain old JUnit test cases
>> frequently go wrong as well.
>>
>> And here is my opinion when it comes to Beam:
>>
>> - "Community over code" is not an excuse for negligent practices that
>> cause easily avoidable risk to our users. I will be very disappointed if we
>> choose that.
>> - I think having tooling that helps newcomers write better code by
>> default is better for the community too. Just like having automatic
>> formatting is better. Less to haggle about in review, etc.
>> - A simple search reveals about 170 bugs that we know of [4].
>> - So far in almost every module I have fixed I discovered actual new null
>> errors. Many examples at [5].
>> - It is extremely easy to suppress the type checking. Almost all of our
>> classes have it suppressed already (I did this work, to allow existing
>> errors while protecting new code).
>> - Including the annotations in the shipped jars is an important feature.
>> Without this, users cannot write null-safe code themselves.
>>    * Reuven highlighted this: when methods are not annotated, we have to
>> use/implement workarounds. Actually Guava does use checkerframework
>> annotations [6] and the problem is that it requires its *input* to already
>> be non-null so actually you cannot even use it to convert nullable values
>> to non-nullable values.
>>    * Beam has its own [7] that is annotated, actually for yet another
>> reason: when Guava's checkNotNull fails, it throws NPE when it should throw
>> IllegalArgumentException. Guava's checkNotNull should not be used for input
>> validation!
>> - It is unfortunate that IntelliJ inserts a bunch of annotations in user
>> code. I wonder if there is something we can do about that. At the Java
>> level, if they are not on the classpath they should be ignored and not
>> affect users. Coincidentally, the JDK has had NullPointerExceptions in this
>> area :-)  [8].
>>
>> I understand the pain of longer compile times slowing people down. That
>> is actually a problem to be solved which does not require lowering our
>> standards of quality. How about we try moving it to a separate CI job and
>> see how it goes?
>>
>>
>
>> In my experience stories like Reuven's are much more frustrating in a
>> separate CI job because you find out quite late that your code has flaws.
>> Like when spotless fails, but much more work to fix, and would have been
>> prevented long ago if it were integrated into the compile.
>>
>
> I agree with this. I prefer to be able to detect on my computer that there
> are failures, and not have to wait for submission. The complaint was that
> some people are experiencing trouble on their local machine however, so it
> seems reasonable to add an opt-out flag (though I would prefer opt out to
> opt in).
>
>
>>
>> Kenn
>>
>> [1]
>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
>> [2]
>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>> [3]
>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>> [4]
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>> [5] https://github.com/apache/beam/pull/12284 and
>> https://github.com/apache/beam/pull/12162 and
>> [6]
>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>> [7]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
>>
>>
>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <[email protected]> wrote:
>>
>>> I have some deeper concerns with the null checks. The fact that many
>>> libraries we use (including guava) don't always annotate their methods
>>> forces a lot of workarounds. As a very simple example, the return value
>>> from Preconditions.checkNotNull clearly can never be null, yet the
>>> nullability checks don't know this. This and other similar cases require
>>> constantly adding extra unnecessary null checks in the code just to make
>>> the checker happy. There have been other cases where I haven't been able to
>>> figure out a way to make the checker happy (often these seem to involve
>>> using lambdas), and after 10-15 minutes of investigation have given up and
>>> disabled the check.
>>>
>>> Now you might say that it's worth the extra pain and ugliness of writing
>>> "useless" code to ensure that we have null-safe code. However I think this
>>> ignores a sociological aspect of software development. I have a
>>> higher tolerance than many for this sort of pain, and I'm willing to spend
>>> some time figuring out how to rewrite my code such that it makes the
>>> checker happy (even though often it forced me to write much more awkward
>>> code). However even I have often found myself giving up and just disabling
>>> the check. Many others will have less tolerance than me, and will simply
>>> disable the checks. At that point we'll have a codebase littered with
>>> @SuppressWarnings("nullness"), which doesn't really get us where we want to
>>> be. I've seen similar struggles in other codebases, and generally having a
>>> static checker with too many false positives often ends up being worse than
>>> having no checker.
>>>
>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <[email protected]> wrote:
>>>
>>>> +1
>>>>
>>>> Even if I like the strictness for Null checking, I also think that
>>>> this is adding too much extra time for builds (that I noticed locally
>>>> when enabled) and also I agree with Jan that the annotations are
>>>> really an undesired side effect. For reference when you try to auto
>>>> complete some method signatures on IntelliJ on downstream projects
>>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>>> and others even if the user isn't using them which is not desirable.
>>>>
>>>>
>>>>
>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <[email protected]>
>>>> wrote:
>>>> >>
>>>> >> Big +1 for moving this to separate CI job. I really don't like what
>>>> annotations are currently added to the code we ship. Tools like Idea add
>>>> these annotations to code they generate when overriding classes and that's
>>>> very annoying. Users should not be exposed to internal tools like
>>>> nullability checking.
>>>> >
>>>> >
>>>> > I was only planning on moving this to a separate CI job. The job
>>>> would still be release blocking, so the same annotations would still be
>>>> required.
>>>> >
>>>> > I'm not sure which annotations you are concerned about. There are two
>>>> annotations involved with nullness checking, @SuppressWarnings and
>>>> @Nullable. @SuppressWarnings has retention policy SOURCE, so it shouldn't
>>>> be exposed to users at all. @Nullable is not just for internal tooling, it
>>>> also provides useful information about our APIs to users. The user should
>>>> not have to guess whether a method argument etc. can be null or not, and
>>>> for better or worse, these annotations are the standard way of expressing
>>>> that in Java.
>>>>
>>>

Re: Null checking in Beam

Reply via email to