Re: Null checking in Beam

Alexey Romanenko Wed, 31 Mar 2021 08:40:08 -0700

I believe we should update a contribution doc on this, since now it’s confusing 
 - you run “./gradlew :your:task:check” and it’s fine but then it fails on CI 
because of some missed Nullable checks.


Also, I noticed that “checkNotNull(yourVar)” or "checkState(yourVar != null)” 
won’t help, it will still complain. Only explicit “if (yourVar != null)” works. 
Is it correct or do I miss something?

> On 24 Mar 2021, at 01:24, Kyle Weaver <kcwea...@google.com> wrote:
> 
> I moved the checker framework from opt-out to opt-in [1]. You can opt in by 
> passing "-PenableCheckerFramework=true" to the Gradle invocation. The checker 
> still runs on all Jenkins CI builds.
> 
> [1] https://github.com/apache/beam/pull/14301 
> <https://github.com/apache/beam/pull/14301>
> On Wed, Mar 17, 2021 at 8:50 AM Kenneth Knowles <k...@apache.org 
> <mailto:k...@apache.org>> wrote:
> 
> 
> On Wed, Mar 17, 2021 at 2:49 AM Jan Lukavský <je...@seznam.cz 
> <mailto:je...@seznam.cz>> wrote:
> If there is no way to configure which annotations should be generated, then 
> I'd be +1 for removing the checker to separated CI and adding an opt-in flag 
> for the check when run locally.
> 
> Yes. If this answer is correct, then we are out of luck: 
> https://stackoverflow.com/questions/57929137/disable-nonnull-annotation-on-implement-methods
>  
> <https://stackoverflow.com/questions/57929137/disable-nonnull-annotation-on-implement-methods>
> 
> A good followup to moving it to a separate CI job would be to attach a full 
> type checking run to the `:check` gradle task. This should be the best place 
> to attach all of our extended checks like spotbugs, spotless, checkstyle, 
> checkerframework. I rarely run `:check` myself (I think it is currently 
> broken in some ways) but it may be good to start.
> 
> BTW the slowdown we need to solve is local builds, not CI runs. When it was 
> added the slowdown was almost completely prevented by caching. And now that 
> we disable it for test classes (where for some reason it was extra slow) I 
> wonder if it will speed up the main CI runs at all. So I would say the first 
> thing to do is to just disable it by default but enable it in the Jenkins job 
> builders.
> 
> Kenn 
> 
> We need to solve the issue for dev@ as well, as the undesirable annotations 
> are already digging their way to the code base:
> 
> git/apache/beam$ git grep UnknownKeyFor
> 
> Another strange thing is that it seems, that we are pulling the 
> checkerframework as a runtime dependency (at least for some submodules). When 
> I run `mvn dependency:tree` on one of my projects that uses maven I see
> 
> [INFO] +- com.google.guava:guava:jar:30.1-jre:provided
> [INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:provided
> [INFO] |  +- 
> com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:provided
> [INFO] |  +- org.checkerframework:checker-qual:jar:3.5.0:provided
> 
> which is fine, but when I add beam-sdks-java-extensions-euphoria it changes to
> 
> [INFO] +- 
> org.apache.beam:beam-sdks-java-extensions-euphoria:jar:2.28.0:compile
> [INFO] |  \- org.checkerframework:checker-qual:jar:3.7.0:compile
> 
> I'm not a gradle guru, so I cannot tell how this happens, there seems to be 
> nothing special about euphoria in the gradle.
> 
>  Jan
> 
> On 3/16/21 7:12 PM, Kenneth Knowles wrote:
>> I've asked on checkerframework users mailing list if they or any users have 
>> recommendations for the IntelliJ integration issue.
>> 
>> It is worth noting that the default annotations inserted into the bytecode 
>> do add value: the @NonNull type annotations are default for checkerframework 
>> but not default for spotbugs. So having the default inserted enables 
>> downstream users to have betters spotbugs heuristic analysis. Further, the 
>> defaults can be adjusted by users so freezing them at the values we use them 
>> at is valuable in general across all tools.
>> 
>> I think it makes sense to sacrifice these minor value-adds to keep things 
>> simple-looking for IntelliJ users.
>> 
>> Kenn
>> 
>> On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles <k...@apache.org 
>> <mailto:k...@apache.org>> wrote:
>> Seems it is an FAQ with no solution: 
>> https://checkerframework.org/manual/#faq-classfile-annotations 
>> <https://checkerframework.org/manual/#faq-classfile-annotations>
>> On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles <k...@apache.org 
>> <mailto:k...@apache.org>> wrote:
>> Adding -PskipCheckerframework when releasing will solve it for users, but 
>> not for dev@.
>> 
>> Making it off by default and a separate CI check that turns it on would 
>> solve it overall but has the downsides mentioned before.
>> 
>> It would be very nice to simply have a flag to not insert default 
>> annotations.
>> 
>> Kenn
>> 
>> On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <je...@seznam.cz 
>> <mailto:je...@seznam.cz>> wrote:
>> I believe it is not a problem of Idea. At least I didn't notice behavior 
>> like that with Guava, although Guava uses the framework as well. Maybe there 
>> is a way to tune which annotations should be generated? Or Guava handles 
>> that somehow differently?
>> 
>> On 3/16/21 5:22 PM, Reuven Lax wrote:
>>> I've also been annoyed at IntelliJ autogenenerating all these annotations. 
>>> I believe Kenn said that this was not the intention - maybe there's an 
>>> IntelliJ setting that would stop this from happening?
>>> 
>>> On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský <je...@seznam.cz 
>>> <mailto:je...@seznam.cz>> wrote:
>>> I don't know the details of the checkerframework, but there seems a 
>>> contradiction between what code is (currently) generated and what we 
>>> therefore release and what actually the checkerframework states [1]:
>>> 
>>> @UnknownKeyFor:
>>> 
>>> Used internally by the type system; should never be written by a 
>>> programmer. 
>>> 
>>> If this annotation is generated for overwritten methods, then I'd say, that 
>>> it means we place a great burden to our users - either not using 
>>> autogenerated methods, or erase all the generated annotations afterwards. 
>>> Either way, that is not "polite" from Beam.
>>> 
>>> What we should judge is not only a formal purity of code, but what stands 
>>> on the other side is how usable APIs we provide. We should not head for 
>>> 100% pure code and sacrificing use comfort. Every check that leaks to user 
>>> code is at a price and we should not ignore that. No free lunch.
>>> 
>>> From my point of view:
>>> 
>>>  a) if a check does not modify the bytecode, it is fine and we can use it - 
>>> we are absolutely free to use any tooling we agree on, if our users cannot 
>>> be affected anyhow
>>> 
>>>  b) if a tool needs to be leaked to user, it should be as small leakage as 
>>> possible
>>> 
>>>  c) if a check significantly affects compile performance, it should be 
>>> possible to opt-out
>>> 
>>> I think that our current setup violates all these three points.
>>> 
>>> Moving the check to different CI is a possibility (a)), it would then 
>>> require opt-in flag to be able to run the check locally. It would also stop 
>>> the leakage (if we would release code without this check).
>>> 
>>> If we want to keep some annotations for user's benefit (which might be 
>>> fine), it should be really limited to the bare minimum (e.g. only @Nullable 
>>> for method arguments and return values, possibly more, I don't know if and 
>>> how we can configure that). Definitely not @UnknownKeyFor, that is simply 
>>> internal to the checker. We should then have opt-out flag for local 
>>> development before committing changes.
>>> 
>>>  Jan
>>> 
>>> [1] 
>>> https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
>>>  
>>> <https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html>
>>> On 3/16/21 8:33 AM, Reuven Lax wrote:
>>>> 
>>>> 
>>>> On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax <re...@google.com 
>>>> <mailto:re...@google.com>> wrote:
>>>> 
>>>> 
>>>> On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles <k...@apache.org 
>>>> <mailto:k...@apache.org>> wrote:
>>>> I will be blunt about my opinions about the general issue:
>>>> 
>>>> - NullPointerExceptions (and similar) are a solved problem.
>>>>    * They have been since 2003 at the latest [1] (this is when the types 
>>>> were hacked into Java - the foundation dates back to the 70s or earlier)
>>>> 
>>>> Huh - Fahndrich tried to hire me once to work on a project called 
>>>> Singularity. Small world. Also note that Sanjay Ghemawat is listed in the 
>>>> citations!
>>>> 
>>>> Umm, I was confusing names. Fahndrich is actually a former coworker at 
>>>> Google :)
>>>>  
>>>>  
>>>>    * Checkerframework is a _pluggable_ system where that nullness type 
>>>> system is a "hello, world" level demo, since 2008 at the latest [2].
>>>>    * Our users should know this and judge us accordingly.
>>>> 
>>>> - Checkerframework should be thought of and described as type checking, 
>>>> because it is. It is not heuristic nor approximate.
>>>> - If your code was unclear about whether something could be null, it was 
>>>> probably unclear to a person reading it also, and very likely to have 
>>>> actual bugs.
>>>> - APIs that accept a lot of nullable parameters are, generally speaking, 
>>>> bad APIs. They are hard to use correctly, less readable, and very likely 
>>>> to cause actual bugs. You are forcing your users to deal with accidental 
>>>> complexity you left behind.
>>>>   * Corollary to the above two points: Almost all the time, the changes to 
>>>> clearify nullness make your code better, more readable, easier for users 
>>>> or editors.
>>>> - It is true that there is a learning curve to programming in this way.
>>>> 
>>>> I agree with the above in a closed system. However as mentioned, some of 
>>>> the APIs we use suffer from this.
>>>> 
>>>> In a previous life, I saw up close an effort to add such analysis to a 
>>>> large codebase. Two separate tools called Prefix and Prefast were used 
>>>> (the difference between the two is actually quite interesting, but not 
>>>> relevant here). However in order to make this analysis useful, there was a 
>>>> massive effort to properly annotate the entire codebase, including all 
>>>> libraries used. This isn't a perfect example - this was a C++ codebase 
>>>> which is much harder to analyze, and these tools identified far more than 
>>>> simply nullness errors (resource leaks, array indices, proper string null 
>>>> termination, exception behavior, etc.). However the closer we can get to 
>>>> properly annotating the transitive closure of our dependencies, the better 
>>>> this framework will work.
>>>> 
>>>>  
>>>> - There are certainly common patterns in Java that do not work very well, 
>>>> and suppression is sometimes the best option.
>>>>    * Example: JUnit's @Setup and @Test conventions do not work very well 
>>>> and it is not worth the effort to make them work. This is actually because 
>>>> if it were "normal code" it would be bad code. There are complex 
>>>> inter-method dependencies enforced only by convention. This matters: 
>>>> sometimes a JUnit test class is called from another class, used as "normal 
>>>> code". This does go wrong in practice. Plain old JUnit test cases 
>>>> frequently go wrong as well.
>>>> 
>>>> And here is my opinion when it comes to Beam:
>>>> 
>>>> - "Community over code" is not an excuse for negligent practices that 
>>>> cause easily avoidable risk to our users. I will be very disappointed if 
>>>> we choose that.
>>>> - I think having tooling that helps newcomers write better code by default 
>>>> is better for the community too. Just like having automatic formatting is 
>>>> better. Less to haggle about in review, etc.
>>>> - A simple search reveals about 170 bugs that we know of [4].
>>>> - So far in almost every module I have fixed I discovered actual new null 
>>>> errors. Many examples at [5].
>>>> - It is extremely easy to suppress the type checking. Almost all of our 
>>>> classes have it suppressed already (I did this work, to allow existing 
>>>> errors while protecting new code).
>>>> - Including the annotations in the shipped jars is an important feature. 
>>>> Without this, users cannot write null-safe code themselves.
>>>>    * Reuven highlighted this: when methods are not annotated, we have to 
>>>> use/implement workarounds. Actually Guava does use checkerframework 
>>>> annotations [6] and the problem is that it requires its *input* to already 
>>>> be non-null so actually you cannot even use it to convert nullable values 
>>>> to non-nullable values.
>>>>    * Beam has its own [7] that is annotated, actually for yet another 
>>>> reason: when Guava's checkNotNull fails, it throws NPE when it should 
>>>> throw IllegalArgumentException. Guava's checkNotNull should not be used 
>>>> for input validation!
>>>> - It is unfortunate that IntelliJ inserts a bunch of annotations in user 
>>>> code. I wonder if there is something we can do about that. At the Java 
>>>> level, if they are not on the classpath they should be ignored and not 
>>>> affect users. Coincidentally, the JDK has had NullPointerExceptions in 
>>>> this area :-)  [8].
>>>> 
>>>> I understand the pain of longer compile times slowing people down. That is 
>>>> actually a problem to be solved which does not require lowering our 
>>>> standards of quality. How about we try moving it to a separate CI job and 
>>>> see how it goes?
>>>> 
>>>>  
>>>> In my experience stories like Reuven's are much more frustrating in a 
>>>> separate CI job because you find out quite late that your code has flaws. 
>>>> Like when spotless fails, but much more work to fix, and would have been 
>>>> prevented long ago if it were integrated into the compile.
>>>> 
>>>> I agree with this. I prefer to be able to detect on my computer that there 
>>>> are failures, and not have to wait for submission. The complaint was that 
>>>> some people are experiencing trouble on their local machine however, so it 
>>>> seems reasonable to add an opt-out flag (though I would prefer opt out to 
>>>> opt in).
>>>>  
>>>> 
>>>> Kenn
>>>> 
>>>> [1] 
>>>> https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf 
>>>> <https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf>
>>>> [2] 
>>>> https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
>>>>  
>>>> <https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf>
>>>> [3] 
>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
>>>>  
>>>> <https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275>
>>>> [4] 
>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
>>>>  
>>>> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)>
>>>> [5] https://github.com/apache/beam/pull/12284 
>>>> <https://github.com/apache/beam/pull/12284> and 
>>>> https://github.com/apache/beam/pull/12162 
>>>> <https://github.com/apache/beam/pull/12162> and 
>>>> [6] 
>>>> https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
>>>>  
>>>> <https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878>
>>>> [7] 
>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
>>>>  
>>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java>
>>>> [8] https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174 
>>>> <https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174>
>>>> 
>>>> 
>>>> On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax <re...@google.com 
>>>> <mailto:re...@google.com>> wrote:
>>>> I have some deeper concerns with the null checks. The fact that many 
>>>> libraries we use (including guava) don't always annotate their methods 
>>>> forces a lot of workarounds. As a very simple example, the return value 
>>>> from Preconditions.checkNotNull clearly can never be null, yet the 
>>>> nullability checks don't know this. This and other similar cases require 
>>>> constantly adding extra unnecessary null checks in the code just to make 
>>>> the checker happy. There have been other cases where I haven't been able 
>>>> to figure out a way to make the checker happy (often these seem to involve 
>>>> using lambdas), and after 10-15 minutes of investigation have given up and 
>>>> disabled the check.
>>>> 
>>>> Now you might say that it's worth the extra pain and ugliness of writing 
>>>> "useless" code to ensure that we have null-safe code. However I think this 
>>>> ignores a sociological aspect of software development. I have a higher 
>>>> tolerance than many for this sort of pain, and I'm willing to spend some 
>>>> time figuring out how to rewrite my code such that it makes the checker 
>>>> happy (even though often it forced me to write much more awkward code). 
>>>> However even I have often found myself giving up and just disabling the 
>>>> check. Many others will have less tolerance than me, and will simply 
>>>> disable the checks. At that point we'll have a codebase littered with 
>>>> @SuppressWarnings("nullness"), which doesn't really get us where we want 
>>>> to be. I've seen similar struggles in other codebases, and generally 
>>>> having a static checker with too many false positives often ends up being 
>>>> worse than having no checker.
>>>> 
>>>> On Mon, Mar 15, 2021 at 10:37 AM Ismaël Mejía <ieme...@gmail.com 
>>>> <mailto:ieme...@gmail.com>> wrote:
>>>> +1
>>>> 
>>>> Even if I like the strictness for Null checking, I also think that
>>>> this is adding too much extra time for builds (that I noticed locally
>>>> when enabled) and also I agree with Jan that the annotations are
>>>> really an undesired side effect. For reference when you try to auto
>>>> complete some method signatures on IntelliJ on downstream projects
>>>> with C-A-v it generates some extra Checkers annotations like @NonNull
>>>> and others even if the user isn't using them which is not desirable.
>>>> 
>>>> 
>>>> 
>>>> On Mon, Mar 15, 2021 at 6:04 PM Kyle Weaver <kcwea...@google.com 
>>>> <mailto:kcwea...@google.com>> wrote:
>>>> >>
>>>> >> Big +1 for moving this to separate CI job. I really don't like what 
>>>> >> annotations are currently added to the code we ship. Tools like Idea 
>>>> >> add these annotations to code they generate when overriding classes and 
>>>> >> that's very annoying. Users should not be exposed to internal tools 
>>>> >> like nullability checking.
>>>> >
>>>> >
>>>> > I was only planning on moving this to a separate CI job. The job would 
>>>> > still be release blocking, so the same annotations would still be 
>>>> > required.
>>>> >
>>>> > I'm not sure which annotations you are concerned about. There are two 
>>>> > annotations involved with nullness checking, @SuppressWarnings and 
>>>> > @Nullable. @SuppressWarnings has retention policy SOURCE, so it 
>>>> > shouldn't be exposed to users at all. @Nullable is not just for internal 
>>>> > tooling, it also provides useful information about our APIs to users. 
>>>> > The user should not have to guess whether a method argument etc. can be 
>>>> > null or not, and for better or worse, these annotations are the standard 
>>>> > way of expressing that in Java.

Re: Null checking in Beam

Reply via email to