Re: Null checking in Beam

Jan Lukavský Wed, 17 Mar 2021 02:49:05 -0700

If there is no way to configure which annotations should be generated,then I'd be +1 for removing the checker to separated CI and adding anopt-in flag for the check when run locally.

We need to solve the issue for dev@ as well, as the undesirableannotations are already digging their way to the code base:


git/apache/beam$ git grep UnknownKeyFor

Another strange thing is that it seems, that we are pulling thecheckerframework as a runtime dependency (at least for some submodules).When I run `mvn dependency:tree` on one of my projects that uses maven I see


[INFO] +- com.google.guava:guava:jar:30.1-jre:provided
[INFO] |  +- com.google.guava:failureaccess:jar:1.0.1:provided

[INFO] | +-com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:provided

[INFO] |  +- org.checkerframework:checker-qual:jar:3.5.0:provided

which is fine, but when I add beam-sdks-java-extensions-euphoria itchanges to

[INFO] +-org.apache.beam:beam-sdks-java-extensions-euphoria:jar:2.28.0:compile

[INFO] |  \- org.checkerframework:checker-qual:jar:3.7.0:compile

I'm not a gradle guru, so I cannot tell how this happens, there seems tobe nothing special about euphoria in the gradle.


 Jan

On 3/16/21 7:12 PM, Kenneth Knowles wrote:

I've asked on checkerframework users mailing list if they or any usershave recommendations for the IntelliJ integration issue.

It is worth noting that the default annotations inserted into thebytecode do add value: the @NonNull type annotations are default forcheckerframework but not default for spotbugs. So having the defaultinserted enables downstream users to have betters spotbugs heuristicanalysis. Further, the defaults can be adjusted by users so freezingthem at the values we use them at is valuable in general across all tools.

I think it makes sense to sacrifice these minor value-adds to keepthings simple-looking for IntelliJ users.


Kenn

On Tue, Mar 16, 2021 at 10:05 AM Kenneth Knowles <k...@apache.org<mailto:k...@apache.org>> wrote:


    Seems it is an FAQ with no solution:
    https://checkerframework.org/manual/#faq-classfile-annotations
    <https://checkerframework.org/manual/#faq-classfile-annotations>

    On Tue, Mar 16, 2021 at 10:01 AM Kenneth Knowles <k...@apache.org
    <mailto:k...@apache.org>> wrote:

        Adding -PskipCheckerframework when releasing will solve it for
        users, but not for dev@.

        Making it off by default and a separate CI check that turns it
        on would solve it overall but has the downsides mentioned before.

        It would be very nice to simply have a flag to not insert
        default annotations.

        Kenn

        On Tue, Mar 16, 2021 at 9:37 AM Jan Lukavský <je...@seznam.cz
        <mailto:je...@seznam.cz>> wrote:

            I believe it is not a problem of Idea. At least I didn't
            notice behavior like that with Guava, although Guava uses
            the framework as well. Maybe there is a way to tune which
            annotations should be generated? Or Guava handles that
            somehow differently?

            On 3/16/21 5:22 PM, Reuven Lax wrote:

            I've also been annoyed at IntelliJ autogenenerating all
            these annotations. I believe Kenn said that this was not
            the intention - maybe there's an IntelliJ setting that
            would stop this from happening?

            On Tue, Mar 16, 2021 at 2:14 AM Jan Lukavský
            <je...@seznam.cz <mailto:je...@seznam.cz>> wrote:

                I don't know the details of the checkerframework, but
                there seems a contradiction between what code is
                (currently) generated and what we therefore release
                and what actually the checkerframework states [1]:

                @UnknownKeyFor:

                Used internally by the type system; should never be
                written by a programmer.

                If this annotation is generated for overwritten
                methods, then I'd say, that it means we place a great
                burden to our users - either not using autogenerated
                methods, or erase all the generated annotations
                afterwards. Either way, that is not "polite" from Beam.

                What we should judge is not only a formal purity of
                code, but what stands on the other side is how usable
                APIs we provide. We should not head for 100% pure
                code and sacrificing use comfort. Every check that
                leaks to user code is at a price and we should not
                ignore that. No free lunch.

                From my point of view:

                 a) if a check does not modify the bytecode, it is
                fine and we can use it - we are absolutely free to
                use any tooling we agree on, if our users cannot be
                affected anyhow

                 b) if a tool needs to be leaked to user, it should
                be as small leakage as possible

                 c) if a check significantly affects compile
                performance, it should be possible to opt-out

                I think that our current setup violates all these
                three points.

                Moving the check to different CI is a possibility
                (a)), it would then require opt-in flag to be able to
                run the check locally. It would also stop the leakage
                (if we would release code without this check).

                If we want to keep some annotations for user's
                benefit (which might be fine), it should be really
                limited to the bare minimum (e.g. only @Nullable for
                method arguments and return values, possibly more, I
                don't know if and how we can configure that).
                Definitely not @UnknownKeyFor, that is simply
                internal to the checker. We should then have opt-out
                flag for local development before committing changes.

                 Jan

                [1]
                
https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html
                
<https://checkerframework.org/api/org/checkerframework/checker/nullness/qual/UnknownKeyFor.html>

                On 3/16/21 8:33 AM, Reuven Lax wrote:



                On Mon, Mar 15, 2021 at 11:42 PM Reuven Lax
                <re...@google.com <mailto:re...@google.com>> wrote:



                    On Mon, Mar 15, 2021 at 9:12 PM Kenneth Knowles
                    <k...@apache.org <mailto:k...@apache.org>> wrote:

                        I will be blunt about my opinions about the
                        general issue:

                        - NullPointerExceptions (and similar) are a
                        solved problem.
                           * They have been since 2003 at the latest
                        [1] (this is when the types were hacked into
                        Java - the foundation dates back to the 70s
                        or earlier)


                    Huh - Fahndrich tried to hire me once to work on
                    a project called Singularity. Small world. Also
                    note that Sanjay Ghemawat is listed in the
                    citations!


                Umm, I was confusing names. Fahndrich is actually a
                former coworker at Google :)

                           * Checkerframework is a _pluggable_
                        system where that nullness type system is a
                        "hello, world" level demo, since 2008 at the
                        latest [2].
                           * Our users should know this and judge us
                        accordingly.

                        - Checkerframework should be thought of and
                        described as type checking, because it is.
                        It is not heuristic nor approximate.
                        - If your code was unclear about whether
                        something could be null, it was probably
                        unclear to a person reading it also, and
                        very likely to have actual bugs.
                        - APIs that accept a lot of nullable
                        parameters are, generally speaking, bad
                        APIs. They are hard to use correctly, less
                        readable, and very likely to cause actual
                        bugs. You are forcing your users to deal
                        with accidental complexity you left behind.
                          * Corollary to the above two points:
                        Almost all the time, the changes to clearify
                        nullness make your code better, more
                        readable, easier for users or editors.
                        - It is true that there is a learning curve
                        to programming in this way.


                    I agree with the above in a closed system.
                    However as mentioned, some of the APIs we use
                    suffer from this.

                    In a previous life, I saw up close an effort to
                    add such analysis to a large codebase. Two
                    separate tools called Prefix and Prefast were
                    used (the difference between the two is actually
                    quite interesting, but not relevant here).
                    However in order to make this analysis useful,
                    there was a massive effort to properly annotate
                    the entire codebase, including all libraries
                    used. This isn't a perfect example - this was a
                    C++ codebase which is much harder to analyze,
                    and these tools identified far more than simply
                    nullness errors (resource leaks, array indices,
                    proper string null termination, exception
                    behavior, etc.). However the closer we can get
                    to properly annotating the transitive closure of
                    our dependencies, the better this framework will
                    work.

                        - There are certainly common patterns in
                        Java that do not work very well, and
                        suppression is sometimes the best option.
                           * Example: JUnit's @Setup and @Test
                        conventions do not work very well and it is
                        not worth the effort to make them work. This
                        is actually because if it were "normal code"
                        it would be bad code. There are complex
                        inter-method dependencies enforced only by
                        convention. This matters: sometimes a JUnit
                        test class is called from another class,
                        used as "normal code". This does go wrong in
                        practice. Plain old JUnit test cases
                        frequently go wrong as well.

                        And here is my opinion when it comes to Beam:

                        - "Community over code" is not an excuse for
                        negligent practices that cause easily
                        avoidable risk to our users. I will be very
                        disappointed if we choose that.
                        - I think having tooling that helps
                        newcomers write better code by default is
                        better for the community too. Just like
                        having automatic formatting is better. Less
                        to haggle about in review, etc.
                        - A simple search reveals about 170 bugs
                        that we know of [4].
                        - So far in almost every module I have fixed
                        I discovered actual new null errors. Many
                        examples at [5].
                        - It is extremely easy to suppress the type
                        checking. Almost all of our classes have it
                        suppressed already (I did this work, to
                        allow existing errors while protecting new
                        code).
                        - Including the annotations in the shipped
                        jars is an important feature. Without this,
                        users cannot write null-safe code themselves.
                           * Reuven highlighted this: when methods
                        are not annotated, we have to use/implement
                        workarounds. Actually Guava does use
                        checkerframework annotations [6] and the
                        problem is that it requires its *input* to
                        already be non-null so actually you cannot
                        even use it to convert nullable values to
                        non-nullable values.
                           * Beam has its own [7] that is annotated,
                        actually for yet another reason: when
                        Guava's checkNotNull fails, it throws NPE
                        when it should throw
                        IllegalArgumentException. Guava's
                        checkNotNull should not be used for input
                        validation!
                        - It is unfortunate that IntelliJ inserts a
                        bunch of annotations in user code. I wonder
                        if there is something we can do about that.
                        At the Java level, if they are not on the
                        classpath they should be ignored and not
                        affect users. Coincidentally, the JDK has
                        had NullPointerExceptions in this area :-) [8].

                        I understand the pain of longer compile
                        times slowing people down. That is actually
                        a problem to be solved which does not
                        require lowering our standards of quality.
                        How about we try moving it to a separate CI
                        job and see how it goes?

                        In my experience stories like Reuven's are
                        much more frustrating in a separate CI job
                        because you find out quite late that your
                        code has flaws. Like when spotless fails,
                        but much more work to fix, and would have
                        been prevented long ago if it were
                        integrated into the compile.


                    I agree with this. I prefer to be able to detect
                    on my computer that there are failures, and not
                    have to wait for submission. The complaint was
                    that some people are experiencing trouble on
                    their local machine however, so it seems
                    reasonable to add an opt-out flag (though I
                    would prefer opt out to opt in).


                        Kenn

                        [1]
                        
https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf
                        
<https://web.eecs.umich.edu/~bchandra/courses/papers/Fahndrich_NonNull.pdf>
                        [2]
                        
https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf
                        
<https://homes.cs.washington.edu/~mernst/pubs/pluggable-checkers-issta2008.pdf>
                        [3]
                        
https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275
                        
<https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/pom.xml#L275>
                        [4]
                        
https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)
                        
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20(summary%20~%20%22NPE%22%20OR%20summary%20~%20%22NullPointerException%22%20OR%20description%20~%20%22NPE%22%20OR%20description%20~%20%22NullPointerException%22%20OR%20comment%20~%20%22NPE%22%20OR%20comment%20~%20%22NullPointerException%22)>
                        [5]
                        https://github.com/apache/beam/pull/12284
                        <https://github.com/apache/beam/pull/12284>
                        and
                        https://github.com/apache/beam/pull/12162
                        <https://github.com/apache/beam/pull/12162> and
                        [6]
                        
https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878
                        
<https://github.com/google/guava/blob/fe3fda0ca54076a2268d060725e9a6e26f867a5e/guava/src/com/google/common/base/Preconditions.java#L878>
                        [7]
                        
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java
                        
<https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/Preconditions.java>
                        [8]
                        
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174
                        
<https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152174>


                        On Mon, Mar 15, 2021 at 2:12 PM Reuven Lax
                        <re...@google.com <mailto:re...@google.com>>
                        wrote:

                            I have some deeper concerns with the
                            null checks. The fact that many
                            libraries we use (including guava) don't
                            always annotate their methods forces a
                            lot of workarounds. As a very simple
                            example, the return value from
                            Preconditions.checkNotNull clearly can
                            never be null, yet the nullability
                            checks don't know this. This and other
                            similar cases require constantly adding
                            extra unnecessary null checks in the
                            code just to make the checker happy.
                            There have been other cases where I
                            haven't been able to figure out a way to
                            make the checker happy (often these seem
                            to involve using lambdas), and after
                            10-15 minutes of investigation have
                            given up and disabled the check.

                            Now you might say that it's worth the
                            extra pain and ugliness of writing
                            "useless" code to ensure that we have
                            null-safe code. However I think this
                            ignores a sociological aspect of
                            software development. I have a
                            higher tolerance than many for this sort
                            of pain, and I'm willing to spend some
                            time figuring out how to rewrite my code
                            such that it makes the checker happy
                            (even though often it forced me to write
                            much more awkward code). However even I
                            have often found myself giving up and
                            just disabling the check. Many others
                            will have less tolerance than me, and
                            will simply disable the checks. At that
                            point we'll have a codebase littered
                            with @SuppressWarnings("nullness"),
                            which doesn't really get us where we
                            want to be. I've seen similar struggles
                            in other codebases, and generally having
                            a static checker with too many false
                            positives often ends up being worse than
                            having no checker.

                            On Mon, Mar 15, 2021 at 10:37 AM Ismaël
                            Mejía <ieme...@gmail.com
                            <mailto:ieme...@gmail.com>> wrote:

                                +1

                                Even if I like the strictness for
                                Null checking, I also think that
                                this is adding too much extra time
                                for builds (that I noticed locally
                                when enabled) and also I agree with
                                Jan that the annotations are
                                really an undesired side effect. For
                                reference when you try to auto
                                complete some method signatures on
                                IntelliJ on downstream projects
                                with C-A-v it generates some extra
                                Checkers annotations like @NonNull
                                and others even if the user isn't
                                using them which is not desirable.



                                On Mon, Mar 15, 2021 at 6:04 PM Kyle
                                Weaver <kcwea...@google.com
                                <mailto:kcwea...@google.com>> wrote:
                                >>
                                >> Big +1 for moving this to
                                separate CI job. I really don't like
                                what annotations are currently added
                                to the code we ship. Tools like Idea
                                add these annotations to code they
                                generate when overriding classes and
                                that's very annoying. Users should
                                not be exposed to internal tools
                                like nullability checking.
                                >
                                >
                                > I was only planning on moving this
                                to a separate CI job. The job would
                                still be release blocking, so the
                                same annotations would still be
                                required.
                                >
                                > I'm not sure which annotations you
                                are concerned about. There are two
                                annotations involved with nullness
                                checking, @SuppressWarnings and
                                @Nullable. @SuppressWarnings has
                                retention policy SOURCE, so it
                                shouldn't be exposed to users at
                                all. @Nullable is not just for
                                internal tooling, it also provides
                                useful information about our APIs to
                                users. The user should not have to
                                guess whether a method argument etc.
                                can be null or not, and for better
                                or worse, these annotations are the
                                standard way of expressing that in Java.

Re: Null checking in Beam

Reply via email to