Kenn, can you adjust the script to match only source code files (`--include \*.java --include \*.py --include \*.go`)? Otherwise it produces a lot of false positives from HTML and cache files. Also, can we extract the full annotation as a column so we can filter/group by the full kind (type) of the experimental annotation, e.g. @Experimental(Kind.SCHEMAS), @Experimental(Kind.SOURCE_SINK), etc.?
This way we can group occurrences per kind and quickly triage those that are clearly still experimental (and have ongoing independent stabilization efforts [1]), like these:

@Experimental(Kind.SCHEMAS)
@Experimental(Kind.SPLITTABLE_DO_FN)
@Experimental(Kind.PORTABILITY)
(and probably @Experimental(Kind.CONTEXTFUL))

In recent weeks I have been adjusting the Experimental annotations to follow the @Experimental(Kind.FOO) pattern with this future triage in mind, so it is good to see the effort may pay off :)

As part of this work, one idea we agreed on with Luke Cwik was to remove the Experimental annotations from ‘runners/core*’ because historically Beam has not had strong compatibility guarantees for users of these APIs (runner authors). It is probably worth re-running the script against the latest master because the results in the spreadsheet do not correspond to the current master. (Note that the remaining External class is still tagged as Experimental because it is still pending a move into ‘sdks/java/core’.)

Not related to Experimental but worth mentioning: we also started tagging

sdks/java/core/src/main/java/org/apache/beam/sdk/util/*
sdks/java/core/src/main/java/org/apache/beam/sdk/testing/*

as @Internal for the same reasons. Classes in both packages are basically for internal use in the Beam SDK harness, for runner authors, and for tests, and pipeline authors should not rely on their stability.

We also introduced package-level Experimental annotations (package-info.java), so these can easily account for 50 duplicates that should probably be trimmed by the same person who is covering the corresponding files in the package. With all these adjustments we will easily be below 250 matches.
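To make the request concrete, here is a minimal sketch of the filtering and grouping I have in mind, written in Python rather than as a grep pipeline. It is not Kenn's script: the function names `find_annotations` and `count_by_kind` are hypothetical, and the regular expression is my assumption about what counts as "the full annotation" (the annotation name plus an optional parenthesised argument on the same line).

```python
import re
from collections import Counter
from pathlib import Path

# Only source files, mirroring --include \*.java --include \*.py --include \*.go.
SOURCE_SUFFIXES = {".java", ".py", ".go"}

# Assumed shape of an annotation: "@Experimental" optionally followed by a
# parenthesised argument on the same line, e.g. @Experimental(Kind.SCHEMAS).
# Case-insensitive, like the original grep --ignore-case.
ANNOTATION = re.compile(r"@Experimental\b(?:\s*\([^)]*\))?", re.IGNORECASE)


def find_annotations(root):
    """Yield (path, line_number, full_annotation) for each match under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in SOURCE_SUFFIXES or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            for match in ANNOTATION.finditer(line):
                yield str(path), number, match.group(0)


def count_by_kind(rows):
    """Count occurrences per full annotation text (the proposed extra column)."""
    return Counter(annotation for _, _, annotation in rows)
```

Calling `count_by_kind(find_annotations("."))` from the repository root would give the per-kind totals, and the rows from `find_annotations` could be pasted into the spreadsheet as path, line, and annotation columns. Annotations whose arguments span several lines would be reported without their arguments in this sketch.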
Regards,
Ismaël

[1] https://lists.apache.org/thread.html/r73d3b19506ea435ee6be568ccc32065e36cd873dbbcf2a3e9049254e%40%3Cdev.beam.apache.org%3E

On Fri, Mar 6, 2020 at 11:54 PM Kenneth Knowles <k...@apache.org> wrote:
>
> OK I tried to make a tiny bit of progress on this, with
> `grep --ignore-case --line-number --recursive '@experimental' .`
> there are 578 occurrences (includes website and comments). Via
> `| cut -d ':' -f 1 | sort | uniq | wc -l` there are 377 distinct code files.
>
> So that's a big project but easily scales to the contributors. I suggest we
> need to crowdsource a bit.
>
> I created
> https://docs.google.com/spreadsheets/d/1T98I7tFoUgwW2tegS5xbNRjaVDvZiBBLn7jg0Ef_IwU/edit?usp=sharing
> where you can suggest/comment adding your name to a file to volunteer to own
> going through the file.
>
> I have not checked git history to try to find owners.
>
> Kenn
>
> On Mon, Dec 2, 2019 at 10:26 AM Alexey Romanenko <aromanenko....@gmail.com>
> wrote:
>>
>> Thank you Kenn for starting this discussion.
>>
>> As I see, for now, the main goal for “@Experimental” annotation is to relive
>> and be useful in the sense as its name says (this is obviously not the case
>> for the moment). I'd suggest a bit more simplified scenario for this:
>>
>> 1. We do a revision of all “@Experimental” annotation uses now. For the code
>> (IOs/libs/etc) that we 100% know that has been used in production for a long
>> time with current stable API, we just take this annotation away since it’s
>> not needed anymore.
>>
>> 2. For the code, that is left after p.1, we leave as “@Experimental”, wait
>> for N releases (N=3 ?) and then take it away if there are no breaking
>> changes happened. We may want to add new argument for “@Experimental” to
>> keep track release number when it was added.
>>
>> 3. We would need to have a regular “Experimental annotation report” (like we
>> have for dependencies) sending to dev@ and it will allow us to track new and
>> out-dated annotation.
>>
>> 4. And of course we update contributors documentation about that.
>>
>> Idea of graduation by voting seems a bit complicated - for me it means that
>> all added new user APIs should go through this process and I’m afraid that
>> in the end, we potentially can be overwhelmed with number of such polls. I
>> think that several releases of maturation and final decision of the
>> person(2) responsible for the component should be enough.
>>
>> In the same time, I like the Andrew’s idea about checking a breaking changes
>> through external tool. So, it could guarantee us to remove experimental
>> state without any fear to break API.
>>
>> In case of breaking changes of stable API, that won’t be possible to avoid,
>> we still can use @Deprecated and wait for 3 releases to remove (as we already
>> did before). So, having up-to-date @Experimental and @Deprecated
>> annotations won’t be confusing for users.
>>
>>
>> On 28 Nov 2019, at 04:48, Kenneth Knowles <k...@apache.org> wrote:
>>
>>
>> On Wed, Nov 27, 2019 at 1:04 PM Elliotte Rusty Harold <elh...@ibiblio.org>
>> wrote:
>>>
>>> On Wed, Nov 27, 2019 at 1:12 PM Kenneth Knowles <k...@apache.org> wrote:
>>> >
>>> > *Opt-in*: This is a powerful idea that I think changes everything.
>>> > - for an experimental new IO, a separate artifact; this way we can
>>> > also see downloads
>>> > - for experimental code fragments, add checkState that the relevant
>>> > experiment is turned on via flags
>>>
>>> To be clear the experimental artifact would have the same group ID and
>>> artifact ID but a different version than the non-experimental
>>> artifacts? E.g.
>>> org.apache.beam:beam-runners-gcp-gcemd:2.4.0-experimental
>>>
>>> That could work. Changing the artifact ID or the package name would
>>> risk split package issues and diamond dependency problems. We'd still
>>> need to be careful about mixing experimental and non-experimental
>>> artifacts.
>>
>>
>> That's clever! I think using the classifier might be better than a modified
>> version number, e.g. org.apache.beam:beam-io-mydb:2.4.0:experimental
>>
>> My prior idea was much less clever: for any version 2.X there would either
>> be beam-io-mydb-experimental or beam-io-mydb (after graduation) so no
>> problem with a split package. There would be no "same artifact id" concern.
>>
>> Your idea would allow us to ship two variants of the library, if we
>> developed the tooling for it. I think doing the stripping of experimental
>> bits and ensuring they both compile might be tricky unless we are stripping
>> rather disjoint pieces of the library.
>>
>> Kenn