Hi Ingo, Thanks for driving this discussion. Some use cases come to my mind, maybe those rules could be checked by the same way. 1. new introduced `StreamExecNode` is implemented json serialization/deserialization. Currently it is checked in `JsonSerdeCoverageTest`. 2. new introduced `RelNode` could be covered by all `MetadataHandler`s. Currently this rule not exists yet.
Best regards, JING ZHANG Ingo Bürk <i...@ververica.com> 于2021年9月9日周四 下午6:49写道: > Great! I'll work on getting the PR into an actual, proper shape now, > including looking at found violations more carefully and eventually > freezing current violations (maybe removing some quick-wins). > > One more thing I just ran into is that ArchUnit doesn't explicitly support > Scala; while many things just work (since it's still byte code), > Scala-specific concepts like traits seem to cause issues. I'll have to > exclude Scala code from the checks for now, I think. > > > Ingo > > On Tue, Sep 7, 2021 at 5:03 PM Chesnay Schepler <ches...@apache.org> > wrote: > > > I would say that's fine time-wise. > > > > On 07/09/2021 15:29, Ingo Bürk wrote: > > > Thanks, Chesnay. I updated the PR to use a separate module now, and ran > > it > > > on a few modules (some Table API modules and a couple connectors). The > CI > > > seemed to take ~2.5min for executing the tests; that's certainly not > > > negligible. On the other hand, even the few tests implemented already > > found > > > several violations ("several" is an understatement, but I manually > > verified > > > some of them, not all of them). > > > > > > On Mon, Sep 6, 2021 at 3:44 PM Chesnay Schepler <ches...@apache.org> > > wrote: > > > > > >> While flink-tests is currently the best choice in that it has the > > >> biggest classpath, it is also the module already requiring the most > time > > >> on CI. > > >> > > >> Furthermore, given that we ideally cover all APIs (including > connectors > > >> & formats), having that mess of dependencies in flink-tests may > > >> interfere with existing / future tests. > > >> > > >> As such I would prefer a separate module, as annoying as that may be. > > >> > > >> On 06/09/2021 15:26, Ingo Bürk wrote: > > >>> I just quickly chatted with the author/maintainer of ArchUnit, and a > > >> module > > >>> which depends on every module that should be tested seems to be the > > best > > >>> solution. How do you feel about using flink-tests for this vs. > having a > > >>> separate module for this purpose? > > >>> > > >>> > > >>> Ingo > > >>> > > >>> On Mon, Sep 6, 2021 at 3:04 PM Ingo Bürk <i...@ververica.com> wrote: > > >>> > > >>>> Hi Chesnay, > > >>>> > > >>>> Those are all great questions, and I want to tackle those as well. > For > > >> the > > >>>> moment I went per-module, but runtime-wise that isn't ideal the more > > >>>> modules we'd activate this in. ArchUnit does cache classes between > > >> tests, > > >>>> but if we run them individually per module, we'd still add up quite > a > > >> bit > > >>>> of execution time (a single module in my IDE is around 10s with the > > >> tests I > > >>>> currently have implemented, but I suspect the bottleneck here is the > > >>>> importing of classes, not the number of tests). Ideally we'd just > run > > >> them > > >>>> once in a module with a big enough classpath to cover everything. If > > we > > >>>> have such a place, that would probably be our best shot. I'll also > > keep > > >>>> investigating here, of course. > > >>>> > > >>>> For now I just pushed a solution to avoid the overlap when executing > > it > > >>>> per-module by matching on the URI. It's not the prettiest solution, > > but > > >>>> does work; but that's more to not fail the tests in unrelated > modules > > >> and > > >>>> doesn't help much with execution time. > > >>>> > > >>>> > > >>>> Ingo > > >>>> > > >>>> On Mon, Sep 6, 2021 at 1:57 PM Chesnay Schepler <ches...@apache.org > > > > >>>> wrote: > > >>>> > > >>>>> Do you have an estimate for long these tests would run for? > > >>>>> > > >>>>> For project-wide tests, what are the options for setting that up? > > >>>>> If we let the tests run per-module then I guess they'd overlap > > >>>>> considerably (because other Flink modules are being put on the > > >>>>> classpath), which isn't ideal. > > >>>>> > > >>>>> On 06/09/2021 13:51, David Morávek wrote: > > >>>>>> Hi Ingo, > > >>>>>> > > >>>>>> +1 for this effort. This could automate lot of "written rules" > that > > >> are > > >>>>>> easy to forget about / not to be aware of (such as that each test > > >> should > > >>>>>> extend the TestLogger as Till has already mentioned). > > >>>>>> > > >>>>>> I went trough your examples and ArchUnit looks really powerful and > > >>>>>> expressive while still being easy to read. > > >>>>>> > > >>>>>> Best, > > >>>>>> D. > > >>>>>> > > >>>>>> On Mon, Sep 6, 2021 at 1:00 PM Ingo Bürk <i...@ververica.com> > > wrote: > > >>>>>> > > >>>>>>> Thanks for your input Chesnay! > > >>>>>>> > > >>>>>>> The limitations of ArchUnit probably mostly stem from the fact > that > > >> it > > >>>>>>> operates on byte code and thus can't access anything not > accessible > > >>>>> from > > >>>>>>> byte code, i.e. JavaDocs. But I think Checkstyle and ArchUnit are > > >>>>>>> complementing each other quite well here. The main reason against > > >>>>>>> Checkstyle for these tests is its limitation to single files > only, > > >>>>>>> rendering many tests (including the one you mentioned) > impossible. > > >> The > > >>>>>>> secondary reason is that ArchUnit has more declarative APIs and > the > > >>>>> tests > > >>>>>>> become quite easy to write and maintain (some groundwork effort > is > > >>>>> needed, > > >>>>>>> of course). Over time we could probably expand quite a bit more > on > > >>>>> what is > > >>>>>>> tested with ArchUnit as it can test entire architectures (package > > >>>>> accesses > > >>>>>>> etc.), and it has support for "freezing" known violations to > > prevent > > >>>>> new > > >>>>>>> violations and removing existing ones over time. > > >>>>>>> > > >>>>>>> The @VisibleForTesting use case you mentioned is possible; I've > > >> pushed > > >>>>> a > > >>>>>>> version of that rule to the draft PR now as well. > > >>>>>>> > > >>>>>>> > > >>>>>>> Best > > >>>>>>> Ingo > > >>>>>>> > > >>>>>>> On Mon, Sep 6, 2021 at 12:11 PM Chesnay Schepler < > > ches...@apache.org > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>>> This sounds like an interesting effort. > > >>>>>>>> > > >>>>>>>> The draft you have opened uses ArchUnit; can you explain a bit > > what > > >>>>> the > > >>>>>>>> capabilities/limitations of said tool are? > > >>>>>>>> > > >>>>>>>> One thing we wanted to have for a long time is that > > methods/classes > > >>>>>>>> annotated with @VisibleForTesting are not called from production > > >> code; > > >>>>>>>> would that be something that could be implemented? > > >>>>>>>> > > >>>>>>>> It's not a problem imo that tests need to run in order to catch > > >> stuff; > > >>>>>>>> so long as it is noticed on CI. > > >>>>>>>> > > >>>>>>>> On 03/09/2021 08:48, Ingo Bürk wrote: > > >>>>>>>>> Hi Timo, Till, > > >>>>>>>>> > > >>>>>>>>> thanks for your input already. I'm glad to hear that the idea > > >>>>>>> resonates, > > >>>>>>>>> also thanks for the additional ideas! > > >>>>>>>>> > > >>>>>>>>> I've created a JIRA issue[1] for now just to track this idea. > I'm > > >>>>> also > > >>>>>>>>> working on a bit of a proof of concept and opened it as a draft > > >>>>> PR[2]. > > >>>>>>>> I'm > > >>>>>>>>> happy for anyone to join that PR to look and discuss. The PR > > >> doesn't > > >>>>>>>>> necessarily intend to be merged in its current state, but is > > rather > > >>>>> for > > >>>>>>>>> evaluation. > > >>>>>>>>> > > >>>>>>>>> Meanwhile I'm also collecting ideas in a Google Doc, so if > anyone > > >>>>> wants > > >>>>>>>> to > > >>>>>>>>> suggest more use cases to explore or implement, please let me > > know. > > >>>>>>>>> > > >>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-24138 > > >>>>>>>>> [2] https://github.com/apache/flink/pull/17133 > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Best > > >>>>>>>>> Ingo > > >>>>>>>>> > > >>>>>>>>> On Thu, Sep 2, 2021 at 12:31 PM Till Rohrmann < > > >> trohrm...@apache.org> > > >>>>>>>> wrote: > > >>>>>>>>>> If it is possible to automate these kinds of checks, then I am > > all > > >>>>> for > > >>>>>>>> it > > >>>>>>>>>> because everything else breaks eventually. So +1 for this > > >> proposal. > > >>>>>>>>>> I don't have experience with what tools are available, though. > > >>>>>>>>>> > > >>>>>>>>>> I would like to add a rule that every test class extends > > directly > > >> or > > >>>>>>>>>> indirectly TestLogger because otherwise it is super hard to > make > > >>>>> sense > > >>>>>>>> of > > >>>>>>>>>> the test logs (Arvid will probably chime in stating that this > > will > > >>>>> be > > >>>>>>>>>> solved with Junit5 eventually). > > >>>>>>>>>> > > >>>>>>>>>> Not sure whether this is possible or not but if we can check > > that > > >>>>> all > > >>>>>>>>>> interfaces have properly defined JavaDocs on each method, then > > >> this > > >>>>>>>> could > > >>>>>>>>>> also be helpful in my opinion. > > >>>>>>>>>> > > >>>>>>>>>> Cheers, > > >>>>>>>>>> Till > > >>>>>>>>>> > > >>>>>>>>>> On Thu, Sep 2, 2021 at 11:16 AM Timo Walther < > > twal...@apache.org> > > >>>>>>>> wrote: > > >>>>>>>>>>> Hi Ingo, > > >>>>>>>>>>> > > >>>>>>>>>>> thanks for starting this discussion. Having more automation > is > > >>>>>>>>>>> definitely desirable. Esp. in the API / SDK areas where we > > >>>>> frequently > > >>>>>>>>>>> have to add similar comments to PRs. The more checks the > > better. > > >> We > > >>>>>>>>>>> definitely also need more guidelines (e.g. how to develop a > > Flink > > >>>>>>>>>>> connector) but automation is safer then long checklists that > > >> might > > >>>>> be > > >>>>>>>>>>> out of date quickly. > > >>>>>>>>>>> > > >>>>>>>>>>> +1 to the proposal. I don't have an opinion on the tool > though. > > >>>>>>>>>>> > > >>>>>>>>>>> Regards, > > >>>>>>>>>>> Timo > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> On 01.09.21 11:03, Ingo Bürk wrote: > > >>>>>>>>>>>> Hello everyone, > > >>>>>>>>>>>> > > >>>>>>>>>>>> I would like to start a discussion on introducing automated > > >> tests > > >>>>>>> for > > >>>>>>>>>>> more > > >>>>>>>>>>>> architectural rather than stilistic topics. For example, > here > > >> are > > >>>>> a > > >>>>>>>> few > > >>>>>>>>>>>> things that seem worth checking to me (this is > > Table-API-focused > > >>>>>>> since > > >>>>>>>>>> it > > >>>>>>>>>>>> is the subsystem I'm involved in): > > >>>>>>>>>>>> > > >>>>>>>>>>>> (a) All classes in o.a.f.table.api should be annotated with > > one > > >>>>>>>>>>>> of @Internal, @PublicEvolving, or @Public. > > >>>>>>>>>>>> (b) Classes whose name ends in *ConnectorOptions should be > > >> located > > >>>>>>> in > > >>>>>>>>>>>> o.a.f.connector.*.table > > >>>>>>>>>>>> (c) Classes implementing DynamicSourceFactory / > > >> DynamicSinkFactory > > >>>>>>>>>> should > > >>>>>>>>>>>> have no static members of type ConfigOption > > >>>>>>>>>>>> > > >>>>>>>>>>>> There are probably significantly more cases worth checking, > > and > > >>>>> also > > >>>>>>>>>> more > > >>>>>>>>>>>> involved ones (these are rather simple examples), like > > >> disallowing > > >>>>>>>>>> access > > >>>>>>>>>>>> between certain packages etc. There are two questions I > would > > >> like > > >>>>>>> to > > >>>>>>>>>> ask > > >>>>>>>>>>>> to the community: > > >>>>>>>>>>>> > > >>>>>>>>>>>> (1) Do you think such tests are useful in general? > > >>>>>>>>>>>> (2) What use cases come to mind for you? > > >>>>>>>>>>>> > > >>>>>>>>>>>> If the idea finds consensus, I would like to use (2) to > > >>>>> investigate > > >>>>>>>>>> which > > >>>>>>>>>>>> tooling to use. An obvious candidate is Checkstyle, as this > is > > >>>>>>> already > > >>>>>>>>>>>> used. It also has the advantage of being well integrated in > > the > > >>>>> IDE. > > >>>>>>>>>>>> However, it is limited to looking at single files only, and > > >> custom > > >>>>>>>>>> checks > > >>>>>>>>>>>> are pretty complicated and involved to implement[1]. Another > > >>>>>>> possible > > >>>>>>>>>>> tool > > >>>>>>>>>>>> is ArchUnit[2], which would be significantly easier to > > maintain > > >>>>> and > > >>>>>>> is > > >>>>>>>>>>> more > > >>>>>>>>>>>> powerful, but in turn requires tests to be executed. If you > > have > > >>>>>>>>>> further > > >>>>>>>>>>>> suggestions (or thoughts) they would of course also be quite > > >>>>>>> welcome, > > >>>>>>>>>>>> though for now I would focus on (1) and (2) and go from > there > > to > > >>>>>>>>>>> evaluate. > > >>>>>>>>>>>> [1] https://checkstyle.sourceforge.io/writingchecks.html > > >>>>>>>>>>>> [2] https://www.archunit.org/ > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> Best > > >>>>>>>>>>>> Ingo > > >>>>>>>>>>>> > > >> > > > > >