Great! I'll work on getting the PR into an actual, proper shape now, including looking at found violations more carefully and eventually freezing current violations (maybe removing some quick-wins).
One more thing I just ran into is that ArchUnit doesn't explicitly support Scala; while many things just work (since it's still byte code), Scala-specific concepts like traits seem to cause issues. I'll have to exclude Scala code from the checks for now, I think. Ingo On Tue, Sep 7, 2021 at 5:03 PM Chesnay Schepler <ches...@apache.org> wrote: > I would say that's fine time-wise. > > On 07/09/2021 15:29, Ingo Bürk wrote: > > Thanks, Chesnay. I updated the PR to use a separate module now, and ran > it > > on a few modules (some Table API modules and a couple connectors). The CI > > seemed to take ~2.5min for executing the tests; that's certainly not > > negligible. On the other hand, even the few tests implemented already > found > > several violations ("several" is an understatement, but I manually > verified > > some of them, not all of them). > > > > On Mon, Sep 6, 2021 at 3:44 PM Chesnay Schepler <ches...@apache.org> > wrote: > > > >> While flink-tests is currently the best choice in that it has the > >> biggest classpath, it is also the module already requiring the most time > >> on CI. > >> > >> Furthermore, given that we ideally cover all APIs (including connectors > >> & formats), having that mess of dependencies in flink-tests may > >> interfere with existing / future tests. > >> > >> As such I would prefer a separate module, as annoying as that may be. > >> > >> On 06/09/2021 15:26, Ingo Bürk wrote: > >>> I just quickly chatted with the author/maintainer of ArchUnit, and a > >> module > >>> which depends on every module that should be tested seems to be the > best > >>> solution. How do you feel about using flink-tests for this vs. having a > >>> separate module for this purpose? > >>> > >>> > >>> Ingo > >>> > >>> On Mon, Sep 6, 2021 at 3:04 PM Ingo Bürk <i...@ververica.com> wrote: > >>> > >>>> Hi Chesnay, > >>>> > >>>> Those are all great questions, and I want to tackle those as well. For > >> the > >>>> moment I went per-module, but runtime-wise that isn't ideal the more > >>>> modules we'd activate this in. ArchUnit does cache classes between > >> tests, > >>>> but if we run them individually per module, we'd still add up quite a > >> bit > >>>> of execution time (a single module in my IDE is around 10s with the > >> tests I > >>>> currently have implemented, but I suspect the bottleneck here is the > >>>> importing of classes, not the number of tests). Ideally we'd just run > >> them > >>>> once in a module with a big enough classpath to cover everything. If > we > >>>> have such a place, that would probably be our best shot. I'll also > keep > >>>> investigating here, of course. > >>>> > >>>> For now I just pushed a solution to avoid the overlap when executing > it > >>>> per-module by matching on the URI. It's not the prettiest solution, > but > >>>> does work; but that's more to not fail the tests in unrelated modules > >> and > >>>> doesn't help much with execution time. > >>>> > >>>> > >>>> Ingo > >>>> > >>>> On Mon, Sep 6, 2021 at 1:57 PM Chesnay Schepler <ches...@apache.org> > >>>> wrote: > >>>> > >>>>> Do you have an estimate for long these tests would run for? > >>>>> > >>>>> For project-wide tests, what are the options for setting that up? > >>>>> If we let the tests run per-module then I guess they'd overlap > >>>>> considerably (because other Flink modules are being put on the > >>>>> classpath), which isn't ideal. > >>>>> > >>>>> On 06/09/2021 13:51, David Morávek wrote: > >>>>>> Hi Ingo, > >>>>>> > >>>>>> +1 for this effort. This could automate lot of "written rules" that > >> are > >>>>>> easy to forget about / not to be aware of (such as that each test > >> should > >>>>>> extend the TestLogger as Till has already mentioned). > >>>>>> > >>>>>> I went trough your examples and ArchUnit looks really powerful and > >>>>>> expressive while still being easy to read. > >>>>>> > >>>>>> Best, > >>>>>> D. > >>>>>> > >>>>>> On Mon, Sep 6, 2021 at 1:00 PM Ingo Bürk <i...@ververica.com> > wrote: > >>>>>> > >>>>>>> Thanks for your input Chesnay! > >>>>>>> > >>>>>>> The limitations of ArchUnit probably mostly stem from the fact that > >> it > >>>>>>> operates on byte code and thus can't access anything not accessible > >>>>> from > >>>>>>> byte code, i.e. JavaDocs. But I think Checkstyle and ArchUnit are > >>>>>>> complementing each other quite well here. The main reason against > >>>>>>> Checkstyle for these tests is its limitation to single files only, > >>>>>>> rendering many tests (including the one you mentioned) impossible. > >> The > >>>>>>> secondary reason is that ArchUnit has more declarative APIs and the > >>>>> tests > >>>>>>> become quite easy to write and maintain (some groundwork effort is > >>>>> needed, > >>>>>>> of course). Over time we could probably expand quite a bit more on > >>>>> what is > >>>>>>> tested with ArchUnit as it can test entire architectures (package > >>>>> accesses > >>>>>>> etc.), and it has support for "freezing" known violations to > prevent > >>>>> new > >>>>>>> violations and removing existing ones over time. > >>>>>>> > >>>>>>> The @VisibleForTesting use case you mentioned is possible; I've > >> pushed > >>>>> a > >>>>>>> version of that rule to the draft PR now as well. > >>>>>>> > >>>>>>> > >>>>>>> Best > >>>>>>> Ingo > >>>>>>> > >>>>>>> On Mon, Sep 6, 2021 at 12:11 PM Chesnay Schepler < > ches...@apache.org > >>>>>>> wrote: > >>>>>>> > >>>>>>>> This sounds like an interesting effort. > >>>>>>>> > >>>>>>>> The draft you have opened uses ArchUnit; can you explain a bit > what > >>>>> the > >>>>>>>> capabilities/limitations of said tool are? > >>>>>>>> > >>>>>>>> One thing we wanted to have for a long time is that > methods/classes > >>>>>>>> annotated with @VisibleForTesting are not called from production > >> code; > >>>>>>>> would that be something that could be implemented? > >>>>>>>> > >>>>>>>> It's not a problem imo that tests need to run in order to catch > >> stuff; > >>>>>>>> so long as it is noticed on CI. > >>>>>>>> > >>>>>>>> On 03/09/2021 08:48, Ingo Bürk wrote: > >>>>>>>>> Hi Timo, Till, > >>>>>>>>> > >>>>>>>>> thanks for your input already. I'm glad to hear that the idea > >>>>>>> resonates, > >>>>>>>>> also thanks for the additional ideas! > >>>>>>>>> > >>>>>>>>> I've created a JIRA issue[1] for now just to track this idea. I'm > >>>>> also > >>>>>>>>> working on a bit of a proof of concept and opened it as a draft > >>>>> PR[2]. > >>>>>>>> I'm > >>>>>>>>> happy for anyone to join that PR to look and discuss. The PR > >> doesn't > >>>>>>>>> necessarily intend to be merged in its current state, but is > rather > >>>>> for > >>>>>>>>> evaluation. > >>>>>>>>> > >>>>>>>>> Meanwhile I'm also collecting ideas in a Google Doc, so if anyone > >>>>> wants > >>>>>>>> to > >>>>>>>>> suggest more use cases to explore or implement, please let me > know. > >>>>>>>>> > >>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-24138 > >>>>>>>>> [2] https://github.com/apache/flink/pull/17133 > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Best > >>>>>>>>> Ingo > >>>>>>>>> > >>>>>>>>> On Thu, Sep 2, 2021 at 12:31 PM Till Rohrmann < > >> trohrm...@apache.org> > >>>>>>>> wrote: > >>>>>>>>>> If it is possible to automate these kinds of checks, then I am > all > >>>>> for > >>>>>>>> it > >>>>>>>>>> because everything else breaks eventually. So +1 for this > >> proposal. > >>>>>>>>>> I don't have experience with what tools are available, though. > >>>>>>>>>> > >>>>>>>>>> I would like to add a rule that every test class extends > directly > >> or > >>>>>>>>>> indirectly TestLogger because otherwise it is super hard to make > >>>>> sense > >>>>>>>> of > >>>>>>>>>> the test logs (Arvid will probably chime in stating that this > will > >>>>> be > >>>>>>>>>> solved with Junit5 eventually). > >>>>>>>>>> > >>>>>>>>>> Not sure whether this is possible or not but if we can check > that > >>>>> all > >>>>>>>>>> interfaces have properly defined JavaDocs on each method, then > >> this > >>>>>>>> could > >>>>>>>>>> also be helpful in my opinion. > >>>>>>>>>> > >>>>>>>>>> Cheers, > >>>>>>>>>> Till > >>>>>>>>>> > >>>>>>>>>> On Thu, Sep 2, 2021 at 11:16 AM Timo Walther < > twal...@apache.org> > >>>>>>>> wrote: > >>>>>>>>>>> Hi Ingo, > >>>>>>>>>>> > >>>>>>>>>>> thanks for starting this discussion. Having more automation is > >>>>>>>>>>> definitely desirable. Esp. in the API / SDK areas where we > >>>>> frequently > >>>>>>>>>>> have to add similar comments to PRs. The more checks the > better. > >> We > >>>>>>>>>>> definitely also need more guidelines (e.g. how to develop a > Flink > >>>>>>>>>>> connector) but automation is safer then long checklists that > >> might > >>>>> be > >>>>>>>>>>> out of date quickly. > >>>>>>>>>>> > >>>>>>>>>>> +1 to the proposal. I don't have an opinion on the tool though. > >>>>>>>>>>> > >>>>>>>>>>> Regards, > >>>>>>>>>>> Timo > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 01.09.21 11:03, Ingo Bürk wrote: > >>>>>>>>>>>> Hello everyone, > >>>>>>>>>>>> > >>>>>>>>>>>> I would like to start a discussion on introducing automated > >> tests > >>>>>>> for > >>>>>>>>>>> more > >>>>>>>>>>>> architectural rather than stilistic topics. For example, here > >> are > >>>>> a > >>>>>>>> few > >>>>>>>>>>>> things that seem worth checking to me (this is > Table-API-focused > >>>>>>> since > >>>>>>>>>> it > >>>>>>>>>>>> is the subsystem I'm involved in): > >>>>>>>>>>>> > >>>>>>>>>>>> (a) All classes in o.a.f.table.api should be annotated with > one > >>>>>>>>>>>> of @Internal, @PublicEvolving, or @Public. > >>>>>>>>>>>> (b) Classes whose name ends in *ConnectorOptions should be > >> located > >>>>>>> in > >>>>>>>>>>>> o.a.f.connector.*.table > >>>>>>>>>>>> (c) Classes implementing DynamicSourceFactory / > >> DynamicSinkFactory > >>>>>>>>>> should > >>>>>>>>>>>> have no static members of type ConfigOption > >>>>>>>>>>>> > >>>>>>>>>>>> There are probably significantly more cases worth checking, > and > >>>>> also > >>>>>>>>>> more > >>>>>>>>>>>> involved ones (these are rather simple examples), like > >> disallowing > >>>>>>>>>> access > >>>>>>>>>>>> between certain packages etc. There are two questions I would > >> like > >>>>>>> to > >>>>>>>>>> ask > >>>>>>>>>>>> to the community: > >>>>>>>>>>>> > >>>>>>>>>>>> (1) Do you think such tests are useful in general? > >>>>>>>>>>>> (2) What use cases come to mind for you? > >>>>>>>>>>>> > >>>>>>>>>>>> If the idea finds consensus, I would like to use (2) to > >>>>> investigate > >>>>>>>>>> which > >>>>>>>>>>>> tooling to use. An obvious candidate is Checkstyle, as this is > >>>>>>> already > >>>>>>>>>>>> used. It also has the advantage of being well integrated in > the > >>>>> IDE. > >>>>>>>>>>>> However, it is limited to looking at single files only, and > >> custom > >>>>>>>>>> checks > >>>>>>>>>>>> are pretty complicated and involved to implement[1]. Another > >>>>>>> possible > >>>>>>>>>>> tool > >>>>>>>>>>>> is ArchUnit[2], which would be significantly easier to > maintain > >>>>> and > >>>>>>> is > >>>>>>>>>>> more > >>>>>>>>>>>> powerful, but in turn requires tests to be executed. If you > have > >>>>>>>>>> further > >>>>>>>>>>>> suggestions (or thoughts) they would of course also be quite > >>>>>>> welcome, > >>>>>>>>>>>> though for now I would focus on (1) and (2) and go from there > to > >>>>>>>>>>> evaluate. > >>>>>>>>>>>> [1] https://checkstyle.sourceforge.io/writingchecks.html > >>>>>>>>>>>> [2] https://www.archunit.org/ > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Best > >>>>>>>>>>>> Ingo > >>>>>>>>>>>> > >> > >