Hi everyone!

I would like to start an open discussion about an issue with the heterogeneity of the Flink code base.
We have, since the beginning in Apache (and even since we started the predecessor project, Stratosphere), refrained from strictly enforcing conventions for formatting, style, or libraries. I like the idea behind it: committers and contributors are not forced into a corset of hundreds of rules before they can contribute something.

As the project is growing, more and more people with different backgrounds have joined, and the project has grown a bit heterogeneous in several parts. In many cases this is not due to a need for different functionality, but simply due to "roll your own style". I think this is starting to become a bit of an issue.

Here are a few examples:

- Parameter checking is sometimes done with commons-lang3, sometimes with commons-lang, and sometimes with guava.
- Command line parsing is sometimes done with commons-cli, sometimes with scopt.
- Code styles are quite different from commit to commit: spaces, indentation, braces. This is not a critical thing, but it seems to encourage people to reformat other people's code whenever they pass over it, which should be avoided (it clutters diffs and may actually introduce new bugs).
- Some projects are mixed Java/Scala, which is not perfectly supported by the tools so far. It also needs many "fromJava / toJava" conversions and makes the entry hurdle into the project higher.
- Tests are sometimes written as Java unit tests, sometimes as Scala unit tests (method style), and sometimes as Scala unit tests (grammar style).

Not all things need to be unified across the entire Flink code base. But it becomes harder to switch between projects, even for seasoned Flinksters. And it becomes a hurdle for new contributors, which is very critical.

I, personally, would like to encourage people to keep this in mind. Easier understanding of the code and easier entry for newcomers (for which a certain homogeneity helps quite a bit) should have a higher priority than the desire to stick to a personal favorite code style or library.
This is a big community effort, after all.

That said, we should of course not block the use of new libraries/languages/features when they have a significant benefit over the existing state.

I am eager to hear opinions!

Stephan