Hi,
Thank you for clarifying the tradeoffs and choices.
I've updated my pull request for your review; as far as I can tell it
reflects those choices.
Now there are 3 scenarios:
1) There is no properties file --> everything returns a "default" value.
These are the defaults I have chosen:
Version : <unknown>
ScalaVersion : <unknown>
BuildTime : 1970-01-01T00:00:00Z
BuildTimeString : 1970-01-01T00:00:00+0000
GitCommitId : DecafC0ffeeD0d0F00d
GitCommitIdAbbrev : DeadD0d0
GitCommitTime : 1970-01-01T00:00:00Z
GitCommitTimeString : 1970-01-01T00:00:00+0000
2) A poorly generated properties file exists --> Only the bad values return
a default.
This happens because the maven git plugin is not run by the IDE.
Example:
project.version=1.11-SNAPSHOT
scala.binary.version=2.11
git.commit.id=${git.commit.id}
git.commit.id.abbrev=${git.commit.id.abbrev}
git.commit.time=${git.commit.time}
git.build.time=${git.build.time}
3) A full generated properties file exists --> Everything returns the right
value.
Example:
project.version=1.11-SNAPSHOT
scala.binary.version=2.11
git.commit.id=8e2642ec0046bb02fe9c1648b589051e66c2f956
git.commit.id.abbrev=8e2642e
git.commit.time=2020-04-28T09:22:20+0200
git.build.time=2020-04-28T10:06:42+0200
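The three scenarios above boil down to a small piece of fallback logic on the loading side. As a minimal sketch (the class and method names here are illustrative, not the actual PR code), it could look like this:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical sketch of the fallback logic described above;
// names are illustrative, not the actual PR code.
class BuildInfo {
    private final Properties props = new Properties();

    BuildInfo(String resource) {
        // Scenario 1: no properties file at all -> props stays empty and
        // every lookup falls back to its chosen default.
        try (InputStream in = BuildInfo.class.getResourceAsStream(resource)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            // Treat an unreadable file the same as a missing one.
        }
    }

    String get(String key, String fallback) {
        return orDefault(props.getProperty(key), fallback);
    }

    // Scenario 2: the IDE skipped the maven git plugin, so the value is an
    // unexpanded placeholder like "${git.commit.id}" -> use the fallback.
    // Scenario 3: a fully generated file -> the real value passes through.
    static String orDefault(String value, String fallback) {
        if (value == null || value.startsWith("${")) {
            return fallback;
        }
        return value;
    }
}
```

So `new BuildInfo("/flink-version.properties").get("git.commit.id", "DecafC0ffeeD0d0F00d")` returns either the real commit id (scenario 3) or the chosen default (scenarios 1 and 2).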
Niels
On Tue, Apr 21, 2020 at 2:00 PM Chesnay Schepler <[email protected]> wrote:
> I've had an offline discussion with Till and we came to the following
> conclusion:
>
> All outlined approaches work perfectly when Flink is built in its
> entirety with maven.
> None of the outlined approaches work perfectly when Flink is _not_ built
> in its entirety.
>
> The source-file generation approach has similar problems as the
> properties-file one; if I build flink-runtime, switch to another branch,
> build flink-dist, then the information that will be displayed in the UI
> when I start the cluster may be incorrect.
> Put another way, you can still easily run into the issue of "doesn't work
> on *my* machine", the solution to which always is to rebuild Flink
> completely.
> Frustratingly enough, with both approaches you can even run into the issue
> where the information _seems_ correct, when it actually isn't; but this
> isn't something that we can really solve.
>
> As such, this is no longer really a question of correctness, but one of
> convenience and maintenance overhead.
>
> As for maintenance, I think a good argument can be made that the
> generation approach would be easier to maintain, in parts because it is the
> more standard approach for this kind of thing.
>
> The properties-file approach however clearly wins in convenience, as it
> neither requires an additional command to be run nor causes compile
> errors. The compile error is quite a problem indeed, since it provides no
> information as to what the actual problem is, or rather how to solve it.
>
> Overall we concluded that we would like to stick to the existing approach
> of having a properties file (or JSON or whatever).
>
> I saw that you updated the PR, and will take a look shortly.
>
> On a personal note, I'm sympathetic to your idea, but people are quite
> sensitive to things interrupting their work-flow (which I ran into a few
> times already :/ ), so we put a lot of emphasis on that aspect.
>
> On 16/04/2020 16:05, Niels Basjes wrote:
>
> Hi,
>
> Apparently the IDEs (like IntelliJ) are imperfect at supporting the devs
> in these kinds of use cases.
> The solution you typically see is something like IntelliJ's
> "generate-sources" button.
> It is a way of working I see in almost all projects I'm involved with:
> Almost all either use a parser generator (like javacc or Antlr) or generate
> code from a specification (like Avro or Protobuf).
> In all of these cases there is no "default" value you can use so this
> discussion never arose in those projects.
>
> All other options (other than 'hard' generating a file/class without
> defaults) involve having some kind of "default" (regardless of whether it
> is a properties file, a JSON file or a Java interface).
> This will lead to the illusion of "it works on my machine" or "why doesn't
> it work on my machine".
> What I also see is that these options which include "defaults" also have
> quite a bit of extra code to handle all the situations related to loading
> the settings (either through resource loading or classloading).
>
> The goal I was working on is that the docker image that is loaded is
> tagged with a flink+scala version.
> If the DEV does not generate the right values, or has values from a
> different branch, then a different docker image may be used than what is
> expected, leading to "why doesn't it work on my machine".
>
> So yes it is an additional learning step for the DEVs and yes it takes a
> bit of getting used to.
> Yet I truly believe it is the best direction.
>
> Niels
>
>
> On Wed, Apr 15, 2020 at 5:26 PM Till Rohrmann <[email protected]> wrote:
>
>
> I'm not advocating for a specific approach. The point I wanted to make is
> that there are solutions which allow us to get rid of the problematic
> parsing without disrupting the workflow. If the Jackson JSON file approach
> works for Niels, then I'm fine with that as well.
>
> Cheers,
> Till
>
> On Wed, Apr 15, 2020 at 5:21 PM Chesnay Schepler <[email protected]>
> wrote:
>
>
> It doesn't have to be a properties file, nor do we necessarily have to
> do any manual parsing.
> It could just be a JSON file that we point Jackson at. Naturally we
> could also generate it with Jackson.
> You'd have a POJO for all the fields with sane defaults (an analogue to
> the proposed generated POJO) and 5-6 lines at most for loading the file
> and handling errors.
>
> I don't think this would be such a problem.
>
> If I understand Till correctly he's proposing a service-loader approach,
> which seems overkill to me. It also introduces some inconsistency in
> what one needs to do before working on stuff, which I'm apprehensive
> about.
>
> On 15/04/2020 16:31, Till Rohrmann wrote:
>
> Hi everyone,
>
> thanks for starting this discussion Niels.
>
> I like the idea of getting rid of parsing a Properties instance. On the
> other hand, I also understand that people are concerned about disrupting
> the workflows of our devs.
>
> Maybe we can find a compromise between both approaches. For example, one
> could define an EnvironmentInformationProvider interface which is loaded
> at runtime. The implementation of this interface could be generated by
> `mvn generate-sources`. Moreover, if we cannot find it, then we fail with
> a descriptive exception saying to run `mvn generate-sources`. That way
> only devs who rely on the EnvironmentInformation have to run `mvn
> generate-sources` when testing/developing, and we could still get rid of
> all the parsing and logic of a properties file. What do you think?
>
> Cheers,
> Till
>
> On Mon, Apr 13, 2020 at 10:53 AM Niels Basjes <[email protected]> wrote:
>
>
> Hi,
>
> On Mon, Apr 13, 2020 at 8:30 AM Yang Wang <[email protected]> wrote:
>
> Although the latter option is more stable, I think it is not acceptable
> for all the developers to execute `mvn generate-sources` first.
> Otherwise, the Flink project is just broken and cannot run tests or
> Flink jobs in the IDE.
>
>
>
> It is important to realize that this discussion is essentially about the
> fact that systems like IntelliJ (which is what I use) handle the various
> ways of generating code that have been defined in a maven pom.xml
> differently. Simply put: they don't automatically execute all
> instructions that have been defined in the pom.xml when needed.
>
> - With the properties file variant (which uses resource filtering),
> IntelliJ does resource filtering automatically, yet no maven plugins
> are run that affect the properties that are replaced.
> So the git variables are NOT populated when running in IntelliJ, and I
> have seen that several values in the properties file are nonsense.
> As a consequence the Java code reading those needs to catch things like
> "missing properties file" and various kinds of "nonsense values" and
> replace them with a workable default.
> So when running this code you are actually running something different
> from what will be run when doing a `mvn clean install`, yet the
> developer is under the illusion it all works because of the code that
> catches all of those problems.
> Depending on the way these variables are used, this may lead to "It
> fails in production but it works fine in IntelliJ" situations.
> In my mind the code that catches all of those exceptional situations in
> poorly generated properties only exists so that developers can run the
> (otherwise perfectly fine) software in a "broken" build/development
> environment.
>
>
> - With the way I propose to generate the code the effect is indeed
> slightly harder: if you do not run the pom.xml based code generation,
> it will not work.
> I understand that this requires the developers to think a bit more
> about the code they are working on.
> The upside is that the code either works perfectly or does not compile.
> There is no "it compiles but is really nonsense". I prefer this.
> Also, at this point in time this is already true for Flink: there are
> quite a few modules where the developer MUST run mvn generate-sources
> for all tests to run successfully.
> Like I indicated, there is a javacc SQL parser and there are a lot of
> tests that rely on generating Avro code before they are able to run.
> I have several projects where systems like Avro and Antlr force me in
> this direction; there is simply no other way to do this.
>
> So I think the version properties file is enough for now. +1 for the
> first option.
>
>
> Like I said, I'm fine with either choice by the committers.
> I would choose the `mvn generate-sources` code variant as it is much
> simpler and much more reliable.
>
> Niels
>
> Niels Basjes <[email protected]> <[email protected]> 于2020年4月9日周四 下午4:47写道:
>
> Hi,
>
> I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to make
> more build-time variables (like the scala version) available to the code
> at runtime.
>
> During the review process there was discussion around a basic question:
> *Is generating java code during the build ok?*
> See
> - https://github.com/apache/flink/pull/11245#discussion_r400035133
> - https://github.com/apache/flink/pull/11592
> - https://github.com/apache/flink/pull/11592#issuecomment-610282450
>
> As suggested by Chesnay Schepler I'm putting this question to the
> mailing list:
> https://github.com/apache/flink/pull/11592#issuecomment-610963947
>
> The main discussion was around the ease of use when running in an IDE
> like IntelliJ.
>
> So essentially we have two solution directions available:
>
> 1. *Generate a properties file and then use the classloader to load
>    this file as a resource and then parse it as a property file.*
>    This is the currently used solution direction for this part of the
>    code.
>    A rough version of this (to be improved):
>    https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
>
>    This method has several effects:
>    1. The developer can run the project immediately from within the
>       IDE, as fallback values are provided if the 'actual' values are
>       missing.
>    2. This property file (with stuff that should never be overwritten)
>       can be modified by placing a different one in the classpath. In
>       fact it IS modified in flink-dist, as it generates a new file
>       with the same name into the binary distribution (I consider this
>       to be bad).
>    3. Loading resources means loading, parsing and a lot of error
>       handling. Lots of things "can be null" or be a default value, so
>       the values are unreliable and lots of code needs to handle this.
>       In fact, when running from IntelliJ the properties file is
>       generated poorly most of the time; only during a normal maven
>       build will it work correctly.
> 2. *Generate a Java source file and then simply compile this and make
>    it part of the project.*
>    Essentially the same model as you would have when using Apache Avro,
>    Protobuf, Antlr 4 and javacc (several of those are used in Flink!).
>    A rough version of this (to be improved):
>    https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
>
>    This method has several effects:
>    1. The developer MUST run 'mvn generate-sources' before the project
>       can be run from within the IDE.
>    2. The code/test will not run until this step is done.
>    3. Because the file is generated by a plugin it is always correct.
>       As a consequence all variables are always available and the
>       downstream users no longer have to handle the "can be null" or
>       "default value" situations.
>
> So is generating code similar to what I created a desired change?
> My opinion is that it is the better solution: the data available is
> more reliable and as a consequence the rest of the code is simpler.
> It would probably mean that during development of Flink you should be
> aware of this and do an 'extra step' to get it running.
>
> What do you guys think?
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
>
>
>
--
Best regards / Met vriendelijke groeten,
Niels Basjes