Hi,
Thank you for clarifying the tradeoffs and choices.
I've updated my pull request for your review; as far as I can tell it
reflects those choices.
Now there are 3 scenarios:
1) There is no properties file --> everything returns a "default" value.
These are the defaults I have chosen:
Version : <unknown>
ScalaVersion : <unknown>
BuildTime : 1970-01-01T00:00:00Z
BuildTimeString : 1970-01-01T00:00:00+0000
GitCommitId : DecafC0ffeeD0d0F00d
GitCommitIdAbbrev : DeadD0d0
GitCommitTime : 1970-01-01T00:00:00Z
GitCommitTimeString : 1970-01-01T00:00:00+0000
2) A poorly generated properties file exists --> Only the bad values return
a default.
This happens because the maven git plugin is not run by the IDE.
Example:
project.version=1.11-SNAPSHOT
scala.binary.version=2.11
git.commit.id=${git.commit.id}
git.commit.id.abbrev=${git.commit.id.abbrev}
git.commit.time=${git.commit.time}
git.build.time=${git.build.time}
3) A full generated properties file exists --> Everything returns the right
value.
Example:
project.version=1.11-SNAPSHOT
scala.binary.version=2.11
git.commit.id=8e2642ec0046bb02fe9c1648b589051e66c2f956
git.commit.id.abbrev=8e2642e
git.commit.time=2020-04-28T09:22:20+0200
git.build.time=2020-04-28T10:06:42+0200
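The three scenarios above boil down to a small piece of fallback logic on the loading side. As a minimal sketch (the class and method names here are illustrative, not the actual PR code), it could look like this:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical sketch of the fallback logic described above;
// names are illustrative, not the actual PR code.
class BuildInfo {
    private final Properties props = new Properties();

    BuildInfo(String resource) {
        // Scenario 1: no properties file at all -> props stays empty and
        // every lookup falls back to its chosen default.
        try (InputStream in = BuildInfo.class.getResourceAsStream(resource)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            // Treat an unreadable file the same as a missing one.
        }
    }

    String get(String key, String fallback) {
        return orDefault(props.getProperty(key), fallback);
    }

    // Scenario 2: the IDE skipped the maven git plugin, so the value is an
    // unexpanded placeholder like "${git.commit.id}" -> use the fallback.
    // Scenario 3: a fully generated file -> the real value passes through.
    static String orDefault(String value, String fallback) {
        if (value == null || value.startsWith("${")) {
            return fallback;
        }
        return value;
    }
}
```

So `new BuildInfo("/flink-version.properties").get("git.commit.id", "DecafC0ffeeD0d0F00d")` returns either the real commit id (scenario 3) or the chosen default (scenarios 1 and 2).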
Niels
On Tue, Apr 21, 2020 at 2:00 PM Chesnay Schepler <[email protected]> wrote:
> I've had an offline discussion with Till and we came to the following
> conclusion:
>
> All outlined approaches work perfectly when Flink is built in its
> entirety with maven.
> None of the outlined approaches work perfectly when Flink is _not_ built
> in its entirety.
>
> The source-file generation approach has similar problems as the
> properties-file one; if I build flink-runtime, switch to another branch,
> build flink-dist, then the information that will be displayed in the UI
> when I start the cluster may be incorrect.
> Put another way, you can still easily run into the issue of "doesn't work
> on *my* machine", the solution to which always is to rebuild Flink
> completely.
> Frustratingly enough, with both approaches you can even run into the issue
> where the information _seems_ correct, when it actually isn't; but this
> isn't something that we can really solve.
>
> As such, this is no longer really a question of correctness, but one of
> convenience and maintenance overhead.
>
> As for maintenance, I think a good argument can be made that the
> generation approach would be easier to maintain, in parts because it is the
> more standard approach for this kind of thing.
>
> The properties-file approach however clearly wins in convenience, as it
> neither requires an additional command to be run nor causes compile
> errors. The compile error is quite a problem indeed, since it provides no
> information as to what the actual problem is, or rather how to solve it.
>
> Overall we concluded that we would like to stick to the existing approach
> of having a properties file (or JSON or whatever).
>
> I saw that you updated the PR, and will take a look shortly.
>
> On a personal note, I'm sympathetic to your idea, but people are quite
> sensitive to things interrupting their work-flow (which I ran into a few
> times already :/ ), so we put a lot of emphasis on that aspect.
>
> On 16/04/2020 16:05, Niels Basjes wrote:
>
> Hi,
>
> Apparently the IDEs (like IntelliJ) are imperfect at supporting the devs
> in these kinds of use cases.
> The solution you typically see is something like IntelliJ's
> "generate-sources" button.
> It is a way of working I see in almost all projects I'm involved with:
> Almost all either use a parser generator (like javacc or Antlr) or generate
> code from a specification (like Avro or Protobuf).
> In all of these cases there is no "default" value you can use so this
> discussion never arose in those projects.
>
> All other options (other than 'hard' generating a file/class without
> defaults) involve having some kind of "default" (regardless of whether it
> is a properties file, a JSON file or a Java interface).
> This will lead to the illusion of "it works on my machine" or "why doesn't
> it work on my machine".
> What I also see is that these options which include "defaults" also have
> quite a bit of extra code to handle all the situations related to loading
> the settings (either through resource loading or classloading).
>
> The goal I was working on is that the docker image that is loaded is
> tagged with a flink+scala version.
> If the DEV does not generate the right values, or has values from a
> different branch, then a different docker image may be used than what is
> expected, leading to "why doesn't it work on my machine".
>
> So yes it is an additional learning step for the DEVs and yes it takes a
> bit of getting used to.
> Yet I truly believe it is the best direction.
>
> Niels
>
>
> On Wed, Apr 15, 2020 at 5:26 PM Till Rohrmann <[email protected]> wrote:
>
>
> I'm not advocating for a specific approach. The point I wanted to make is
> that there are solutions which allow us to get rid of the problematic
> parsing without disrupting the workflow. If the Jackson JSON file approach
> works for Niels, then I'm fine with that as well.
>
> Cheers,
> Till
>
> On Wed, Apr 15, 2020 at 5:21 PM Chesnay Schepler <[email protected]>
> wrote:
>
>
> It doesn't have to be a properties file, nor do we necessarily have to
> do any manual parsing.
> It could just be a JSON file that we point Jackson at. Naturally we
> could also generate it with Jackson.
> You'd have a POJO for all the fields with sane defaults (an analogue to
> the proposed generated POJO) and 5-6 lines at most for loading the file
> and handling errors.
>
> I don't think this would be such a problem.
>
> If I understand Till correctly he's proposing a service-loader approach,
> which seems overkill to me. It also introduces some inconsistency in
> what one needs to do before working on stuff, which I'm apprehensive
> about.
>
> On 15/04/2020 16:31, Till Rohrmann wrote:
>
> Hi everyone,
>
> thanks for starting this discussion Niels.
>
> I like the idea of getting rid of parsing a Properties instance. On the
> other hand, I also understand that people are concerned about disrupting
> the workflows of our devs.
>
> Maybe we can find a compromise between both approaches. For example, one
> could define an EnvironmentInformationProvider interface which is loaded
> at runtime. The implementation of this interface could be generated by
> `mvn generate-sources`. Moreover, if we cannot find it, then we fail with
> a descriptive exception saying to run `mvn generate-sources`. That way
> only devs who rely on the EnvironmentInformation have to run `mvn
> generate-sources` when testing/developing, and we could still get rid of
> all the parsing and logic of a properties file. What do you think?
>
> Cheers,
> Till
>
> On Mon, Apr 13, 2020 at 10:53 AM Niels Basjes <[email protected]> wrote:
>
>
> Hi,
>
> On Mon, Apr 13, 2020 at 8:30 AM Yang Wang <[email protected]> wrote:
>
> Although the latter option is more stable, I think it is not acceptable
> for all the developers to execute `mvn generate-sources` first.
> Otherwise, the Flink project is just broken and cannot run tests or
> Flink jobs in the IDE.
>
>
>
> It is important to realize that this discussion is essentially about the
> fact that systems like IntelliJ (which is what I use) handle the various
> ways of generating code that have been defined in a maven pom.xml
> differently. Simply put: they don't automatically execute all
> instructions that have been defined in the pom.xml when needed.
>
> - With the properties file variant (which uses resource filtering),
> IntelliJ does resource filtering automatically, yet no maven plugins
> are run that affect the properties that are replaced.
> So the git variables are NOT populated when running in IntelliJ, and I
> have seen that several values in the properties file are nonsense.
> As a consequence the Java code reading those needs to catch things like
> "missing properties file" and various kinds of "nonsense values" and
> replace them with a workable default.
> So when running this code you are actually running something different
> from what will be run when doing a `mvn clean install`, yet the
> developer is under the illusion it all works because of the code that
> catches all of those problems.
> Depending on the way these variables are used, this may lead to "It
> fails in production but it works fine in IntelliJ" situations.
> In my mind the code that catches all of those exceptional situations in
> poorly generated properties only exists so that developers can run the
> (otherwise perfectly fine) software in a "broken" build/development
> environment.
>
>
> - With the way I propose to generate the code the effect is indeed
> slightly harder: if you do not run the pom.xml based code generation,
> it will not work.
> I understand that this requires the developers to think a bit more
> about the code they are working on.
> The upside is that the code either works perfectly or does not compile.
> There is no "it compiles but is really nonsense". I prefer this.
> Also, at this point in time this is already true for Flink: there are
> quite a few modules where the developer MUST run mvn generate-sources
> for all tests to run successfully.
> Like I indicated, there is a javacc SQL parser and there are a lot of
> tests that rely on generating Avro code before they are able to run.
> I have several projects where systems like Avro and Antlr force me in
> this direction; there is simply no other way to do this.
>
> So I think the version properties file is enough for now. +1 for the
> first option.
>
>
> Like I said, I'm fine with either choice by the committers.
> I would choose the `mvn generate-sources` code variant as it is much
> simpler and much more reliable.
>
> Niels
>
> Niels Basjes <[email protected]> <[email protected]> 于2020年4月9日周四 下午4:47写道:
>
> Hi,
>
> I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to make
> more build-time variables (like the scala version) available to the code
> at runtime.
>
> During the review process there was discussion around a basic question:
> *Is generating java code during the build ok?*
> See
> - https://github.com/apache/flink/pull/11245#discussion_r400035133
> - https://github.com/apache/flink/pull/11592
> - https://github.com/apache/flink/pull/11592#issuecomment-610282450
>
> As suggested by Chesnay Schepler I'm putting this question to the
> mailing list:
> https://github.com/apache/flink/pull/11592#issuecomment-610963947
>
> The main discussion was around the ease of use when running in an IDE
> like IntelliJ.
>
> So essentially we have two solution directions available:
>
> 1. *Generate a properties file and then use the classloader to load
>    this file as a resource and then parse it as a property file.*
>    This is the currently used solution direction for this part of the
>    code.
>    A rough version of this (to be improved):
>    https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
>
>    This method has several effects:
>    1. The developer can run the project immediately from within the
>       IDE, as fallback values are provided if the 'actual' values are
>       missing.
>    2. This property file (with stuff that should never be overwritten)
>       can be modified by placing a different one in the classpath. In
>       fact it IS modified in flink-dist, as it generates a new file
>       with the same name into the binary distribution (I consider this
>       to be bad).
>    3. Loading resources means loading, parsing and a lot of error
>       handling. Lots of things "can be null" or be a default value, so
>       the values are unreliable and lots of code needs to handle this.
>       In fact, when running from IntelliJ the properties file is
>       generated poorly most of the time; only during a normal maven
>       build will it work correctly.
> 2. *Generate a Java source file and then simply compile this and make
>    it part of the project.*
>    Essentially the same model as you would have when using Apache Avro,
>    Protobuf, Antlr 4 and javacc (several of those are used in Flink!).
>    A rough version of this (to be improved):
>    https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
>
>    This method has several effects:
>    1. The developer MUST run 'mvn generate-sources' before the project
>       can be run from within the IDE.
>    2. The code/test will not run until this step is done.
>    3. Because the file is generated by a plugin it is always correct.
>       As a consequence all variables are always available and the
>       downstream users no longer have to handle the "can be null" or
>       "default value" situations.
>
> So is generating code similar to what I created a desired change?
> My opinion is that it is the better solution: the data available is
> more reliable and as a consequence the rest of the code is simpler.
> It would probably mean that during development of Flink you should be
> aware of this and do an 'extra step' to get it running.
>
> What do you guys think?
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
>
>
>
--
Best regards / Met vriendelijke groeten,
Niels Basjes