updated my junit 5 PR to show that: https://github.com/apache/beam/pull/4360/files#diff-578d1770f8b47ebbc1e74a2c19de9a6aR28
It doesn't remove jackson yet but exposes a nicer user interface for the config. I'm not fully clear on all the jackson usage yet, there are some round trips (PO -> json -> PO) which are weird without more knowledge. Romain Manni-Bucau @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> 2018-01-09 22:57 GMT+01:00 Lukasz Cwik <lc...@google.com>: > Removing large dependencies is great but I can only assume it will be a > lot of work so if you can get it to work in a backwards compatible way > great. > > On Tue, Jan 9, 2018 at 1:52 PM, Romain Manni-Bucau <rmannibu...@gmail.com> > wrote: > >> It conflicts easily between libs and containers. Shade is not a good >> option too - see the thread on this topic :(. >> >> At the end i see using the cli solution closer to user - vs framework dev >> for json - and hurting less in terms of classpath so pby sthg to test no? >> >> Le 9 janv. 2018 22:47, "Lukasz Cwik" <lc...@google.com> a écrit : >> >>> Romain, how has Jackson been a classpath breaker? >>> >>> >>> On Tue, Jan 9, 2018 at 1:20 PM, Romain Manni-Bucau < >>> rmannibu...@gmail.com> wrote: >>> >>>> Hmm, beam already owns the cli parsing - that is what I meant - it only >>>> misses the arg delimiter (ie quoting) and adding it is easy no? >>>> >>>> Le 9 janv. 2018 21:19, "Robert Bradshaw" <rober...@google.com> a >>>> écrit : >>>> >>>>> On Tue, Jan 9, 2018 at 11:48 AM, Romain Manni-Bucau >>>>> <rmannibu...@gmail.com> wrote: >>>>> > >>>>> > Le 9 janv. 2018 19:50, "Robert Bradshaw" <rober...@google.com> a >>>>> écrit : >>>>> > >>>>> > From what I understand: >>>>> > >>>>> > 1) The command line argument values (both simple and more complex) >>>>> are >>>>> > all JSON-representable. >>>>> > >>>>> > And must be all CLI representable >>>>> >>>>> Sorry, I should have said "all pipeline options." In any case, one can >>>>> always do JSON -> string and the resulting string is usable as a >>>>> command line argument. >>>>> >>>>> > 2) The command line is a mapping of keys to these values. >>>>> > >>>>> > Which makes your 1 not true since json supports more ;) >>>>> > >>>>> > As such, it seems quite natural to represent the whole set as a >>>>> single >>>>> > JSON map, rather than using a different, custom encoding for the top >>>>> > level (whose custom escaping would have to be carried into the inner >>>>> > JSON values). Note that JSON has the advantage that one never needs >>>>> to >>>>> > explain or define it, and parsers/serializers already exists for all >>>>> > languages (e.g. if one has a driver script in another language for >>>>> > launching a java pipeline, it's easy to communicate all the args). >>>>> > >>>>> > Same reasonning applies to CLI AFAIK >>>>> >>>>> The spec of what a valid command line argument list is is surprisingly >>>>> inconsistent across platforms, languages, and programs. And your >>>>> proposal seems to be getting into what the delimiter is, and how to >>>>> escape it, and possibly then how to escape the escape character. All >>>>> of this is sidestepped by pointing at an existing spec. >>>>> >>>>> > We can't get rid of Jackson in the core because of (1) so there's >>>>> > little value in adding complexity to remove it from (2). The fact >>>>> that >>>>> > Java doesn't ship anything in its expansive standard library for this >>>>> > is unfortuante, so we have to take a dependency on something. >>>>> > >>>>> > We actually can as shown before >>>>> >>>>> How, if JSON is integral to the parsing of the argument values >>>>> themselves? (Or is the argument that it's not, but each extender of >>>>> PipelineOptions is responsible for choosing a JSON parser and doing it >>>>> themselves.) >>>>> >>>>> > and jackson is not a simple negligible lib >>>>> > but a classpath breaker :( >>>>> >>>>> Perhaps there are better libraries we could use instead? Is the >>>>> ecosystem so bad that it encourages projects to roll their own of >>>>> everything rather than share code? >>>>> >>>>> > On Tue, Jan 9, 2018 at 10:00 AM, Lukasz Cwik <lc...@google.com> >>>>> wrote: >>>>> >> Ah, you don't want JSON within JSON. I see, if thats the case just >>>>> migrate >>>>> >> all of them to string tokenization and drop the Jackson usage for >>>>> string >>>>> >> -> >>>>> >> string[] conversion. >>>>> >> >>>>> >> On Mon, Jan 8, 2018 at 10:06 PM, Romain Manni-Bucau >>>>> >> <rmannibu...@gmail.com> >>>>> >> wrote: >>>>> >>> >>>>> >>> Lukasz, the use case is to znsure the config used can still map >>>>> the CLI >>>>> >>> and that options dont start to abuse json so it is exactly the >>>>> opposite >>>>> >>> of >>>>> >>> "we can be fancy" and is closer to "we can be robust". Also the >>>>> default >>>>> >>> should be easy and not json (i just want to set the runner, >>>>> --runner=xxx >>>>> >>> is >>>>> >>> easier than a json version). If the value doesnt start with a [ we >>>>> can >>>>> >>> use >>>>> >>> the tokenizer else jackson, wdyt? >>>>> >>> >>>>> >>> >>>>> >>> Le 8 janv. 2018 22:59, "Lukasz Cwik" <lc...@google.com> a écrit : >>>>> >>> >>>>> >>> Now that I think about this more. I looked at some of the examples >>>>> in the >>>>> >>> pom.xml and they don't seem to be tricky to write in JSON. >>>>> >>> I also looked at the Jenkins job configurations (specifically the >>>>> >>> performance tests) and they pass around maps which they convert to >>>>> the >>>>> >>> required format without needing a user to write it themselves. >>>>> >>> Using gradle, we will be able to trivially to do the same thing >>>>> (convert >>>>> >>> maps to json without needing the person to write it by hand) like >>>>> groovy >>>>> >>> does. >>>>> >>> Since we are migrating away from Maven it doesn't seem worthwhile >>>>> to >>>>> >>> spend >>>>> >>> time on to make it easier to write the args in the Maven poms. >>>>> >>> >>>>> >>> Is there another use case that is being missed? >>>>> >>> >>>>> >>> On Mon, Jan 8, 2018 at 1:38 PM, Romain Manni-Bucau >>>>> >>> <rmannibu...@gmail.com> >>>>> >>> wrote: >>>>> >>>> >>>>> >>>> Good point for \t and ,. >>>>> >>>> >>>>> >>>> Any objection to use jackson as a fallback for that purpose - for >>>>> >>>> backwqrd compat only - and make it optional then? Will create the >>>>> ticket >>>>> >>>> if >>>>> >>>> not. >>>>> >>>> >>>>> >>>> Le 8 janv. 2018 20:32, "Robert Bradshaw" <rober...@google.com> a >>>>> écrit : >>>>> >>>>> >>>>> >>>>> Part of the motivation to use JSON for more complex options was >>>>> that >>>>> >>>>> it avoids having to define (and document, test, have users >>>>> learn, ...) >>>>> >>>>> yet another format for expressing lists, maps, etc. >>>>> >>>>> >>>>> >>>>> On Mon, Jan 8, 2018 at 11:19 AM, Lukasz Cwik <lc...@google.com> >>>>> wrote: >>>>> >>>>> > Ken, this is specifically about running integration tests and >>>>> not a >>>>> >>>>> > users >>>>> >>>>> > main(). >>>>> >>>>> > >>>>> >>>>> > Note, that PipelineOptions JSON format was used because it was >>>>> a >>>>> >>>>> > convenient >>>>> >>>>> > serialized form that is easy to explain to people what is >>>>> required. >>>>> >>>>> > Using a different string tokenizer and calling >>>>> >>>>> > PipelineOptionsFactory.fromArgs() with the parsed strings >>>>> seems like >>>>> >>>>> > it >>>>> >>>>> > would be better. >>>>> >>>>> > >>>>> >>>>> > These are the supported formats for fromArgs(): >>>>> >>>>> > * --project=MyProject (simple property, will set the >>>>> "project" >>>>> >>>>> > property >>>>> >>>>> > to "MyProject") >>>>> >>>>> > * --readOnly=true (for boolean properties, will set the >>>>> >>>>> > "readOnly" >>>>> >>>>> > property to "true") >>>>> >>>>> > * --readOnly (shorthand for boolean properties, will set >>>>> the >>>>> >>>>> > "readOnly" >>>>> >>>>> > property to "true") >>>>> >>>>> > * --x=1 --x=2 --x=3 (list style simple property, will set >>>>> the >>>>> >>>>> > "x" >>>>> >>>>> > property to [1, 2, 3]) >>>>> >>>>> > * --x=1,2,3 (shorthand list style simple property, will >>>>> set the >>>>> >>>>> > "x" >>>>> >>>>> > property to [1, 2, 3]) >>>>> >>>>> > * --complexObject='{"key1":"value1",...} (JSON format >>>>> for all >>>>> >>>>> > other >>>>> >>>>> > complex types) >>>>> >>>>> > >>>>> >>>>> > Using a string tokenizer that minimizes the number of required >>>>> escape >>>>> >>>>> > characters would be good so we could use newline characters as >>>>> our >>>>> >>>>> > only >>>>> >>>>> > token. I would avoid ',\t ' as tokens since they are more >>>>> likely to >>>>> >>>>> > appear. >>>>> >>>>> > >>>>> >>>>> > On Mon, Jan 8, 2018 at 10:33 AM, Kenneth Knowles < >>>>> k...@google.com> >>>>> >>>>> > wrote: >>>>> >>>>> >> >>>>> >>>>> >> We do have a plain command line syntax, and whoever writes the >>>>> >>>>> >> main(String[]) function is responsible for invoking the >>>>> parser. It >>>>> >>>>> >> isn't >>>>> >>>>> >> quite as nice as standard arg parse libraries, but it isn't >>>>> too bad. >>>>> >>>>> >> It >>>>> >>>>> >> would be great to improve, though. >>>>> >>>>> >> >>>>> >>>>> >> Jackson is for machine-to-machine communication or other >>>>> situations >>>>> >>>>> >> where >>>>> >>>>> >> command line parsing doesn't work so well. >>>>> >>>>> >> >>>>> >>>>> >> Are we using these some other way? >>>>> >>>>> >> >>>>> >>>>> >> Kenn >>>>> >>>>> >> >>>>> >>>>> >> On Sun, Jan 7, 2018 at 7:21 AM, Romain Manni-Bucau >>>>> >>>>> >> <rmannibu...@gmail.com> >>>>> >>>>> >> wrote: >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >>> Le 7 janv. 2018 12:53, "Jean-Baptiste Onofré" < >>>>> j...@nanthrax.net> a >>>>> >>>>> >>> écrit : >>>>> >>>>> >>> >>>>> >>>>> >>> Hi Romain, >>>>> >>>>> >>> >>>>> >>>>> >>> I guess you are assuming that pipeline options are flat >>>>> command >>>>> >>>>> >>> line >>>>> >>>>> >>> like >>>>> >>>>> >>> argument, right ? >>>>> >>>>> >>> >>>>> >>>>> >>> Actually, theoretically, pipeline options can be >>>>> represented as >>>>> >>>>> >>> json, >>>>> >>>>> >>> that's why we use jackson. >>>>> >>>>> >>> The pipeline options can be serialized/deserialized as json. >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >>> Yes but if users (or dev ;)) start to use that then it is >>>>> trivial >>>>> >>>>> >>> to >>>>> >>>>> >>> break the cli handling and fromArgs parsing or, if not, >>>>> break the >>>>> >>>>> >>> user >>>>> >>>>> >>> experience. So at the end it is a kind of "yes but no", >>>>> right? >>>>> >>>>> >>> >>>>> >>>>> >>> PS: already see some advanced users having a headache trying >>>>> to >>>>> >>>>> >>> pass >>>>> >>>>> >>> pipeline options in json so using the plain command line >>>>> syntax can >>>>> >>>>> >>> be more >>>>> >>>>> >>> friendly too. >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >>> The purpose is to remove the jackson dependencies ? >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >>> Later yes, seeing core dep tree I identify a lot of dep >>>>> which can >>>>> >>>>> >>> conflict in some env and are not really needed or bring much >>>>> being >>>>> >>>>> >>> in the >>>>> >>>>> >>> core - like avro as mentionned in another thread. Can need a >>>>> >>>>> >>> sanitizing >>>>> >>>>> >>> round. Short term it was really a "why is it that >>>>> complicated" ;). >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >>> Regards >>>>> >>>>> >>> JB >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >>> On 01/07/2018 11:38 AM, Romain Manni-Bucau wrote: >>>>> >>>>> >>>> >>>>> >>>>> >>>> Hi guys, >>>>> >>>>> >>>> >>>>> >>>>> >>>> not sure i fully get why jackson is used to parse pipeline >>>>> options >>>>> >>>>> >>>> in >>>>> >>>>> >>>> the testing integration >>>>> >>>>> >>>> >>>>> >>>>> >>>> why not using a token parser like [1] which allows to map >>>>> 1-1 with >>>>> >>>>> >>>> the >>>>> >>>>> >>>> user interface (command line) the options? >>>>> >>>>> >>>> >>>>> >>>>> >>>> [1] >>>>> >>>>> >>>> >>>>> >>>>> >>>> >>>>> >>>>> >>>> https://github.com/Talend/comp >>>>> onent-runtime/blob/master/component-server/src/main/java/org >>>>> /talend/sdk/component/server/lang/StringPropertiesTokenizer.java >>>>> >>>>> >>>> >>>>> >>>>> >>>> Romain Manni-Bucau >>>>> >>>>> >>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog >>>>> >>>>> >>>> <https://rmannibucau.metawerx.net/> | Old Blog >>>>> >>>>> >>>> <http://rmannibucau.wordpress.com> | Github >>>>> >>>>> >>>> <https://github.com/rmannibucau> >>>>> >>>>> >>>> | LinkedIn <https://www.linkedin.com/in/rmannibucau> >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >>> -- >>>>> >>>>> >>> Jean-Baptiste Onofré >>>>> >>>>> >>> jbono...@apache.org >>>>> >>>>> >>> http://blog.nanthrax.net >>>>> >>>>> >>> Talend - http://www.talend.com >>>>> >>>>> >>> >>>>> >>>>> >>> >>>>> >>>>> >> >>>>> >>>>> > >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >> >>>>> > >>>>> > >>>>> >>>> >>> >