updated my junit 5 PR to show that:
https://github.com/apache/beam/pull/4360/files#diff-578d1770f8b47ebbc1e74a2c19de9a6aR28

It doesn't remove jackson yet but exposes a nicer user interface for the
config.

I'm not fully clear on all the jackson usage yet, there are some round
trips (PO -> json -> PO) which are weird without more knowledge.


Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau>

2018-01-09 22:57 GMT+01:00 Lukasz Cwik <lc...@google.com>:

> Removing large dependencies is great but I can only assume it will be a
> lot of work so if you can get it to work in a backwards compatible way
> great.
>
> On Tue, Jan 9, 2018 at 1:52 PM, Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>
>> It conflicts easily between libs and containers. Shade is not a good
>> option too - see the thread on this topic :(.
>>
>> At the end i see using the cli solution closer to user - vs framework dev
>> for json - and hurting less in terms of classpath so pby sthg to test no?
>>
>> Le 9 janv. 2018 22:47, "Lukasz Cwik" <lc...@google.com> a écrit :
>>
>>> Romain, how has Jackson been a classpath breaker?
>>>
>>>
>>> On Tue, Jan 9, 2018 at 1:20 PM, Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>>> Hmm, beam already owns the cli parsing - that is what I meant - it only
>>>> misses the arg delimiter (ie quoting) and adding it is easy no?
>>>>
>>>> Le 9 janv. 2018 21:19, "Robert Bradshaw" <rober...@google.com> a
>>>> écrit :
>>>>
>>>>> On Tue, Jan 9, 2018 at 11:48 AM, Romain Manni-Bucau
>>>>> <rmannibu...@gmail.com> wrote:
>>>>> >
>>>>> > Le 9 janv. 2018 19:50, "Robert Bradshaw" <rober...@google.com> a
>>>>> écrit :
>>>>> >
>>>>> > From what I understand:
>>>>> >
>>>>> > 1) The command line argument values (both simple and more complex)
>>>>> are
>>>>> > all JSON-representable.
>>>>> >
>>>>> > And must be all CLI representable
>>>>>
>>>>> Sorry, I should have said "all pipeline options." In any case, one can
>>>>> always do JSON -> string and the resulting string is usable as a
>>>>> command line argument.
>>>>>
>>>>> > 2) The command line is a mapping of keys to these values.
>>>>> >
>>>>> > Which makes your 1 not true since json supports more ;)
>>>>> >
>>>>> > As such, it seems quite natural to represent the whole set as a
>>>>> single
>>>>> > JSON map, rather than using a different, custom encoding for the top
>>>>> > level (whose custom escaping would have to be carried into the inner
>>>>> > JSON values). Note that JSON has the advantage that one never needs
>>>>> to
>>>>> > explain or define it, and parsers/serializers already exists for all
>>>>> > languages (e.g. if one has a driver script in another language for
>>>>> > launching a java pipeline, it's easy to communicate all the args).
>>>>> >
>>>>> > Same reasonning applies to CLI AFAIK
>>>>>
>>>>> The spec of what a valid command line argument list is is surprisingly
>>>>> inconsistent across platforms, languages, and programs. And your
>>>>> proposal seems to be getting into what the delimiter is, and how to
>>>>> escape it, and possibly then how to escape the escape character. All
>>>>> of this is sidestepped by pointing at an existing spec.
>>>>>
>>>>> > We can't get rid of Jackson in the core because of (1) so there's
>>>>> > little value in adding complexity to remove it from (2). The fact
>>>>> that
>>>>> > Java doesn't ship anything in its expansive standard library for this
>>>>> > is unfortuante, so we have to take a dependency on something.
>>>>> >
>>>>> > We actually can as shown before
>>>>>
>>>>> How, if JSON is integral to the parsing of the argument values
>>>>> themselves? (Or is the argument that it's not, but each extender of
>>>>> PipelineOptions is responsible for choosing a JSON parser and doing it
>>>>> themselves.)
>>>>>
>>>>> > and jackson is not a simple negligible lib
>>>>> > but a classpath breaker :(
>>>>>
>>>>> Perhaps there are better libraries we could use instead? Is the
>>>>> ecosystem so bad that it encourages projects to roll their own of
>>>>> everything rather than share code?
>>>>>
>>>>> > On Tue, Jan 9, 2018 at 10:00 AM, Lukasz Cwik <lc...@google.com>
>>>>> wrote:
>>>>> >> Ah, you don't want JSON within JSON. I see, if thats the case just
>>>>> migrate
>>>>> >> all of them to string tokenization and drop the Jackson usage for
>>>>> string
>>>>> >> ->
>>>>> >> string[] conversion.
>>>>> >>
>>>>> >> On Mon, Jan 8, 2018 at 10:06 PM, Romain Manni-Bucau
>>>>> >> <rmannibu...@gmail.com>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Lukasz, the use case is to znsure the config used can still map
>>>>> the CLI
>>>>> >>> and that options dont start to abuse json so it is exactly the
>>>>> opposite
>>>>> >>> of
>>>>> >>> "we can be fancy" and is closer to "we can be robust". Also the
>>>>> default
>>>>> >>> should be easy and not json (i just want to set the runner,
>>>>> --runner=xxx
>>>>> >>> is
>>>>> >>> easier than a json version). If the value doesnt start with a [ we
>>>>> can
>>>>> >>> use
>>>>> >>> the tokenizer else jackson, wdyt?
>>>>> >>>
>>>>> >>>
>>>>> >>> Le 8 janv. 2018 22:59, "Lukasz Cwik" <lc...@google.com> a écrit :
>>>>> >>>
>>>>> >>> Now that I think about this more. I looked at some of the examples
>>>>> in the
>>>>> >>> pom.xml and they don't seem to be tricky to write in JSON.
>>>>> >>> I also looked at the Jenkins job configurations (specifically the
>>>>> >>> performance tests) and they pass around maps which they convert to
>>>>> the
>>>>> >>> required format without needing a user to write it themselves.
>>>>> >>> Using gradle, we will be able to trivially to do the same thing
>>>>> (convert
>>>>> >>> maps to json without needing the person to write it by hand) like
>>>>> groovy
>>>>> >>> does.
>>>>> >>> Since we are migrating away from Maven it doesn't seem worthwhile
>>>>> to
>>>>> >>> spend
>>>>> >>> time on to make it easier to write the args in the Maven poms.
>>>>> >>>
>>>>> >>> Is there another use case that is being missed?
>>>>> >>>
>>>>> >>> On Mon, Jan 8, 2018 at 1:38 PM, Romain Manni-Bucau
>>>>> >>> <rmannibu...@gmail.com>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> Good point for \t and ,.
>>>>> >>>>
>>>>> >>>> Any objection to use jackson as a fallback for that purpose - for
>>>>> >>>> backwqrd compat only - and make it optional then? Will create the
>>>>> ticket
>>>>> >>>> if
>>>>> >>>> not.
>>>>> >>>>
>>>>> >>>> Le 8 janv. 2018 20:32, "Robert Bradshaw" <rober...@google.com> a
>>>>> écrit :
>>>>> >>>>>
>>>>> >>>>> Part of the motivation to use JSON for more complex options was
>>>>> that
>>>>> >>>>> it avoids having to define (and document, test, have users
>>>>> learn, ...)
>>>>> >>>>> yet another format for expressing lists, maps, etc.
>>>>> >>>>>
>>>>> >>>>> On Mon, Jan 8, 2018 at 11:19 AM, Lukasz Cwik <lc...@google.com>
>>>>> wrote:
>>>>> >>>>> > Ken, this is specifically about running integration tests and
>>>>> not a
>>>>> >>>>> > users
>>>>> >>>>> > main().
>>>>> >>>>> >
>>>>> >>>>> > Note, that PipelineOptions JSON format was used because it was
>>>>> a
>>>>> >>>>> > convenient
>>>>> >>>>> > serialized form that is easy to explain to people what is
>>>>> required.
>>>>> >>>>> > Using a different string tokenizer and calling
>>>>> >>>>> > PipelineOptionsFactory.fromArgs() with the parsed strings
>>>>> seems like
>>>>> >>>>> > it
>>>>> >>>>> > would be better.
>>>>> >>>>> >
>>>>> >>>>> > These are the supported formats for fromArgs():
>>>>> >>>>> >    *   --project=MyProject (simple property, will set the
>>>>> "project"
>>>>> >>>>> > property
>>>>> >>>>> > to "MyProject")
>>>>> >>>>> >    *   --readOnly=true (for boolean properties, will set the
>>>>> >>>>> > "readOnly"
>>>>> >>>>> > property to "true")
>>>>> >>>>> >    *   --readOnly (shorthand for boolean properties, will set
>>>>> the
>>>>> >>>>> > "readOnly"
>>>>> >>>>> > property to "true")
>>>>> >>>>> >    *   --x=1 --x=2 --x=3 (list style simple property, will set
>>>>> the
>>>>> >>>>> > "x"
>>>>> >>>>> > property to [1, 2, 3])
>>>>> >>>>> >    *   --x=1,2,3 (shorthand list style simple property, will
>>>>> set the
>>>>> >>>>> > "x"
>>>>> >>>>> > property to [1, 2, 3])
>>>>> >>>>> >    *   --complexObject='{"key1":"value1",...} (JSON format
>>>>> for all
>>>>> >>>>> > other
>>>>> >>>>> > complex types)
>>>>> >>>>> >
>>>>> >>>>> > Using a string tokenizer that minimizes the number of required
>>>>> escape
>>>>> >>>>> > characters would be good so we could use newline characters as
>>>>> our
>>>>> >>>>> > only
>>>>> >>>>> > token. I would avoid ',\t ' as tokens since they are more
>>>>> likely to
>>>>> >>>>> > appear.
>>>>> >>>>> >
>>>>> >>>>> > On Mon, Jan 8, 2018 at 10:33 AM, Kenneth Knowles <
>>>>> k...@google.com>
>>>>> >>>>> > wrote:
>>>>> >>>>> >>
>>>>> >>>>> >> We do have a plain command line syntax, and whoever writes the
>>>>> >>>>> >> main(String[]) function is responsible for invoking the
>>>>> parser. It
>>>>> >>>>> >> isn't
>>>>> >>>>> >> quite as nice as standard arg parse libraries, but it isn't
>>>>> too bad.
>>>>> >>>>> >> It
>>>>> >>>>> >> would be great to improve, though.
>>>>> >>>>> >>
>>>>> >>>>> >> Jackson is for machine-to-machine communication or other
>>>>> situations
>>>>> >>>>> >> where
>>>>> >>>>> >> command line parsing doesn't work so well.
>>>>> >>>>> >>
>>>>> >>>>> >> Are we using these some other way?
>>>>> >>>>> >>
>>>>> >>>>> >> Kenn
>>>>> >>>>> >>
>>>>> >>>>> >> On Sun, Jan 7, 2018 at 7:21 AM, Romain Manni-Bucau
>>>>> >>>>> >> <rmannibu...@gmail.com>
>>>>> >>>>> >> wrote:
>>>>> >>>>> >>>
>>>>> >>>>> >>>
>>>>> >>>>> >>>
>>>>> >>>>> >>> Le 7 janv. 2018 12:53, "Jean-Baptiste Onofré" <
>>>>> j...@nanthrax.net> a
>>>>> >>>>> >>> écrit :
>>>>> >>>>> >>>
>>>>> >>>>> >>> Hi Romain,
>>>>> >>>>> >>>
>>>>> >>>>> >>> I guess you are assuming that pipeline options are flat
>>>>> command
>>>>> >>>>> >>> line
>>>>> >>>>> >>> like
>>>>> >>>>> >>> argument, right ?
>>>>> >>>>> >>>
>>>>> >>>>> >>> Actually, theoretically,  pipeline options can be
>>>>> represented as
>>>>> >>>>> >>> json,
>>>>> >>>>> >>> that's why we use jackson.
>>>>> >>>>> >>> The pipeline options can be serialized/deserialized as json.
>>>>> >>>>> >>>
>>>>> >>>>> >>>
>>>>> >>>>> >>> Yes but if users (or dev ;)) start to use that then it is
>>>>> trivial
>>>>> >>>>> >>> to
>>>>> >>>>> >>> break the cli handling and fromArgs parsing or, if not,
>>>>> break the
>>>>> >>>>> >>> user
>>>>> >>>>> >>> experience. So at the end it is a kind of "yes but no",
>>>>> right?
>>>>> >>>>> >>>
>>>>> >>>>> >>> PS: already see some advanced users having a headache trying
>>>>> to
>>>>> >>>>> >>> pass
>>>>> >>>>> >>> pipeline options in json so using the plain command line
>>>>> syntax can
>>>>> >>>>> >>> be more
>>>>> >>>>> >>> friendly too.
>>>>> >>>>> >>>
>>>>> >>>>> >>>
>>>>> >>>>> >>> The purpose is to remove the jackson dependencies ?
>>>>> >>>>> >>>
>>>>> >>>>> >>>
>>>>> >>>>> >>> Later yes, seeing core dep tree I identify a lot of dep
>>>>> which can
>>>>> >>>>> >>> conflict in some env and are not really needed or bring much
>>>>> being
>>>>> >>>>> >>> in the
>>>>> >>>>> >>> core - like avro as mentionned in another thread. Can need a
>>>>> >>>>> >>> sanitizing
>>>>> >>>>> >>> round. Short term it was really a "why is it that
>>>>> complicated" ;).
>>>>> >>>>> >>>
>>>>> >>>>> >>>
>>>>> >>>>> >>>
>>>>> >>>>> >>> Regards
>>>>> >>>>> >>> JB
>>>>> >>>>> >>>
>>>>> >>>>> >>>
>>>>> >>>>> >>> On 01/07/2018 11:38 AM, Romain Manni-Bucau wrote:
>>>>> >>>>> >>>>
>>>>> >>>>> >>>> Hi guys,
>>>>> >>>>> >>>>
>>>>> >>>>> >>>> not sure i fully get why jackson is used to parse pipeline
>>>>> options
>>>>> >>>>> >>>> in
>>>>> >>>>> >>>> the testing integration
>>>>> >>>>> >>>>
>>>>> >>>>> >>>> why not using a token parser like [1] which allows to map
>>>>> 1-1 with
>>>>> >>>>> >>>> the
>>>>> >>>>> >>>> user interface (command line) the options?
>>>>> >>>>> >>>>
>>>>> >>>>> >>>> [1]
>>>>> >>>>> >>>>
>>>>> >>>>> >>>>
>>>>> >>>>> >>>> https://github.com/Talend/comp
>>>>> onent-runtime/blob/master/component-server/src/main/java/org
>>>>> /talend/sdk/component/server/lang/StringPropertiesTokenizer.java
>>>>> >>>>> >>>>
>>>>> >>>>> >>>> Romain Manni-Bucau
>>>>> >>>>> >>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog
>>>>> >>>>> >>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>> >>>>> >>>> <http://rmannibucau.wordpress.com> | Github
>>>>> >>>>> >>>> <https://github.com/rmannibucau>
>>>>> >>>>> >>>> | LinkedIn <https://www.linkedin.com/in/rmannibucau>
>>>>> >>>>> >>>
>>>>> >>>>> >>>
>>>>> >>>>> >>> --
>>>>> >>>>> >>> Jean-Baptiste Onofré
>>>>> >>>>> >>> jbono...@apache.org
>>>>> >>>>> >>> http://blog.nanthrax.net
>>>>> >>>>> >>> Talend - http://www.talend.com
>>>>> >>>>> >>>
>>>>> >>>>> >>>
>>>>> >>>>> >>
>>>>> >>>>> >
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>
>>>>> >
>>>>> >
>>>>>
>>>>
>>>
>

Reply via email to