1) TestPipeline#convertToArgs is meant to convert TestPipelineOptions into
a String[] containing --arg=value for integration tests that only have a
main(String[] arg) entry point like WordCount. There is this PR[1] that is
outstanding that is attempting to clean this up and simplify it so that we
aren't doing this haphazard conversion.
[1] https://github.com/apache/beam/pull/4346

I haven't yet commented on the PR but I was going to suggest that the user
add String[] getMainArgs to TestPipelineOptions and not do the
convertToArgs hackiness for ITs that use main().

On Thu, Jan 11, 2018 at 1:53 PM, Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> Some inputs - maybe comments? - would be appreciated on:
>
> 1. Why using json in https://github.com/apache/beam/blob/master/sdks/java/
> core/src/main/java/org/apache/beam/sdk/testing/TestPipeline.java#L480 and
> not iterate over the options?
> 2. Also this is likely used to do a fromArgs and create another pipeline
> options later no? Is this api that useful here or does it need an options
> composition - fully object?
>
> Le 11 janv. 2018 19:09, "Lukasz Cwik" <lc...@google.com> a écrit :
>
>> Give links to the code segments that you want background on.
>>
>> On Wed, Jan 10, 2018 at 12:44 AM, Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> updated my junit 5 PR to show that: https://github.com/apach
>>> e/beam/pull/4360/files#diff-578d1770f8b47ebbc1e74a2c19de9a6aR28
>>>
>>> It doesn't remove jackson yet but exposes a nicer user interface for the
>>> config.
>>>
>>> I'm not fully clear on all the jackson usage yet, there are some round
>>> trips (PO -> json -> PO) which are weird without more knowledge.
>>>
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>> <http://rmannibucau.wordpress.com> | Github
>>> <https://github.com/rmannibucau> | LinkedIn
>>> <https://www.linkedin.com/in/rmannibucau>
>>>
>>> 2018-01-09 22:57 GMT+01:00 Lukasz Cwik <lc...@google.com>:
>>>
>>>> Removing large dependencies is great but I can only assume it will be a
>>>> lot of work so if you can get it to work in a backwards compatible way
>>>> great.
>>>>
>>>> On Tue, Jan 9, 2018 at 1:52 PM, Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> It conflicts easily between libs and containers. Shade is not a good
>>>>> option too - see the thread on this topic :(.
>>>>>
>>>>> At the end i see using the cli solution closer to user - vs framework
>>>>> dev for json - and hurting less in terms of classpath so pby sthg to test
>>>>> no?
>>>>>
>>>>> Le 9 janv. 2018 22:47, "Lukasz Cwik" <lc...@google.com> a écrit :
>>>>>
>>>>>> Romain, how has Jackson been a classpath breaker?
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 9, 2018 at 1:20 PM, Romain Manni-Bucau <
>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>
>>>>>>> Hmm, beam already owns the cli parsing - that is what I meant - it
>>>>>>> only misses the arg delimiter (ie quoting) and adding it is easy no?
>>>>>>>
>>>>>>> Le 9 janv. 2018 21:19, "Robert Bradshaw" <rober...@google.com> a
>>>>>>> écrit :
>>>>>>>
>>>>>>>> On Tue, Jan 9, 2018 at 11:48 AM, Romain Manni-Bucau
>>>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > Le 9 janv. 2018 19:50, "Robert Bradshaw" <rober...@google.com> a
>>>>>>>> écrit :
>>>>>>>> >
>>>>>>>> > From what I understand:
>>>>>>>> >
>>>>>>>> > 1) The command line argument values (both simple and more
>>>>>>>> complex) are
>>>>>>>> > all JSON-representable.
>>>>>>>> >
>>>>>>>> > And must be all CLI representable
>>>>>>>>
>>>>>>>> Sorry, I should have said "all pipeline options." In any case, one
>>>>>>>> can
>>>>>>>> always do JSON -> string and the resulting string is usable as a
>>>>>>>> command line argument.
>>>>>>>>
>>>>>>>> > 2) The command line is a mapping of keys to these values.
>>>>>>>> >
>>>>>>>> > Which makes your 1 not true since json supports more ;)
>>>>>>>> >
>>>>>>>> > As such, it seems quite natural to represent the whole set as a
>>>>>>>> single
>>>>>>>> > JSON map, rather than using a different, custom encoding for the
>>>>>>>> top
>>>>>>>> > level (whose custom escaping would have to be carried into the
>>>>>>>> inner
>>>>>>>> > JSON values). Note that JSON has the advantage that one never
>>>>>>>> needs to
>>>>>>>> > explain or define it, and parsers/serializers already exists for
>>>>>>>> all
>>>>>>>> > languages (e.g. if one has a driver script in another language for
>>>>>>>> > launching a java pipeline, it's easy to communicate all the args).
>>>>>>>> >
>>>>>>>> > Same reasonning applies to CLI AFAIK
>>>>>>>>
>>>>>>>> The spec of what a valid command line argument list is is
>>>>>>>> surprisingly
>>>>>>>> inconsistent across platforms, languages, and programs. And your
>>>>>>>> proposal seems to be getting into what the delimiter is, and how to
>>>>>>>> escape it, and possibly then how to escape the escape character. All
>>>>>>>> of this is sidestepped by pointing at an existing spec.
>>>>>>>>
>>>>>>>> > We can't get rid of Jackson in the core because of (1) so there's
>>>>>>>> > little value in adding complexity to remove it from (2). The fact
>>>>>>>> that
>>>>>>>> > Java doesn't ship anything in its expansive standard library for
>>>>>>>> this
>>>>>>>> > is unfortuante, so we have to take a dependency on something.
>>>>>>>> >
>>>>>>>> > We actually can as shown before
>>>>>>>>
>>>>>>>> How, if JSON is integral to the parsing of the argument values
>>>>>>>> themselves? (Or is the argument that it's not, but each extender of
>>>>>>>> PipelineOptions is responsible for choosing a JSON parser and doing
>>>>>>>> it
>>>>>>>> themselves.)
>>>>>>>>
>>>>>>>> > and jackson is not a simple negligible lib
>>>>>>>> > but a classpath breaker :(
>>>>>>>>
>>>>>>>> Perhaps there are better libraries we could use instead? Is the
>>>>>>>> ecosystem so bad that it encourages projects to roll their own of
>>>>>>>> everything rather than share code?
>>>>>>>>
>>>>>>>> > On Tue, Jan 9, 2018 at 10:00 AM, Lukasz Cwik <lc...@google.com>
>>>>>>>> wrote:
>>>>>>>> >> Ah, you don't want JSON within JSON. I see, if thats the case
>>>>>>>> just migrate
>>>>>>>> >> all of them to string tokenization and drop the Jackson usage
>>>>>>>> for string
>>>>>>>> >> ->
>>>>>>>> >> string[] conversion.
>>>>>>>> >>
>>>>>>>> >> On Mon, Jan 8, 2018 at 10:06 PM, Romain Manni-Bucau
>>>>>>>> >> <rmannibu...@gmail.com>
>>>>>>>> >> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Lukasz, the use case is to znsure the config used can still map
>>>>>>>> the CLI
>>>>>>>> >>> and that options dont start to abuse json so it is exactly the
>>>>>>>> opposite
>>>>>>>> >>> of
>>>>>>>> >>> "we can be fancy" and is closer to "we can be robust". Also the
>>>>>>>> default
>>>>>>>> >>> should be easy and not json (i just want to set the runner,
>>>>>>>> --runner=xxx
>>>>>>>> >>> is
>>>>>>>> >>> easier than a json version). If the value doesnt start with a [
>>>>>>>> we can
>>>>>>>> >>> use
>>>>>>>> >>> the tokenizer else jackson, wdyt?
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> Le 8 janv. 2018 22:59, "Lukasz Cwik" <lc...@google.com> a
>>>>>>>> écrit :
>>>>>>>> >>>
>>>>>>>> >>> Now that I think about this more. I looked at some of the
>>>>>>>> examples in the
>>>>>>>> >>> pom.xml and they don't seem to be tricky to write in JSON.
>>>>>>>> >>> I also looked at the Jenkins job configurations (specifically
>>>>>>>> the
>>>>>>>> >>> performance tests) and they pass around maps which they convert
>>>>>>>> to the
>>>>>>>> >>> required format without needing a user to write it themselves.
>>>>>>>> >>> Using gradle, we will be able to trivially to do the same thing
>>>>>>>> (convert
>>>>>>>> >>> maps to json without needing the person to write it by hand)
>>>>>>>> like groovy
>>>>>>>> >>> does.
>>>>>>>> >>> Since we are migrating away from Maven it doesn't seem
>>>>>>>> worthwhile to
>>>>>>>> >>> spend
>>>>>>>> >>> time on to make it easier to write the args in the Maven poms.
>>>>>>>> >>>
>>>>>>>> >>> Is there another use case that is being missed?
>>>>>>>> >>>
>>>>>>>> >>> On Mon, Jan 8, 2018 at 1:38 PM, Romain Manni-Bucau
>>>>>>>> >>> <rmannibu...@gmail.com>
>>>>>>>> >>> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>> Good point for \t and ,.
>>>>>>>> >>>>
>>>>>>>> >>>> Any objection to use jackson as a fallback for that purpose -
>>>>>>>> for
>>>>>>>> >>>> backwqrd compat only - and make it optional then? Will create
>>>>>>>> the ticket
>>>>>>>> >>>> if
>>>>>>>> >>>> not.
>>>>>>>> >>>>
>>>>>>>> >>>> Le 8 janv. 2018 20:32, "Robert Bradshaw" <rober...@google.com>
>>>>>>>> a écrit :
>>>>>>>> >>>>>
>>>>>>>> >>>>> Part of the motivation to use JSON for more complex options
>>>>>>>> was that
>>>>>>>> >>>>> it avoids having to define (and document, test, have users
>>>>>>>> learn, ...)
>>>>>>>> >>>>> yet another format for expressing lists, maps, etc.
>>>>>>>> >>>>>
>>>>>>>> >>>>> On Mon, Jan 8, 2018 at 11:19 AM, Lukasz Cwik <
>>>>>>>> lc...@google.com> wrote:
>>>>>>>> >>>>> > Ken, this is specifically about running integration tests
>>>>>>>> and not a
>>>>>>>> >>>>> > users
>>>>>>>> >>>>> > main().
>>>>>>>> >>>>> >
>>>>>>>> >>>>> > Note, that PipelineOptions JSON format was used because it
>>>>>>>> was a
>>>>>>>> >>>>> > convenient
>>>>>>>> >>>>> > serialized form that is easy to explain to people what is
>>>>>>>> required.
>>>>>>>> >>>>> > Using a different string tokenizer and calling
>>>>>>>> >>>>> > PipelineOptionsFactory.fromArgs() with the parsed strings
>>>>>>>> seems like
>>>>>>>> >>>>> > it
>>>>>>>> >>>>> > would be better.
>>>>>>>> >>>>> >
>>>>>>>> >>>>> > These are the supported formats for fromArgs():
>>>>>>>> >>>>> >    *   --project=MyProject (simple property, will set the
>>>>>>>> "project"
>>>>>>>> >>>>> > property
>>>>>>>> >>>>> > to "MyProject")
>>>>>>>> >>>>> >    *   --readOnly=true (for boolean properties, will set the
>>>>>>>> >>>>> > "readOnly"
>>>>>>>> >>>>> > property to "true")
>>>>>>>> >>>>> >    *   --readOnly (shorthand for boolean properties, will
>>>>>>>> set the
>>>>>>>> >>>>> > "readOnly"
>>>>>>>> >>>>> > property to "true")
>>>>>>>> >>>>> >    *   --x=1 --x=2 --x=3 (list style simple property, will
>>>>>>>> set the
>>>>>>>> >>>>> > "x"
>>>>>>>> >>>>> > property to [1, 2, 3])
>>>>>>>> >>>>> >    *   --x=1,2,3 (shorthand list style simple property,
>>>>>>>> will set the
>>>>>>>> >>>>> > "x"
>>>>>>>> >>>>> > property to [1, 2, 3])
>>>>>>>> >>>>> >    *   --complexObject='{"key1":"value1",...} (JSON format
>>>>>>>> for all
>>>>>>>> >>>>> > other
>>>>>>>> >>>>> > complex types)
>>>>>>>> >>>>> >
>>>>>>>> >>>>> > Using a string tokenizer that minimizes the number of
>>>>>>>> required escape
>>>>>>>> >>>>> > characters would be good so we could use newline characters
>>>>>>>> as our
>>>>>>>> >>>>> > only
>>>>>>>> >>>>> > token. I would avoid ',\t ' as tokens since they are more
>>>>>>>> likely to
>>>>>>>> >>>>> > appear.
>>>>>>>> >>>>> >
>>>>>>>> >>>>> > On Mon, Jan 8, 2018 at 10:33 AM, Kenneth Knowles <
>>>>>>>> k...@google.com>
>>>>>>>> >>>>> > wrote:
>>>>>>>> >>>>> >>
>>>>>>>> >>>>> >> We do have a plain command line syntax, and whoever writes
>>>>>>>> the
>>>>>>>> >>>>> >> main(String[]) function is responsible for invoking the
>>>>>>>> parser. It
>>>>>>>> >>>>> >> isn't
>>>>>>>> >>>>> >> quite as nice as standard arg parse libraries, but it
>>>>>>>> isn't too bad.
>>>>>>>> >>>>> >> It
>>>>>>>> >>>>> >> would be great to improve, though.
>>>>>>>> >>>>> >>
>>>>>>>> >>>>> >> Jackson is for machine-to-machine communication or other
>>>>>>>> situations
>>>>>>>> >>>>> >> where
>>>>>>>> >>>>> >> command line parsing doesn't work so well.
>>>>>>>> >>>>> >>
>>>>>>>> >>>>> >> Are we using these some other way?
>>>>>>>> >>>>> >>
>>>>>>>> >>>>> >> Kenn
>>>>>>>> >>>>> >>
>>>>>>>> >>>>> >> On Sun, Jan 7, 2018 at 7:21 AM, Romain Manni-Bucau
>>>>>>>> >>>>> >> <rmannibu...@gmail.com>
>>>>>>>> >>>>> >> wrote:
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>> Le 7 janv. 2018 12:53, "Jean-Baptiste Onofré" <
>>>>>>>> j...@nanthrax.net> a
>>>>>>>> >>>>> >>> écrit :
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>> Hi Romain,
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>> I guess you are assuming that pipeline options are flat
>>>>>>>> command
>>>>>>>> >>>>> >>> line
>>>>>>>> >>>>> >>> like
>>>>>>>> >>>>> >>> argument, right ?
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>> Actually, theoretically,  pipeline options can be
>>>>>>>> represented as
>>>>>>>> >>>>> >>> json,
>>>>>>>> >>>>> >>> that's why we use jackson.
>>>>>>>> >>>>> >>> The pipeline options can be serialized/deserialized as
>>>>>>>> json.
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>> Yes but if users (or dev ;)) start to use that then it is
>>>>>>>> trivial
>>>>>>>> >>>>> >>> to
>>>>>>>> >>>>> >>> break the cli handling and fromArgs parsing or, if not,
>>>>>>>> break the
>>>>>>>> >>>>> >>> user
>>>>>>>> >>>>> >>> experience. So at the end it is a kind of "yes but no",
>>>>>>>> right?
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>> PS: already see some advanced users having a headache
>>>>>>>> trying to
>>>>>>>> >>>>> >>> pass
>>>>>>>> >>>>> >>> pipeline options in json so using the plain command line
>>>>>>>> syntax can
>>>>>>>> >>>>> >>> be more
>>>>>>>> >>>>> >>> friendly too.
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>> The purpose is to remove the jackson dependencies ?
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>> Later yes, seeing core dep tree I identify a lot of dep
>>>>>>>> which can
>>>>>>>> >>>>> >>> conflict in some env and are not really needed or bring
>>>>>>>> much being
>>>>>>>> >>>>> >>> in the
>>>>>>>> >>>>> >>> core - like avro as mentionned in another thread. Can
>>>>>>>> need a
>>>>>>>> >>>>> >>> sanitizing
>>>>>>>> >>>>> >>> round. Short term it was really a "why is it that
>>>>>>>> complicated" ;).
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>> Regards
>>>>>>>> >>>>> >>> JB
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>> On 01/07/2018 11:38 AM, Romain Manni-Bucau wrote:
>>>>>>>> >>>>> >>>>
>>>>>>>> >>>>> >>>> Hi guys,
>>>>>>>> >>>>> >>>>
>>>>>>>> >>>>> >>>> not sure i fully get why jackson is used to parse
>>>>>>>> pipeline options
>>>>>>>> >>>>> >>>> in
>>>>>>>> >>>>> >>>> the testing integration
>>>>>>>> >>>>> >>>>
>>>>>>>> >>>>> >>>> why not using a token parser like [1] which allows to
>>>>>>>> map 1-1 with
>>>>>>>> >>>>> >>>> the
>>>>>>>> >>>>> >>>> user interface (command line) the options?
>>>>>>>> >>>>> >>>>
>>>>>>>> >>>>> >>>> [1]
>>>>>>>> >>>>> >>>>
>>>>>>>> >>>>> >>>>
>>>>>>>> >>>>> >>>> https://github.com/Talend/comp
>>>>>>>> onent-runtime/blob/master/component-server/src/main/java/org
>>>>>>>> /talend/sdk/component/server/lang/StringPropertiesTokenizer.java
>>>>>>>> >>>>> >>>>
>>>>>>>> >>>>> >>>> Romain Manni-Bucau
>>>>>>>> >>>>> >>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog
>>>>>>>> >>>>> >>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>>>>> >>>>> >>>> <http://rmannibucau.wordpress.com> | Github
>>>>>>>> >>>>> >>>> <https://github.com/rmannibucau>
>>>>>>>> >>>>> >>>> | LinkedIn <https://www.linkedin.com/in/rmannibucau>
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>> --
>>>>>>>> >>>>> >>> Jean-Baptiste Onofré
>>>>>>>> >>>>> >>> jbono...@apache.org
>>>>>>>> >>>>> >>> http://blog.nanthrax.net
>>>>>>>> >>>>> >>> Talend - http://www.talend.com
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>>
>>>>>>>> >>>>> >>
>>>>>>>> >>>>> >
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>

Reply via email to