Robert Bradshaw had the idea of migrating away from using main(String[]
args) and just refactoring the code to test the WordCount PTransform
allowing one to write a traditional JUnit test that didn't call
main(String[] args). This would change what the contents of our examples
are and make them more amenable to testing which could be good guidance for
developers as well.

On Thu, Jan 11, 2018 at 3:08 PM, Lukasz Cwik <lc...@google.com> wrote:

> 1) TestPipeline#convertToArgs is meant to convert TestPipelineOptions into
> a String[] containing --arg=value for integration tests that only have a
> main(String[] arg) entry point like WordCount. There is this PR[1] that is
> outstanding that is attempting to clean this up and simplify it so that we
> aren't doing this haphazard conversion.
> [1] https://github.com/apache/beam/pull/4346
>
> I haven't yet commented on the PR but I was going to suggest that the user
> add String[] getMainArgs to TestPipelineOptions and not do the
> convertToArgs hackiness for ITs that use main().
>
> On Thu, Jan 11, 2018 at 1:53 PM, Romain Manni-Bucau <rmannibu...@gmail.com
> > wrote:
>
>> Some inputs - maybe comments? - would be appreciated on:
>>
>> 1. Why using json in https://github.com/apache/b
>> eam/blob/master/sdks/java/core/src/main/java/org/apache/beam
>> /sdk/testing/TestPipeline.java#L480 and not iterate over the options?
>> 2. Also this is likely used to do a fromArgs and create another pipeline
>> options later no? Is this api that useful here or does it need an options
>> composition - fully object?
>>
>> Le 11 janv. 2018 19:09, "Lukasz Cwik" <lc...@google.com> a écrit :
>>
>>> Give links to the code segments that you want background on.
>>>
>>> On Wed, Jan 10, 2018 at 12:44 AM, Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>>> updated my junit 5 PR to show that: https://github.com/apach
>>>> e/beam/pull/4360/files#diff-578d1770f8b47ebbc1e74a2c19de9a6aR28
>>>>
>>>> It doesn't remove jackson yet but exposes a nicer user interface for
>>>> the config.
>>>>
>>>> I'm not fully clear on all the jackson usage yet, there are some round
>>>> trips (PO -> json -> PO) which are weird without more knowledge.
>>>>
>>>>
>>>> Romain Manni-Bucau
>>>> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>> <http://rmannibucau.wordpress.com> | Github
>>>> <https://github.com/rmannibucau> | LinkedIn
>>>> <https://www.linkedin.com/in/rmannibucau>
>>>>
>>>> 2018-01-09 22:57 GMT+01:00 Lukasz Cwik <lc...@google.com>:
>>>>
>>>>> Removing large dependencies is great but I can only assume it will be
>>>>> a lot of work so if you can get it to work in a backwards compatible way
>>>>> great.
>>>>>
>>>>> On Tue, Jan 9, 2018 at 1:52 PM, Romain Manni-Bucau <
>>>>> rmannibu...@gmail.com> wrote:
>>>>>
>>>>>> It conflicts easily between libs and containers. Shade is not a good
>>>>>> option too - see the thread on this topic :(.
>>>>>>
>>>>>> At the end i see using the cli solution closer to user - vs framework
>>>>>> dev for json - and hurting less in terms of classpath so pby sthg to test
>>>>>> no?
>>>>>>
>>>>>> Le 9 janv. 2018 22:47, "Lukasz Cwik" <lc...@google.com> a écrit :
>>>>>>
>>>>>>> Romain, how has Jackson been a classpath breaker?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 9, 2018 at 1:20 PM, Romain Manni-Bucau <
>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hmm, beam already owns the cli parsing - that is what I meant - it
>>>>>>>> only misses the arg delimiter (ie quoting) and adding it is easy no?
>>>>>>>>
>>>>>>>> Le 9 janv. 2018 21:19, "Robert Bradshaw" <rober...@google.com> a
>>>>>>>> écrit :
>>>>>>>>
>>>>>>>>> On Tue, Jan 9, 2018 at 11:48 AM, Romain Manni-Bucau
>>>>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> > Le 9 janv. 2018 19:50, "Robert Bradshaw" <rober...@google.com>
>>>>>>>>> a écrit :
>>>>>>>>> >
>>>>>>>>> > From what I understand:
>>>>>>>>> >
>>>>>>>>> > 1) The command line argument values (both simple and more
>>>>>>>>> complex) are
>>>>>>>>> > all JSON-representable.
>>>>>>>>> >
>>>>>>>>> > And must be all CLI representable
>>>>>>>>>
>>>>>>>>> Sorry, I should have said "all pipeline options." In any case, one
>>>>>>>>> can
>>>>>>>>> always do JSON -> string and the resulting string is usable as a
>>>>>>>>> command line argument.
>>>>>>>>>
>>>>>>>>> > 2) The command line is a mapping of keys to these values.
>>>>>>>>> >
>>>>>>>>> > Which makes your 1 not true since json supports more ;)
>>>>>>>>> >
>>>>>>>>> > As such, it seems quite natural to represent the whole set as a
>>>>>>>>> single
>>>>>>>>> > JSON map, rather than using a different, custom encoding for the
>>>>>>>>> top
>>>>>>>>> > level (whose custom escaping would have to be carried into the
>>>>>>>>> inner
>>>>>>>>> > JSON values). Note that JSON has the advantage that one never
>>>>>>>>> needs to
>>>>>>>>> > explain or define it, and parsers/serializers already exists for
>>>>>>>>> all
>>>>>>>>> > languages (e.g. if one has a driver script in another language
>>>>>>>>> for
>>>>>>>>> > launching a java pipeline, it's easy to communicate all the
>>>>>>>>> args).
>>>>>>>>> >
>>>>>>>>> > Same reasonning applies to CLI AFAIK
>>>>>>>>>
>>>>>>>>> The spec of what a valid command line argument list is is
>>>>>>>>> surprisingly
>>>>>>>>> inconsistent across platforms, languages, and programs. And your
>>>>>>>>> proposal seems to be getting into what the delimiter is, and how to
>>>>>>>>> escape it, and possibly then how to escape the escape character.
>>>>>>>>> All
>>>>>>>>> of this is sidestepped by pointing at an existing spec.
>>>>>>>>>
>>>>>>>>> > We can't get rid of Jackson in the core because of (1) so there's
>>>>>>>>> > little value in adding complexity to remove it from (2). The
>>>>>>>>> fact that
>>>>>>>>> > Java doesn't ship anything in its expansive standard library for
>>>>>>>>> this
>>>>>>>>> > is unfortuante, so we have to take a dependency on something.
>>>>>>>>> >
>>>>>>>>> > We actually can as shown before
>>>>>>>>>
>>>>>>>>> How, if JSON is integral to the parsing of the argument values
>>>>>>>>> themselves? (Or is the argument that it's not, but each extender of
>>>>>>>>> PipelineOptions is responsible for choosing a JSON parser and
>>>>>>>>> doing it
>>>>>>>>> themselves.)
>>>>>>>>>
>>>>>>>>> > and jackson is not a simple negligible lib
>>>>>>>>> > but a classpath breaker :(
>>>>>>>>>
>>>>>>>>> Perhaps there are better libraries we could use instead? Is the
>>>>>>>>> ecosystem so bad that it encourages projects to roll their own of
>>>>>>>>> everything rather than share code?
>>>>>>>>>
>>>>>>>>> > On Tue, Jan 9, 2018 at 10:00 AM, Lukasz Cwik <lc...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> >> Ah, you don't want JSON within JSON. I see, if thats the case
>>>>>>>>> just migrate
>>>>>>>>> >> all of them to string tokenization and drop the Jackson usage
>>>>>>>>> for string
>>>>>>>>> >> ->
>>>>>>>>> >> string[] conversion.
>>>>>>>>> >>
>>>>>>>>> >> On Mon, Jan 8, 2018 at 10:06 PM, Romain Manni-Bucau
>>>>>>>>> >> <rmannibu...@gmail.com>
>>>>>>>>> >> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Lukasz, the use case is to znsure the config used can still
>>>>>>>>> map the CLI
>>>>>>>>> >>> and that options dont start to abuse json so it is exactly the
>>>>>>>>> opposite
>>>>>>>>> >>> of
>>>>>>>>> >>> "we can be fancy" and is closer to "we can be robust". Also
>>>>>>>>> the default
>>>>>>>>> >>> should be easy and not json (i just want to set the runner,
>>>>>>>>> --runner=xxx
>>>>>>>>> >>> is
>>>>>>>>> >>> easier than a json version). If the value doesnt start with a
>>>>>>>>> [ we can
>>>>>>>>> >>> use
>>>>>>>>> >>> the tokenizer else jackson, wdyt?
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> Le 8 janv. 2018 22:59, "Lukasz Cwik" <lc...@google.com> a
>>>>>>>>> écrit :
>>>>>>>>> >>>
>>>>>>>>> >>> Now that I think about this more. I looked at some of the
>>>>>>>>> examples in the
>>>>>>>>> >>> pom.xml and they don't seem to be tricky to write in JSON.
>>>>>>>>> >>> I also looked at the Jenkins job configurations (specifically
>>>>>>>>> the
>>>>>>>>> >>> performance tests) and they pass around maps which they
>>>>>>>>> convert to the
>>>>>>>>> >>> required format without needing a user to write it themselves.
>>>>>>>>> >>> Using gradle, we will be able to trivially to do the same
>>>>>>>>> thing (convert
>>>>>>>>> >>> maps to json without needing the person to write it by hand)
>>>>>>>>> like groovy
>>>>>>>>> >>> does.
>>>>>>>>> >>> Since we are migrating away from Maven it doesn't seem
>>>>>>>>> worthwhile to
>>>>>>>>> >>> spend
>>>>>>>>> >>> time on to make it easier to write the args in the Maven poms.
>>>>>>>>> >>>
>>>>>>>>> >>> Is there another use case that is being missed?
>>>>>>>>> >>>
>>>>>>>>> >>> On Mon, Jan 8, 2018 at 1:38 PM, Romain Manni-Bucau
>>>>>>>>> >>> <rmannibu...@gmail.com>
>>>>>>>>> >>> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>> Good point for \t and ,.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Any objection to use jackson as a fallback for that purpose -
>>>>>>>>> for
>>>>>>>>> >>>> backwqrd compat only - and make it optional then? Will create
>>>>>>>>> the ticket
>>>>>>>>> >>>> if
>>>>>>>>> >>>> not.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Le 8 janv. 2018 20:32, "Robert Bradshaw" <rober...@google.com>
>>>>>>>>> a écrit :
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Part of the motivation to use JSON for more complex options
>>>>>>>>> was that
>>>>>>>>> >>>>> it avoids having to define (and document, test, have users
>>>>>>>>> learn, ...)
>>>>>>>>> >>>>> yet another format for expressing lists, maps, etc.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> On Mon, Jan 8, 2018 at 11:19 AM, Lukasz Cwik <
>>>>>>>>> lc...@google.com> wrote:
>>>>>>>>> >>>>> > Ken, this is specifically about running integration tests
>>>>>>>>> and not a
>>>>>>>>> >>>>> > users
>>>>>>>>> >>>>> > main().
>>>>>>>>> >>>>> >
>>>>>>>>> >>>>> > Note, that PipelineOptions JSON format was used because it
>>>>>>>>> was a
>>>>>>>>> >>>>> > convenient
>>>>>>>>> >>>>> > serialized form that is easy to explain to people what is
>>>>>>>>> required.
>>>>>>>>> >>>>> > Using a different string tokenizer and calling
>>>>>>>>> >>>>> > PipelineOptionsFactory.fromArgs() with the parsed strings
>>>>>>>>> seems like
>>>>>>>>> >>>>> > it
>>>>>>>>> >>>>> > would be better.
>>>>>>>>> >>>>> >
>>>>>>>>> >>>>> > These are the supported formats for fromArgs():
>>>>>>>>> >>>>> >    *   --project=MyProject (simple property, will set the
>>>>>>>>> "project"
>>>>>>>>> >>>>> > property
>>>>>>>>> >>>>> > to "MyProject")
>>>>>>>>> >>>>> >    *   --readOnly=true (for boolean properties, will set
>>>>>>>>> the
>>>>>>>>> >>>>> > "readOnly"
>>>>>>>>> >>>>> > property to "true")
>>>>>>>>> >>>>> >    *   --readOnly (shorthand for boolean properties, will
>>>>>>>>> set the
>>>>>>>>> >>>>> > "readOnly"
>>>>>>>>> >>>>> > property to "true")
>>>>>>>>> >>>>> >    *   --x=1 --x=2 --x=3 (list style simple property, will
>>>>>>>>> set the
>>>>>>>>> >>>>> > "x"
>>>>>>>>> >>>>> > property to [1, 2, 3])
>>>>>>>>> >>>>> >    *   --x=1,2,3 (shorthand list style simple property,
>>>>>>>>> will set the
>>>>>>>>> >>>>> > "x"
>>>>>>>>> >>>>> > property to [1, 2, 3])
>>>>>>>>> >>>>> >    *   --complexObject='{"key1":"value1",...} (JSON
>>>>>>>>> format for all
>>>>>>>>> >>>>> > other
>>>>>>>>> >>>>> > complex types)
>>>>>>>>> >>>>> >
>>>>>>>>> >>>>> > Using a string tokenizer that minimizes the number of
>>>>>>>>> required escape
>>>>>>>>> >>>>> > characters would be good so we could use newline
>>>>>>>>> characters as our
>>>>>>>>> >>>>> > only
>>>>>>>>> >>>>> > token. I would avoid ',\t ' as tokens since they are more
>>>>>>>>> likely to
>>>>>>>>> >>>>> > appear.
>>>>>>>>> >>>>> >
>>>>>>>>> >>>>> > On Mon, Jan 8, 2018 at 10:33 AM, Kenneth Knowles <
>>>>>>>>> k...@google.com>
>>>>>>>>> >>>>> > wrote:
>>>>>>>>> >>>>> >>
>>>>>>>>> >>>>> >> We do have a plain command line syntax, and whoever
>>>>>>>>> writes the
>>>>>>>>> >>>>> >> main(String[]) function is responsible for invoking the
>>>>>>>>> parser. It
>>>>>>>>> >>>>> >> isn't
>>>>>>>>> >>>>> >> quite as nice as standard arg parse libraries, but it
>>>>>>>>> isn't too bad.
>>>>>>>>> >>>>> >> It
>>>>>>>>> >>>>> >> would be great to improve, though.
>>>>>>>>> >>>>> >>
>>>>>>>>> >>>>> >> Jackson is for machine-to-machine communication or other
>>>>>>>>> situations
>>>>>>>>> >>>>> >> where
>>>>>>>>> >>>>> >> command line parsing doesn't work so well.
>>>>>>>>> >>>>> >>
>>>>>>>>> >>>>> >> Are we using these some other way?
>>>>>>>>> >>>>> >>
>>>>>>>>> >>>>> >> Kenn
>>>>>>>>> >>>>> >>
>>>>>>>>> >>>>> >> On Sun, Jan 7, 2018 at 7:21 AM, Romain Manni-Bucau
>>>>>>>>> >>>>> >> <rmannibu...@gmail.com>
>>>>>>>>> >>>>> >> wrote:
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>> Le 7 janv. 2018 12:53, "Jean-Baptiste Onofré" <
>>>>>>>>> j...@nanthrax.net> a
>>>>>>>>> >>>>> >>> écrit :
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>> Hi Romain,
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>> I guess you are assuming that pipeline options are flat
>>>>>>>>> command
>>>>>>>>> >>>>> >>> line
>>>>>>>>> >>>>> >>> like
>>>>>>>>> >>>>> >>> argument, right ?
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>> Actually, theoretically,  pipeline options can be
>>>>>>>>> represented as
>>>>>>>>> >>>>> >>> json,
>>>>>>>>> >>>>> >>> that's why we use jackson.
>>>>>>>>> >>>>> >>> The pipeline options can be serialized/deserialized as
>>>>>>>>> json.
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>> Yes but if users (or dev ;)) start to use that then it
>>>>>>>>> is trivial
>>>>>>>>> >>>>> >>> to
>>>>>>>>> >>>>> >>> break the cli handling and fromArgs parsing or, if not,
>>>>>>>>> break the
>>>>>>>>> >>>>> >>> user
>>>>>>>>> >>>>> >>> experience. So at the end it is a kind of "yes but no",
>>>>>>>>> right?
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>> PS: already see some advanced users having a headache
>>>>>>>>> trying to
>>>>>>>>> >>>>> >>> pass
>>>>>>>>> >>>>> >>> pipeline options in json so using the plain command line
>>>>>>>>> syntax can
>>>>>>>>> >>>>> >>> be more
>>>>>>>>> >>>>> >>> friendly too.
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>> The purpose is to remove the jackson dependencies ?
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>> Later yes, seeing core dep tree I identify a lot of dep
>>>>>>>>> which can
>>>>>>>>> >>>>> >>> conflict in some env and are not really needed or bring
>>>>>>>>> much being
>>>>>>>>> >>>>> >>> in the
>>>>>>>>> >>>>> >>> core - like avro as mentionned in another thread. Can
>>>>>>>>> need a
>>>>>>>>> >>>>> >>> sanitizing
>>>>>>>>> >>>>> >>> round. Short term it was really a "why is it that
>>>>>>>>> complicated" ;).
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>> Regards
>>>>>>>>> >>>>> >>> JB
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>> On 01/07/2018 11:38 AM, Romain Manni-Bucau wrote:
>>>>>>>>> >>>>> >>>>
>>>>>>>>> >>>>> >>>> Hi guys,
>>>>>>>>> >>>>> >>>>
>>>>>>>>> >>>>> >>>> not sure i fully get why jackson is used to parse
>>>>>>>>> pipeline options
>>>>>>>>> >>>>> >>>> in
>>>>>>>>> >>>>> >>>> the testing integration
>>>>>>>>> >>>>> >>>>
>>>>>>>>> >>>>> >>>> why not using a token parser like [1] which allows to
>>>>>>>>> map 1-1 with
>>>>>>>>> >>>>> >>>> the
>>>>>>>>> >>>>> >>>> user interface (command line) the options?
>>>>>>>>> >>>>> >>>>
>>>>>>>>> >>>>> >>>> [1]
>>>>>>>>> >>>>> >>>>
>>>>>>>>> >>>>> >>>>
>>>>>>>>> >>>>> >>>> https://github.com/Talend/comp
>>>>>>>>> onent-runtime/blob/master/component-server/src/main/java/org
>>>>>>>>> /talend/sdk/component/server/lang/StringPropertiesTokenizer.java
>>>>>>>>> >>>>> >>>>
>>>>>>>>> >>>>> >>>> Romain Manni-Bucau
>>>>>>>>> >>>>> >>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog
>>>>>>>>> >>>>> >>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>>>>>>> >>>>> >>>> <http://rmannibucau.wordpress.com> | Github
>>>>>>>>> >>>>> >>>> <https://github.com/rmannibucau>
>>>>>>>>> >>>>> >>>> | LinkedIn <https://www.linkedin.com/in/rmannibucau>
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>> --
>>>>>>>>> >>>>> >>> Jean-Baptiste Onofré
>>>>>>>>> >>>>> >>> jbono...@apache.org
>>>>>>>>> >>>>> >>> http://blog.nanthrax.net
>>>>>>>>> >>>>> >>> Talend - http://www.talend.com
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>>
>>>>>>>>> >>>>> >>
>>>>>>>>> >>>>> >
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>

Reply via email to