1) TestPipeline#convertToArgs is meant to convert TestPipelineOptions into a String[] containing --arg=value for integration tests that only have a main(String[] arg) entry point like WordCount. There is this PR[1] that is outstanding that is attempting to clean this up and simplify it so that we aren't doing this haphazard conversion. [1] https://github.com/apache/beam/pull/4346
I haven't yet commented on the PR but I was going to suggest that the user add String[] getMainArgs to TestPipelineOptions and not do the convertToArgs hackiness for ITs that use main(). On Thu, Jan 11, 2018 at 1:53 PM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote: > Some inputs - maybe comments? - would be appreciated on: > > 1. Why using json in https://github.com/apache/beam/blob/master/sdks/java/ > core/src/main/java/org/apache/beam/sdk/testing/TestPipeline.java#L480 and > not iterate over the options? > 2. Also this is likely used to do a fromArgs and create another pipeline > options later no? Is this api that useful here or does it need an options > composition - fully object? > > Le 11 janv. 2018 19:09, "Lukasz Cwik" <lc...@google.com> a écrit : > >> Give links to the code segments that you want background on. >> >> On Wed, Jan 10, 2018 at 12:44 AM, Romain Manni-Bucau < >> rmannibu...@gmail.com> wrote: >> >>> updated my junit 5 PR to show that: https://github.com/apach >>> e/beam/pull/4360/files#diff-578d1770f8b47ebbc1e74a2c19de9a6aR28 >>> >>> It doesn't remove jackson yet but exposes a nicer user interface for the >>> config. >>> >>> I'm not fully clear on all the jackson usage yet, there are some round >>> trips (PO -> json -> PO) which are weird without more knowledge. >>> >>> >>> Romain Manni-Bucau >>> @rmannibucau <https://twitter.com/rmannibucau> | Blog >>> <https://rmannibucau.metawerx.net/> | Old Blog >>> <http://rmannibucau.wordpress.com> | Github >>> <https://github.com/rmannibucau> | LinkedIn >>> <https://www.linkedin.com/in/rmannibucau> >>> >>> 2018-01-09 22:57 GMT+01:00 Lukasz Cwik <lc...@google.com>: >>> >>>> Removing large dependencies is great but I can only assume it will be a >>>> lot of work so if you can get it to work in a backwards compatible way >>>> great. >>>> >>>> On Tue, Jan 9, 2018 at 1:52 PM, Romain Manni-Bucau < >>>> rmannibu...@gmail.com> wrote: >>>> >>>>> It conflicts easily between libs and containers. Shade is not a good >>>>> option too - see the thread on this topic :(. >>>>> >>>>> At the end i see using the cli solution closer to user - vs framework >>>>> dev for json - and hurting less in terms of classpath so pby sthg to test >>>>> no? >>>>> >>>>> Le 9 janv. 2018 22:47, "Lukasz Cwik" <lc...@google.com> a écrit : >>>>> >>>>>> Romain, how has Jackson been a classpath breaker? >>>>>> >>>>>> >>>>>> On Tue, Jan 9, 2018 at 1:20 PM, Romain Manni-Bucau < >>>>>> rmannibu...@gmail.com> wrote: >>>>>> >>>>>>> Hmm, beam already owns the cli parsing - that is what I meant - it >>>>>>> only misses the arg delimiter (ie quoting) and adding it is easy no? >>>>>>> >>>>>>> Le 9 janv. 2018 21:19, "Robert Bradshaw" <rober...@google.com> a >>>>>>> écrit : >>>>>>> >>>>>>>> On Tue, Jan 9, 2018 at 11:48 AM, Romain Manni-Bucau >>>>>>>> <rmannibu...@gmail.com> wrote: >>>>>>>> > >>>>>>>> > Le 9 janv. 2018 19:50, "Robert Bradshaw" <rober...@google.com> a >>>>>>>> écrit : >>>>>>>> > >>>>>>>> > From what I understand: >>>>>>>> > >>>>>>>> > 1) The command line argument values (both simple and more >>>>>>>> complex) are >>>>>>>> > all JSON-representable. >>>>>>>> > >>>>>>>> > And must be all CLI representable >>>>>>>> >>>>>>>> Sorry, I should have said "all pipeline options." In any case, one >>>>>>>> can >>>>>>>> always do JSON -> string and the resulting string is usable as a >>>>>>>> command line argument. >>>>>>>> >>>>>>>> > 2) The command line is a mapping of keys to these values. >>>>>>>> > >>>>>>>> > Which makes your 1 not true since json supports more ;) >>>>>>>> > >>>>>>>> > As such, it seems quite natural to represent the whole set as a >>>>>>>> single >>>>>>>> > JSON map, rather than using a different, custom encoding for the >>>>>>>> top >>>>>>>> > level (whose custom escaping would have to be carried into the >>>>>>>> inner >>>>>>>> > JSON values). Note that JSON has the advantage that one never >>>>>>>> needs to >>>>>>>> > explain or define it, and parsers/serializers already exists for >>>>>>>> all >>>>>>>> > languages (e.g. if one has a driver script in another language for >>>>>>>> > launching a java pipeline, it's easy to communicate all the args). >>>>>>>> > >>>>>>>> > Same reasonning applies to CLI AFAIK >>>>>>>> >>>>>>>> The spec of what a valid command line argument list is is >>>>>>>> surprisingly >>>>>>>> inconsistent across platforms, languages, and programs. And your >>>>>>>> proposal seems to be getting into what the delimiter is, and how to >>>>>>>> escape it, and possibly then how to escape the escape character. All >>>>>>>> of this is sidestepped by pointing at an existing spec. >>>>>>>> >>>>>>>> > We can't get rid of Jackson in the core because of (1) so there's >>>>>>>> > little value in adding complexity to remove it from (2). The fact >>>>>>>> that >>>>>>>> > Java doesn't ship anything in its expansive standard library for >>>>>>>> this >>>>>>>> > is unfortuante, so we have to take a dependency on something. >>>>>>>> > >>>>>>>> > We actually can as shown before >>>>>>>> >>>>>>>> How, if JSON is integral to the parsing of the argument values >>>>>>>> themselves? (Or is the argument that it's not, but each extender of >>>>>>>> PipelineOptions is responsible for choosing a JSON parser and doing >>>>>>>> it >>>>>>>> themselves.) >>>>>>>> >>>>>>>> > and jackson is not a simple negligible lib >>>>>>>> > but a classpath breaker :( >>>>>>>> >>>>>>>> Perhaps there are better libraries we could use instead? Is the >>>>>>>> ecosystem so bad that it encourages projects to roll their own of >>>>>>>> everything rather than share code? >>>>>>>> >>>>>>>> > On Tue, Jan 9, 2018 at 10:00 AM, Lukasz Cwik <lc...@google.com> >>>>>>>> wrote: >>>>>>>> >> Ah, you don't want JSON within JSON. I see, if thats the case >>>>>>>> just migrate >>>>>>>> >> all of them to string tokenization and drop the Jackson usage >>>>>>>> for string >>>>>>>> >> -> >>>>>>>> >> string[] conversion. >>>>>>>> >> >>>>>>>> >> On Mon, Jan 8, 2018 at 10:06 PM, Romain Manni-Bucau >>>>>>>> >> <rmannibu...@gmail.com> >>>>>>>> >> wrote: >>>>>>>> >>> >>>>>>>> >>> Lukasz, the use case is to znsure the config used can still map >>>>>>>> the CLI >>>>>>>> >>> and that options dont start to abuse json so it is exactly the >>>>>>>> opposite >>>>>>>> >>> of >>>>>>>> >>> "we can be fancy" and is closer to "we can be robust". Also the >>>>>>>> default >>>>>>>> >>> should be easy and not json (i just want to set the runner, >>>>>>>> --runner=xxx >>>>>>>> >>> is >>>>>>>> >>> easier than a json version). If the value doesnt start with a [ >>>>>>>> we can >>>>>>>> >>> use >>>>>>>> >>> the tokenizer else jackson, wdyt? >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> Le 8 janv. 2018 22:59, "Lukasz Cwik" <lc...@google.com> a >>>>>>>> écrit : >>>>>>>> >>> >>>>>>>> >>> Now that I think about this more. I looked at some of the >>>>>>>> examples in the >>>>>>>> >>> pom.xml and they don't seem to be tricky to write in JSON. >>>>>>>> >>> I also looked at the Jenkins job configurations (specifically >>>>>>>> the >>>>>>>> >>> performance tests) and they pass around maps which they convert >>>>>>>> to the >>>>>>>> >>> required format without needing a user to write it themselves. >>>>>>>> >>> Using gradle, we will be able to trivially to do the same thing >>>>>>>> (convert >>>>>>>> >>> maps to json without needing the person to write it by hand) >>>>>>>> like groovy >>>>>>>> >>> does. >>>>>>>> >>> Since we are migrating away from Maven it doesn't seem >>>>>>>> worthwhile to >>>>>>>> >>> spend >>>>>>>> >>> time on to make it easier to write the args in the Maven poms. >>>>>>>> >>> >>>>>>>> >>> Is there another use case that is being missed? >>>>>>>> >>> >>>>>>>> >>> On Mon, Jan 8, 2018 at 1:38 PM, Romain Manni-Bucau >>>>>>>> >>> <rmannibu...@gmail.com> >>>>>>>> >>> wrote: >>>>>>>> >>>> >>>>>>>> >>>> Good point for \t and ,. >>>>>>>> >>>> >>>>>>>> >>>> Any objection to use jackson as a fallback for that purpose - >>>>>>>> for >>>>>>>> >>>> backwqrd compat only - and make it optional then? Will create >>>>>>>> the ticket >>>>>>>> >>>> if >>>>>>>> >>>> not. >>>>>>>> >>>> >>>>>>>> >>>> Le 8 janv. 2018 20:32, "Robert Bradshaw" <rober...@google.com> >>>>>>>> a écrit : >>>>>>>> >>>>> >>>>>>>> >>>>> Part of the motivation to use JSON for more complex options >>>>>>>> was that >>>>>>>> >>>>> it avoids having to define (and document, test, have users >>>>>>>> learn, ...) >>>>>>>> >>>>> yet another format for expressing lists, maps, etc. >>>>>>>> >>>>> >>>>>>>> >>>>> On Mon, Jan 8, 2018 at 11:19 AM, Lukasz Cwik < >>>>>>>> lc...@google.com> wrote: >>>>>>>> >>>>> > Ken, this is specifically about running integration tests >>>>>>>> and not a >>>>>>>> >>>>> > users >>>>>>>> >>>>> > main(). >>>>>>>> >>>>> > >>>>>>>> >>>>> > Note, that PipelineOptions JSON format was used because it >>>>>>>> was a >>>>>>>> >>>>> > convenient >>>>>>>> >>>>> > serialized form that is easy to explain to people what is >>>>>>>> required. >>>>>>>> >>>>> > Using a different string tokenizer and calling >>>>>>>> >>>>> > PipelineOptionsFactory.fromArgs() with the parsed strings >>>>>>>> seems like >>>>>>>> >>>>> > it >>>>>>>> >>>>> > would be better. >>>>>>>> >>>>> > >>>>>>>> >>>>> > These are the supported formats for fromArgs(): >>>>>>>> >>>>> > * --project=MyProject (simple property, will set the >>>>>>>> "project" >>>>>>>> >>>>> > property >>>>>>>> >>>>> > to "MyProject") >>>>>>>> >>>>> > * --readOnly=true (for boolean properties, will set the >>>>>>>> >>>>> > "readOnly" >>>>>>>> >>>>> > property to "true") >>>>>>>> >>>>> > * --readOnly (shorthand for boolean properties, will >>>>>>>> set the >>>>>>>> >>>>> > "readOnly" >>>>>>>> >>>>> > property to "true") >>>>>>>> >>>>> > * --x=1 --x=2 --x=3 (list style simple property, will >>>>>>>> set the >>>>>>>> >>>>> > "x" >>>>>>>> >>>>> > property to [1, 2, 3]) >>>>>>>> >>>>> > * --x=1,2,3 (shorthand list style simple property, >>>>>>>> will set the >>>>>>>> >>>>> > "x" >>>>>>>> >>>>> > property to [1, 2, 3]) >>>>>>>> >>>>> > * --complexObject='{"key1":"value1",...} (JSON format >>>>>>>> for all >>>>>>>> >>>>> > other >>>>>>>> >>>>> > complex types) >>>>>>>> >>>>> > >>>>>>>> >>>>> > Using a string tokenizer that minimizes the number of >>>>>>>> required escape >>>>>>>> >>>>> > characters would be good so we could use newline characters >>>>>>>> as our >>>>>>>> >>>>> > only >>>>>>>> >>>>> > token. I would avoid ',\t ' as tokens since they are more >>>>>>>> likely to >>>>>>>> >>>>> > appear. >>>>>>>> >>>>> > >>>>>>>> >>>>> > On Mon, Jan 8, 2018 at 10:33 AM, Kenneth Knowles < >>>>>>>> k...@google.com> >>>>>>>> >>>>> > wrote: >>>>>>>> >>>>> >> >>>>>>>> >>>>> >> We do have a plain command line syntax, and whoever writes >>>>>>>> the >>>>>>>> >>>>> >> main(String[]) function is responsible for invoking the >>>>>>>> parser. It >>>>>>>> >>>>> >> isn't >>>>>>>> >>>>> >> quite as nice as standard arg parse libraries, but it >>>>>>>> isn't too bad. >>>>>>>> >>>>> >> It >>>>>>>> >>>>> >> would be great to improve, though. >>>>>>>> >>>>> >> >>>>>>>> >>>>> >> Jackson is for machine-to-machine communication or other >>>>>>>> situations >>>>>>>> >>>>> >> where >>>>>>>> >>>>> >> command line parsing doesn't work so well. >>>>>>>> >>>>> >> >>>>>>>> >>>>> >> Are we using these some other way? >>>>>>>> >>>>> >> >>>>>>>> >>>>> >> Kenn >>>>>>>> >>>>> >> >>>>>>>> >>>>> >> On Sun, Jan 7, 2018 at 7:21 AM, Romain Manni-Bucau >>>>>>>> >>>>> >> <rmannibu...@gmail.com> >>>>>>>> >>>>> >> wrote: >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> Le 7 janv. 2018 12:53, "Jean-Baptiste Onofré" < >>>>>>>> j...@nanthrax.net> a >>>>>>>> >>>>> >>> écrit : >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> Hi Romain, >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> I guess you are assuming that pipeline options are flat >>>>>>>> command >>>>>>>> >>>>> >>> line >>>>>>>> >>>>> >>> like >>>>>>>> >>>>> >>> argument, right ? >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> Actually, theoretically, pipeline options can be >>>>>>>> represented as >>>>>>>> >>>>> >>> json, >>>>>>>> >>>>> >>> that's why we use jackson. >>>>>>>> >>>>> >>> The pipeline options can be serialized/deserialized as >>>>>>>> json. >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> Yes but if users (or dev ;)) start to use that then it is >>>>>>>> trivial >>>>>>>> >>>>> >>> to >>>>>>>> >>>>> >>> break the cli handling and fromArgs parsing or, if not, >>>>>>>> break the >>>>>>>> >>>>> >>> user >>>>>>>> >>>>> >>> experience. So at the end it is a kind of "yes but no", >>>>>>>> right? >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> PS: already see some advanced users having a headache >>>>>>>> trying to >>>>>>>> >>>>> >>> pass >>>>>>>> >>>>> >>> pipeline options in json so using the plain command line >>>>>>>> syntax can >>>>>>>> >>>>> >>> be more >>>>>>>> >>>>> >>> friendly too. >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> The purpose is to remove the jackson dependencies ? >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> Later yes, seeing core dep tree I identify a lot of dep >>>>>>>> which can >>>>>>>> >>>>> >>> conflict in some env and are not really needed or bring >>>>>>>> much being >>>>>>>> >>>>> >>> in the >>>>>>>> >>>>> >>> core - like avro as mentionned in another thread. Can >>>>>>>> need a >>>>>>>> >>>>> >>> sanitizing >>>>>>>> >>>>> >>> round. Short term it was really a "why is it that >>>>>>>> complicated" ;). >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> Regards >>>>>>>> >>>>> >>> JB >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> On 01/07/2018 11:38 AM, Romain Manni-Bucau wrote: >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Hi guys, >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> not sure i fully get why jackson is used to parse >>>>>>>> pipeline options >>>>>>>> >>>>> >>>> in >>>>>>>> >>>>> >>>> the testing integration >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> why not using a token parser like [1] which allows to >>>>>>>> map 1-1 with >>>>>>>> >>>>> >>>> the >>>>>>>> >>>>> >>>> user interface (command line) the options? >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> [1] >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> https://github.com/Talend/comp >>>>>>>> onent-runtime/blob/master/component-server/src/main/java/org >>>>>>>> /talend/sdk/component/server/lang/StringPropertiesTokenizer.java >>>>>>>> >>>>> >>>> >>>>>>>> >>>>> >>>> Romain Manni-Bucau >>>>>>>> >>>>> >>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog >>>>>>>> >>>>> >>>> <https://rmannibucau.metawerx.net/> | Old Blog >>>>>>>> >>>>> >>>> <http://rmannibucau.wordpress.com> | Github >>>>>>>> >>>>> >>>> <https://github.com/rmannibucau> >>>>>>>> >>>>> >>>> | LinkedIn <https://www.linkedin.com/in/rmannibucau> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> -- >>>>>>>> >>>>> >>> Jean-Baptiste Onofré >>>>>>>> >>>>> >>> jbono...@apache.org >>>>>>>> >>>>> >>> http://blog.nanthrax.net >>>>>>>> >>>>> >>> Talend - http://www.talend.com >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >>> >>>>>>>> >>>>> >> >>>>>>>> >>>>> > >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >> >>>>>>>> > >>>>>>>> > >>>>>>>> >>>>>>> >>>>>> >>>> >>> >>