Hmm, beam already owns the cli parsing - that is what I meant - it only misses the arg delimiter (ie quoting) and adding it is easy no?
Le 9 janv. 2018 21:19, "Robert Bradshaw" <[email protected]> a écrit : > On Tue, Jan 9, 2018 at 11:48 AM, Romain Manni-Bucau > <[email protected]> wrote: > > > > Le 9 janv. 2018 19:50, "Robert Bradshaw" <[email protected]> a écrit : > > > > From what I understand: > > > > 1) The command line argument values (both simple and more complex) are > > all JSON-representable. > > > > And must be all CLI representable > > Sorry, I should have said "all pipeline options." In any case, one can > always do JSON -> string and the resulting string is usable as a > command line argument. > > > 2) The command line is a mapping of keys to these values. > > > > Which makes your 1 not true since json supports more ;) > > > > As such, it seems quite natural to represent the whole set as a single > > JSON map, rather than using a different, custom encoding for the top > > level (whose custom escaping would have to be carried into the inner > > JSON values). Note that JSON has the advantage that one never needs to > > explain or define it, and parsers/serializers already exists for all > > languages (e.g. if one has a driver script in another language for > > launching a java pipeline, it's easy to communicate all the args). > > > > Same reasonning applies to CLI AFAIK > > The spec of what a valid command line argument list is is surprisingly > inconsistent across platforms, languages, and programs. And your > proposal seems to be getting into what the delimiter is, and how to > escape it, and possibly then how to escape the escape character. All > of this is sidestepped by pointing at an existing spec. > > > We can't get rid of Jackson in the core because of (1) so there's > > little value in adding complexity to remove it from (2). The fact that > > Java doesn't ship anything in its expansive standard library for this > > is unfortuante, so we have to take a dependency on something. > > > > We actually can as shown before > > How, if JSON is integral to the parsing of the argument values > themselves? (Or is the argument that it's not, but each extender of > PipelineOptions is responsible for choosing a JSON parser and doing it > themselves.) > > > and jackson is not a simple negligible lib > > but a classpath breaker :( > > Perhaps there are better libraries we could use instead? Is the > ecosystem so bad that it encourages projects to roll their own of > everything rather than share code? > > > On Tue, Jan 9, 2018 at 10:00 AM, Lukasz Cwik <[email protected]> wrote: > >> Ah, you don't want JSON within JSON. I see, if thats the case just > migrate > >> all of them to string tokenization and drop the Jackson usage for string > >> -> > >> string[] conversion. > >> > >> On Mon, Jan 8, 2018 at 10:06 PM, Romain Manni-Bucau > >> <[email protected]> > >> wrote: > >>> > >>> Lukasz, the use case is to znsure the config used can still map the CLI > >>> and that options dont start to abuse json so it is exactly the opposite > >>> of > >>> "we can be fancy" and is closer to "we can be robust". Also the default > >>> should be easy and not json (i just want to set the runner, > --runner=xxx > >>> is > >>> easier than a json version). If the value doesnt start with a [ we can > >>> use > >>> the tokenizer else jackson, wdyt? > >>> > >>> > >>> Le 8 janv. 2018 22:59, "Lukasz Cwik" <[email protected]> a écrit : > >>> > >>> Now that I think about this more. I looked at some of the examples in > the > >>> pom.xml and they don't seem to be tricky to write in JSON. > >>> I also looked at the Jenkins job configurations (specifically the > >>> performance tests) and they pass around maps which they convert to the > >>> required format without needing a user to write it themselves. > >>> Using gradle, we will be able to trivially to do the same thing > (convert > >>> maps to json without needing the person to write it by hand) like > groovy > >>> does. > >>> Since we are migrating away from Maven it doesn't seem worthwhile to > >>> spend > >>> time on to make it easier to write the args in the Maven poms. > >>> > >>> Is there another use case that is being missed? > >>> > >>> On Mon, Jan 8, 2018 at 1:38 PM, Romain Manni-Bucau > >>> <[email protected]> > >>> wrote: > >>>> > >>>> Good point for \t and ,. > >>>> > >>>> Any objection to use jackson as a fallback for that purpose - for > >>>> backwqrd compat only - and make it optional then? Will create the > ticket > >>>> if > >>>> not. > >>>> > >>>> Le 8 janv. 2018 20:32, "Robert Bradshaw" <[email protected]> a > écrit : > >>>>> > >>>>> Part of the motivation to use JSON for more complex options was that > >>>>> it avoids having to define (and document, test, have users learn, > ...) > >>>>> yet another format for expressing lists, maps, etc. > >>>>> > >>>>> On Mon, Jan 8, 2018 at 11:19 AM, Lukasz Cwik <[email protected]> > wrote: > >>>>> > Ken, this is specifically about running integration tests and not a > >>>>> > users > >>>>> > main(). > >>>>> > > >>>>> > Note, that PipelineOptions JSON format was used because it was a > >>>>> > convenient > >>>>> > serialized form that is easy to explain to people what is required. > >>>>> > Using a different string tokenizer and calling > >>>>> > PipelineOptionsFactory.fromArgs() with the parsed strings seems > like > >>>>> > it > >>>>> > would be better. > >>>>> > > >>>>> > These are the supported formats for fromArgs(): > >>>>> > * --project=MyProject (simple property, will set the "project" > >>>>> > property > >>>>> > to "MyProject") > >>>>> > * --readOnly=true (for boolean properties, will set the > >>>>> > "readOnly" > >>>>> > property to "true") > >>>>> > * --readOnly (shorthand for boolean properties, will set the > >>>>> > "readOnly" > >>>>> > property to "true") > >>>>> > * --x=1 --x=2 --x=3 (list style simple property, will set the > >>>>> > "x" > >>>>> > property to [1, 2, 3]) > >>>>> > * --x=1,2,3 (shorthand list style simple property, will set > the > >>>>> > "x" > >>>>> > property to [1, 2, 3]) > >>>>> > * --complexObject='{"key1":"value1",...} (JSON format for all > >>>>> > other > >>>>> > complex types) > >>>>> > > >>>>> > Using a string tokenizer that minimizes the number of required > escape > >>>>> > characters would be good so we could use newline characters as our > >>>>> > only > >>>>> > token. I would avoid ',\t ' as tokens since they are more likely to > >>>>> > appear. > >>>>> > > >>>>> > On Mon, Jan 8, 2018 at 10:33 AM, Kenneth Knowles <[email protected]> > >>>>> > wrote: > >>>>> >> > >>>>> >> We do have a plain command line syntax, and whoever writes the > >>>>> >> main(String[]) function is responsible for invoking the parser. It > >>>>> >> isn't > >>>>> >> quite as nice as standard arg parse libraries, but it isn't too > bad. > >>>>> >> It > >>>>> >> would be great to improve, though. > >>>>> >> > >>>>> >> Jackson is for machine-to-machine communication or other > situations > >>>>> >> where > >>>>> >> command line parsing doesn't work so well. > >>>>> >> > >>>>> >> Are we using these some other way? > >>>>> >> > >>>>> >> Kenn > >>>>> >> > >>>>> >> On Sun, Jan 7, 2018 at 7:21 AM, Romain Manni-Bucau > >>>>> >> <[email protected]> > >>>>> >> wrote: > >>>>> >>> > >>>>> >>> > >>>>> >>> > >>>>> >>> Le 7 janv. 2018 12:53, "Jean-Baptiste Onofré" <[email protected]> > a > >>>>> >>> écrit : > >>>>> >>> > >>>>> >>> Hi Romain, > >>>>> >>> > >>>>> >>> I guess you are assuming that pipeline options are flat command > >>>>> >>> line > >>>>> >>> like > >>>>> >>> argument, right ? > >>>>> >>> > >>>>> >>> Actually, theoretically, pipeline options can be represented as > >>>>> >>> json, > >>>>> >>> that's why we use jackson. > >>>>> >>> The pipeline options can be serialized/deserialized as json. > >>>>> >>> > >>>>> >>> > >>>>> >>> Yes but if users (or dev ;)) start to use that then it is trivial > >>>>> >>> to > >>>>> >>> break the cli handling and fromArgs parsing or, if not, break the > >>>>> >>> user > >>>>> >>> experience. So at the end it is a kind of "yes but no", right? > >>>>> >>> > >>>>> >>> PS: already see some advanced users having a headache trying to > >>>>> >>> pass > >>>>> >>> pipeline options in json so using the plain command line syntax > can > >>>>> >>> be more > >>>>> >>> friendly too. > >>>>> >>> > >>>>> >>> > >>>>> >>> The purpose is to remove the jackson dependencies ? > >>>>> >>> > >>>>> >>> > >>>>> >>> Later yes, seeing core dep tree I identify a lot of dep which can > >>>>> >>> conflict in some env and are not really needed or bring much > being > >>>>> >>> in the > >>>>> >>> core - like avro as mentionned in another thread. Can need a > >>>>> >>> sanitizing > >>>>> >>> round. Short term it was really a "why is it that complicated" > ;). > >>>>> >>> > >>>>> >>> > >>>>> >>> > >>>>> >>> Regards > >>>>> >>> JB > >>>>> >>> > >>>>> >>> > >>>>> >>> On 01/07/2018 11:38 AM, Romain Manni-Bucau wrote: > >>>>> >>>> > >>>>> >>>> Hi guys, > >>>>> >>>> > >>>>> >>>> not sure i fully get why jackson is used to parse pipeline > options > >>>>> >>>> in > >>>>> >>>> the testing integration > >>>>> >>>> > >>>>> >>>> why not using a token parser like [1] which allows to map 1-1 > with > >>>>> >>>> the > >>>>> >>>> user interface (command line) the options? > >>>>> >>>> > >>>>> >>>> [1] > >>>>> >>>> > >>>>> >>>> > >>>>> >>>> https://github.com/Talend/component-runtime/blob/master/ > component-server/src/main/java/org/talend/sdk/component/server/lang/ > StringPropertiesTokenizer.java > >>>>> >>>> > >>>>> >>>> Romain Manni-Bucau > >>>>> >>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog > >>>>> >>>> <https://rmannibucau.metawerx.net/> | Old Blog > >>>>> >>>> <http://rmannibucau.wordpress.com> | Github > >>>>> >>>> <https://github.com/rmannibucau> > >>>>> >>>> | LinkedIn <https://www.linkedin.com/in/rmannibucau> > >>>>> >>> > >>>>> >>> > >>>>> >>> -- > >>>>> >>> Jean-Baptiste Onofré > >>>>> >>> [email protected] > >>>>> >>> http://blog.nanthrax.net > >>>>> >>> Talend - http://www.talend.com > >>>>> >>> > >>>>> >>> > >>>>> >> > >>>>> > > >>> > >>> > >>> > >> > > > > >
