Some inputs - maybe comments? - would be appreciated on: 1. Why using json in https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipeline.java#L480 and not iterate over the options? 2. Also this is likely used to do a fromArgs and create another pipeline options later no? Is this api that useful here or does it need an options composition - fully object?
Le 11 janv. 2018 19:09, "Lukasz Cwik" <[email protected]> a écrit : > Give links to the code segments that you want background on. > > On Wed, Jan 10, 2018 at 12:44 AM, Romain Manni-Bucau < > [email protected]> wrote: > >> updated my junit 5 PR to show that: https://github.com/apach >> e/beam/pull/4360/files#diff-578d1770f8b47ebbc1e74a2c19de9a6aR28 >> >> It doesn't remove jackson yet but exposes a nicer user interface for the >> config. >> >> I'm not fully clear on all the jackson usage yet, there are some round >> trips (PO -> json -> PO) which are weird without more knowledge. >> >> >> Romain Manni-Bucau >> @rmannibucau <https://twitter.com/rmannibucau> | Blog >> <https://rmannibucau.metawerx.net/> | Old Blog >> <http://rmannibucau.wordpress.com> | Github >> <https://github.com/rmannibucau> | LinkedIn >> <https://www.linkedin.com/in/rmannibucau> >> >> 2018-01-09 22:57 GMT+01:00 Lukasz Cwik <[email protected]>: >> >>> Removing large dependencies is great but I can only assume it will be a >>> lot of work so if you can get it to work in a backwards compatible way >>> great. >>> >>> On Tue, Jan 9, 2018 at 1:52 PM, Romain Manni-Bucau < >>> [email protected]> wrote: >>> >>>> It conflicts easily between libs and containers. Shade is not a good >>>> option too - see the thread on this topic :(. >>>> >>>> At the end i see using the cli solution closer to user - vs framework >>>> dev for json - and hurting less in terms of classpath so pby sthg to test >>>> no? >>>> >>>> Le 9 janv. 2018 22:47, "Lukasz Cwik" <[email protected]> a écrit : >>>> >>>>> Romain, how has Jackson been a classpath breaker? >>>>> >>>>> >>>>> On Tue, Jan 9, 2018 at 1:20 PM, Romain Manni-Bucau < >>>>> [email protected]> wrote: >>>>> >>>>>> Hmm, beam already owns the cli parsing - that is what I meant - it >>>>>> only misses the arg delimiter (ie quoting) and adding it is easy no? >>>>>> >>>>>> Le 9 janv. 2018 21:19, "Robert Bradshaw" <[email protected]> a >>>>>> écrit : >>>>>> >>>>>>> On Tue, Jan 9, 2018 at 11:48 AM, Romain Manni-Bucau >>>>>>> <[email protected]> wrote: >>>>>>> > >>>>>>> > Le 9 janv. 2018 19:50, "Robert Bradshaw" <[email protected]> a >>>>>>> écrit : >>>>>>> > >>>>>>> > From what I understand: >>>>>>> > >>>>>>> > 1) The command line argument values (both simple and more complex) >>>>>>> are >>>>>>> > all JSON-representable. >>>>>>> > >>>>>>> > And must be all CLI representable >>>>>>> >>>>>>> Sorry, I should have said "all pipeline options." In any case, one >>>>>>> can >>>>>>> always do JSON -> string and the resulting string is usable as a >>>>>>> command line argument. >>>>>>> >>>>>>> > 2) The command line is a mapping of keys to these values. >>>>>>> > >>>>>>> > Which makes your 1 not true since json supports more ;) >>>>>>> > >>>>>>> > As such, it seems quite natural to represent the whole set as a >>>>>>> single >>>>>>> > JSON map, rather than using a different, custom encoding for the >>>>>>> top >>>>>>> > level (whose custom escaping would have to be carried into the >>>>>>> inner >>>>>>> > JSON values). Note that JSON has the advantage that one never >>>>>>> needs to >>>>>>> > explain or define it, and parsers/serializers already exists for >>>>>>> all >>>>>>> > languages (e.g. if one has a driver script in another language for >>>>>>> > launching a java pipeline, it's easy to communicate all the args). >>>>>>> > >>>>>>> > Same reasonning applies to CLI AFAIK >>>>>>> >>>>>>> The spec of what a valid command line argument list is is >>>>>>> surprisingly >>>>>>> inconsistent across platforms, languages, and programs. And your >>>>>>> proposal seems to be getting into what the delimiter is, and how to >>>>>>> escape it, and possibly then how to escape the escape character. All >>>>>>> of this is sidestepped by pointing at an existing spec. >>>>>>> >>>>>>> > We can't get rid of Jackson in the core because of (1) so there's >>>>>>> > little value in adding complexity to remove it from (2). The fact >>>>>>> that >>>>>>> > Java doesn't ship anything in its expansive standard library for >>>>>>> this >>>>>>> > is unfortuante, so we have to take a dependency on something. >>>>>>> > >>>>>>> > We actually can as shown before >>>>>>> >>>>>>> How, if JSON is integral to the parsing of the argument values >>>>>>> themselves? (Or is the argument that it's not, but each extender of >>>>>>> PipelineOptions is responsible for choosing a JSON parser and doing >>>>>>> it >>>>>>> themselves.) >>>>>>> >>>>>>> > and jackson is not a simple negligible lib >>>>>>> > but a classpath breaker :( >>>>>>> >>>>>>> Perhaps there are better libraries we could use instead? Is the >>>>>>> ecosystem so bad that it encourages projects to roll their own of >>>>>>> everything rather than share code? >>>>>>> >>>>>>> > On Tue, Jan 9, 2018 at 10:00 AM, Lukasz Cwik <[email protected]> >>>>>>> wrote: >>>>>>> >> Ah, you don't want JSON within JSON. I see, if thats the case >>>>>>> just migrate >>>>>>> >> all of them to string tokenization and drop the Jackson usage for >>>>>>> string >>>>>>> >> -> >>>>>>> >> string[] conversion. >>>>>>> >> >>>>>>> >> On Mon, Jan 8, 2018 at 10:06 PM, Romain Manni-Bucau >>>>>>> >> <[email protected]> >>>>>>> >> wrote: >>>>>>> >>> >>>>>>> >>> Lukasz, the use case is to znsure the config used can still map >>>>>>> the CLI >>>>>>> >>> and that options dont start to abuse json so it is exactly the >>>>>>> opposite >>>>>>> >>> of >>>>>>> >>> "we can be fancy" and is closer to "we can be robust". Also the >>>>>>> default >>>>>>> >>> should be easy and not json (i just want to set the runner, >>>>>>> --runner=xxx >>>>>>> >>> is >>>>>>> >>> easier than a json version). If the value doesnt start with a [ >>>>>>> we can >>>>>>> >>> use >>>>>>> >>> the tokenizer else jackson, wdyt? >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> Le 8 janv. 2018 22:59, "Lukasz Cwik" <[email protected]> a écrit >>>>>>> : >>>>>>> >>> >>>>>>> >>> Now that I think about this more. I looked at some of the >>>>>>> examples in the >>>>>>> >>> pom.xml and they don't seem to be tricky to write in JSON. >>>>>>> >>> I also looked at the Jenkins job configurations (specifically the >>>>>>> >>> performance tests) and they pass around maps which they convert >>>>>>> to the >>>>>>> >>> required format without needing a user to write it themselves. >>>>>>> >>> Using gradle, we will be able to trivially to do the same thing >>>>>>> (convert >>>>>>> >>> maps to json without needing the person to write it by hand) >>>>>>> like groovy >>>>>>> >>> does. >>>>>>> >>> Since we are migrating away from Maven it doesn't seem >>>>>>> worthwhile to >>>>>>> >>> spend >>>>>>> >>> time on to make it easier to write the args in the Maven poms. >>>>>>> >>> >>>>>>> >>> Is there another use case that is being missed? >>>>>>> >>> >>>>>>> >>> On Mon, Jan 8, 2018 at 1:38 PM, Romain Manni-Bucau >>>>>>> >>> <[email protected]> >>>>>>> >>> wrote: >>>>>>> >>>> >>>>>>> >>>> Good point for \t and ,. >>>>>>> >>>> >>>>>>> >>>> Any objection to use jackson as a fallback for that purpose - >>>>>>> for >>>>>>> >>>> backwqrd compat only - and make it optional then? Will create >>>>>>> the ticket >>>>>>> >>>> if >>>>>>> >>>> not. >>>>>>> >>>> >>>>>>> >>>> Le 8 janv. 2018 20:32, "Robert Bradshaw" <[email protected]> >>>>>>> a écrit : >>>>>>> >>>>> >>>>>>> >>>>> Part of the motivation to use JSON for more complex options >>>>>>> was that >>>>>>> >>>>> it avoids having to define (and document, test, have users >>>>>>> learn, ...) >>>>>>> >>>>> yet another format for expressing lists, maps, etc. >>>>>>> >>>>> >>>>>>> >>>>> On Mon, Jan 8, 2018 at 11:19 AM, Lukasz Cwik <[email protected]> >>>>>>> wrote: >>>>>>> >>>>> > Ken, this is specifically about running integration tests >>>>>>> and not a >>>>>>> >>>>> > users >>>>>>> >>>>> > main(). >>>>>>> >>>>> > >>>>>>> >>>>> > Note, that PipelineOptions JSON format was used because it >>>>>>> was a >>>>>>> >>>>> > convenient >>>>>>> >>>>> > serialized form that is easy to explain to people what is >>>>>>> required. >>>>>>> >>>>> > Using a different string tokenizer and calling >>>>>>> >>>>> > PipelineOptionsFactory.fromArgs() with the parsed strings >>>>>>> seems like >>>>>>> >>>>> > it >>>>>>> >>>>> > would be better. >>>>>>> >>>>> > >>>>>>> >>>>> > These are the supported formats for fromArgs(): >>>>>>> >>>>> > * --project=MyProject (simple property, will set the >>>>>>> "project" >>>>>>> >>>>> > property >>>>>>> >>>>> > to "MyProject") >>>>>>> >>>>> > * --readOnly=true (for boolean properties, will set the >>>>>>> >>>>> > "readOnly" >>>>>>> >>>>> > property to "true") >>>>>>> >>>>> > * --readOnly (shorthand for boolean properties, will >>>>>>> set the >>>>>>> >>>>> > "readOnly" >>>>>>> >>>>> > property to "true") >>>>>>> >>>>> > * --x=1 --x=2 --x=3 (list style simple property, will >>>>>>> set the >>>>>>> >>>>> > "x" >>>>>>> >>>>> > property to [1, 2, 3]) >>>>>>> >>>>> > * --x=1,2,3 (shorthand list style simple property, will >>>>>>> set the >>>>>>> >>>>> > "x" >>>>>>> >>>>> > property to [1, 2, 3]) >>>>>>> >>>>> > * --complexObject='{"key1":"value1",...} (JSON format >>>>>>> for all >>>>>>> >>>>> > other >>>>>>> >>>>> > complex types) >>>>>>> >>>>> > >>>>>>> >>>>> > Using a string tokenizer that minimizes the number of >>>>>>> required escape >>>>>>> >>>>> > characters would be good so we could use newline characters >>>>>>> as our >>>>>>> >>>>> > only >>>>>>> >>>>> > token. I would avoid ',\t ' as tokens since they are more >>>>>>> likely to >>>>>>> >>>>> > appear. >>>>>>> >>>>> > >>>>>>> >>>>> > On Mon, Jan 8, 2018 at 10:33 AM, Kenneth Knowles < >>>>>>> [email protected]> >>>>>>> >>>>> > wrote: >>>>>>> >>>>> >> >>>>>>> >>>>> >> We do have a plain command line syntax, and whoever writes >>>>>>> the >>>>>>> >>>>> >> main(String[]) function is responsible for invoking the >>>>>>> parser. It >>>>>>> >>>>> >> isn't >>>>>>> >>>>> >> quite as nice as standard arg parse libraries, but it isn't >>>>>>> too bad. >>>>>>> >>>>> >> It >>>>>>> >>>>> >> would be great to improve, though. >>>>>>> >>>>> >> >>>>>>> >>>>> >> Jackson is for machine-to-machine communication or other >>>>>>> situations >>>>>>> >>>>> >> where >>>>>>> >>>>> >> command line parsing doesn't work so well. >>>>>>> >>>>> >> >>>>>>> >>>>> >> Are we using these some other way? >>>>>>> >>>>> >> >>>>>>> >>>>> >> Kenn >>>>>>> >>>>> >> >>>>>>> >>>>> >> On Sun, Jan 7, 2018 at 7:21 AM, Romain Manni-Bucau >>>>>>> >>>>> >> <[email protected]> >>>>>>> >>>>> >> wrote: >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> Le 7 janv. 2018 12:53, "Jean-Baptiste Onofré" < >>>>>>> [email protected]> a >>>>>>> >>>>> >>> écrit : >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> Hi Romain, >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> I guess you are assuming that pipeline options are flat >>>>>>> command >>>>>>> >>>>> >>> line >>>>>>> >>>>> >>> like >>>>>>> >>>>> >>> argument, right ? >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> Actually, theoretically, pipeline options can be >>>>>>> represented as >>>>>>> >>>>> >>> json, >>>>>>> >>>>> >>> that's why we use jackson. >>>>>>> >>>>> >>> The pipeline options can be serialized/deserialized as >>>>>>> json. >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> Yes but if users (or dev ;)) start to use that then it is >>>>>>> trivial >>>>>>> >>>>> >>> to >>>>>>> >>>>> >>> break the cli handling and fromArgs parsing or, if not, >>>>>>> break the >>>>>>> >>>>> >>> user >>>>>>> >>>>> >>> experience. So at the end it is a kind of "yes but no", >>>>>>> right? >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> PS: already see some advanced users having a headache >>>>>>> trying to >>>>>>> >>>>> >>> pass >>>>>>> >>>>> >>> pipeline options in json so using the plain command line >>>>>>> syntax can >>>>>>> >>>>> >>> be more >>>>>>> >>>>> >>> friendly too. >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> The purpose is to remove the jackson dependencies ? >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> Later yes, seeing core dep tree I identify a lot of dep >>>>>>> which can >>>>>>> >>>>> >>> conflict in some env and are not really needed or bring >>>>>>> much being >>>>>>> >>>>> >>> in the >>>>>>> >>>>> >>> core - like avro as mentionned in another thread. Can need >>>>>>> a >>>>>>> >>>>> >>> sanitizing >>>>>>> >>>>> >>> round. Short term it was really a "why is it that >>>>>>> complicated" ;). >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> Regards >>>>>>> >>>>> >>> JB >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> On 01/07/2018 11:38 AM, Romain Manni-Bucau wrote: >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Hi guys, >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> not sure i fully get why jackson is used to parse >>>>>>> pipeline options >>>>>>> >>>>> >>>> in >>>>>>> >>>>> >>>> the testing integration >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> why not using a token parser like [1] which allows to map >>>>>>> 1-1 with >>>>>>> >>>>> >>>> the >>>>>>> >>>>> >>>> user interface (command line) the options? >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> [1] >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> https://github.com/Talend/comp >>>>>>> onent-runtime/blob/master/component-server/src/main/java/org >>>>>>> /talend/sdk/component/server/lang/StringPropertiesTokenizer.java >>>>>>> >>>>> >>>> >>>>>>> >>>>> >>>> Romain Manni-Bucau >>>>>>> >>>>> >>>> @rmannibucau <https://twitter.com/rmannibucau> | Blog >>>>>>> >>>>> >>>> <https://rmannibucau.metawerx.net/> | Old Blog >>>>>>> >>>>> >>>> <http://rmannibucau.wordpress.com> | Github >>>>>>> >>>>> >>>> <https://github.com/rmannibucau> >>>>>>> >>>>> >>>> | LinkedIn <https://www.linkedin.com/in/rmannibucau> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> -- >>>>>>> >>>>> >>> Jean-Baptiste Onofré >>>>>>> >>>>> >>> [email protected] >>>>>>> >>>>> >>> http://blog.nanthrax.net >>>>>>> >>>>> >>> Talend - http://www.talend.com >>>>>>> >>>>> >>> >>>>>>> >>>>> >>> >>>>>>> >>>>> >> >>>>>>> >>>>> > >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> >> >>>>>>> > >>>>>>> > >>>>>>> >>>>>> >>>>> >>> >> >
