Option 2 - IMO it’s the path of least surprise. Also, having to encode JSON as string can be cumbersome.
Supporting both options is a really no-no to me. Bas > On 29 May 2020, at 08:40, Jarek Potiuk <jarek.pot...@polidea.com> wrote: > > After giving it quite some time to try and think about it this morning, and > looking at consequences - I am in strong favour of passing string as conf > (Kamil's proposal). > > I don't think the dictionary is good. And trying to accommodate both is I > think combining the worst of both worlds. Let me explain why. > > *Root problem* > > I think the root problem is not in the interface/API but the ambiguity of > the DagRun.conf field. The way it is defined now, It can actually be an > arbitrary object, not even a dictionary. I could pass it an array, or > bool, or pretty much any serializable object it seems. Except for a few > tests where both "string" and "dict" are accepted in the old experimental > API I do not see anywhere (please correct me if I am wrong) any kind of > specification for the conf object. That indeed makes it rather hard to > reason about it in statically typed languages. And while I understand why > we need it and why in Python environment we are fairly relaxed about this > (JINJA for example) - I do not think this should influence the complexity > of the API "structure". > > *API Structure: * > > I think JSON structure in the API should be fixed and well defined. I think > it is much better to be very explicit about the "conf" parameter that it is > "string JSON representation" than the arbitrary object. JSON is, of course, > dynamic but in case of the API, it is just a tool that is used to pass the > data, and there should be some well-defined and fixed logic about the > structure.Precisely to make it easier to parse and prepare. I have a > feeling that putting that dynamic nature of JSON into API structure > definition is quite an abuse of that dynamic nature. > > *Java case (and other static-typed languages)* > > I haven't written Java for years, but I decided to give it a try. I tried > to see how complex it would be to write serialization/deserialization for > that using one of the most common parsers in Java world - Gson. > > The string case is super simple: > > public JsonElement serialize(final DagRun dagRun, final Type type, > final JsonSerializationContext context) { > JsonObject result = new JsonObject(); > result.add("dag_run_id", new > JsonPrimitive(dagRun.getDagRunId())); > result.add("execution_date", new > JsonPrimitive(dagRun.getExecutionDate())); > result.add("conf",new JsonPrimitive(dagRun.getConf())); > return result; > } > > The dynamic object of arbitrary complexity I do not even know how to > approach. Eventually what I would have to do is to convert it to JSON > String anyway - because that's the only way you can keep arbitrary complex > structure in Java. > > *Deferring problem?* > > Also - I do not think we are deferring the problem for later. The thing is > that the only entity that cares about the "content" of the conf being > accessible as an object is the Python DAG reading the conf object (likely > with some JINJA templates). > > We will likely never, ever have to parse, de-jsonize the string on the > client-side. We will just have to prepare it (to send) and possibly display > it (if we ever read that conf via API). > The imaginary client communicating with Airflow will simply pass whatever > the User will tell it to do. And IMHO this is far easier if we do not have > to convert it to object on the flight . > > I can imagine several use cases for that method: > > 1) User types - by hand - the whole "conf" object to pass to the trigger > method. Likely typing JSON-string directly. That's a typical case for any > kind of CLI I can imagine - where usually you pass JSON string for > arbitrary complex objects. > > 2) The client-side implementation will define some limited set of > parameters that could be used by the user and let the user enter it via a > form. Based on that a "conf" object will be created following predefined > structure (specific for this case). For example, the user chooses a date to > run, and the form produces {"date: "${DATE_CHOSEN_BY_THE_USER}" } object. > > 3) Theoretically, it's possible that user enters arbitrary complex OBJECT > via any kind of structured "generic" interface. That would be a nightmare, > however (both from the user and developer point of view). So I disregard > that option. > > In case of 1) we would have to parse the data ??? and turn it into > object to serialize. Python can do that, but Java can't easily. And well - > there is no point in doing it - we cannot do anything with conf as object, > we have no idea if it is valid or not, we have to anyhow pass it to DAG so > that DAG can access whatever fields are needed. I think we would be better > to pass it as the very string user entered (After we check if this is a > valid json - which we can do easily). > > In case of 2) it's also super easy to turn such pre-defined structure into > JSON string. It's trivial in any language. There is virtually no benefit of > passing it as object. - the "slightly" better readabilty maybe the only one > > Of course - as Ash and Kaxil mentioned, we could pass both - string or > dict, but I think that is very wrong. Should we also allow array (this is a > valid json structure)? Should we allow passing standalone bool, int .. > objects (they are valid json). Also how about sending "STRING" as conf > value (is it string or is it json-encoded object). This is a bad, bad idea. > > So summarizing - I am strongly for passing "string" rather than object. > > J. > > > On Fri, May 29, 2020 at 1:58 AM Kaxil Naik <kaxiln...@gmail.com> wrote: > >> If the only problem is with Java and not any other popular languages, I >> would say we go for Option (2). >> >> If not, supporting both is a good idea. >> >> Regards, >> Kaxil >> >> On Fri, May 29, 2020 at 12:19 AM QP Hou <q...@scribd.com> wrote: >> >>> While I understand the difficulty of dealing with nested json without >>> predefined schemas, I feel like returning it as a string only delays >>> the problem to a later stage since the user will still need to parse >>> that string into a strongly typed data structure in order to read the >>> values. >>> >>> I don't have much experience in Java so I can't really comment on >>> that. But I can confirm that it's pretty straightforward to deal with >>> this in C/C++, Rust and Go. >>> >>> On Thu, May 28, 2020 at 2:57 PM Ash Berlin-Taylor <a...@apache.org> >> wrote: >>>> >>>> Hi everyone, >>>> >>>> We're really close to getting the OpenAPI spec merged, just one last >>>> question that's come up around how we should handle/represent >>>> dagrun.conf to triggerDagRun. >>>> >>>> Which of the these two do people prefer? >>>> >>>> >>>> POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id} >>>> Content-Type: application/json >>>> >>>> { >>>> "dag_run_id": "manual_2020-05-28T21:42:36Z", >>>> "execution_date": "2020-05-28T21:42:36Z", >>>> "conf": "{\"key\": \"value\" }" >>>> } >>>> >>>> OR >>>> >>>> >>>> POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id} >>>> Content-Type: application/json >>>> >>>> { >>>> "dag_run_id": "manual_2020-05-28T21:42:36Z", >>>> "execution_date": "2020-05-28T21:42:36Z", >>>> "conf": {"key": "value"} >>>> } >>>> >>>> i.e. should the schema/type of conf be a (JSON-encoded) string, or an >>> object. >>>> >>>> I favour the later, Kamil the former. His point is that staticly typed >>>> languages, and Java in particular, would be hard to represent this. >>>> (Please correct me if I've over-simplified or misunderstood your >>>> argument Kamil) >>>> >>>> Mine was that it's easy enough in Go, for example trigger(dagRunId str, >>>> executionDate *time.Time, conf interface{})`, and double json encoding >>>> is always messy/a pain to drive manually on cURL etc. >>>> >>>> >>>> (Using dagRun.conf is quite rare right now, doubly so via the API, so I >>>> don't think we have any precendent to follow.) >>>> >>>> Or does anyone feel strongly that we should support both, and have this >>>> in the python side >>>> >>>> if conf: >>>> if isinstance(conf, dict): >>>> run_conf = conf >>>> else: >>>> run_conf = json.loads(conf) >>>> >>>> >>>> -ash >>> >> > > > -- > > Jarek Potiuk > Polidea <https://www.polidea.com/> | Principal Software Engineer > > M: +48 660 796 129 <+48660796129> > [image: Polidea] <https://www.polidea.com/>