After giving it quite some time to try and think about it this morning, and looking at consequences - I am in strong favour of passing string as conf (Kamil's proposal).
I don't think the dictionary is good. And trying to accommodate both is I think combining the worst of both worlds. Let me explain why. *Root problem* I think the root problem is not in the interface/API but the ambiguity of the DagRun.conf field. The way it is defined now, It can actually be an arbitrary object, not even a dictionary. I could pass it an array, or bool, or pretty much any serializable object it seems. Except for a few tests where both "string" and "dict" are accepted in the old experimental API I do not see anywhere (please correct me if I am wrong) any kind of specification for the conf object. That indeed makes it rather hard to reason about it in statically typed languages. And while I understand why we need it and why in Python environment we are fairly relaxed about this (JINJA for example) - I do not think this should influence the complexity of the API "structure". *API Structure: * I think JSON structure in the API should be fixed and well defined. I think it is much better to be very explicit about the "conf" parameter that it is "string JSON representation" than the arbitrary object. JSON is, of course, dynamic but in case of the API, it is just a tool that is used to pass the data, and there should be some well-defined and fixed logic about the structure.Precisely to make it easier to parse and prepare. I have a feeling that putting that dynamic nature of JSON into API structure definition is quite an abuse of that dynamic nature. *Java case (and other static-typed languages)* I haven't written Java for years, but I decided to give it a try. I tried to see how complex it would be to write serialization/deserialization for that using one of the most common parsers in Java world - Gson. The string case is super simple: public JsonElement serialize(final DagRun dagRun, final Type type, final JsonSerializationContext context) { JsonObject result = new JsonObject(); result.add("dag_run_id", new JsonPrimitive(dagRun.getDagRunId())); result.add("execution_date", new JsonPrimitive(dagRun.getExecutionDate())); result.add("conf",new JsonPrimitive(dagRun.getConf())); return result; } The dynamic object of arbitrary complexity I do not even know how to approach. Eventually what I would have to do is to convert it to JSON String anyway - because that's the only way you can keep arbitrary complex structure in Java. *Deferring problem?* Also - I do not think we are deferring the problem for later. The thing is that the only entity that cares about the "content" of the conf being accessible as an object is the Python DAG reading the conf object (likely with some JINJA templates). We will likely never, ever have to parse, de-jsonize the string on the client-side. We will just have to prepare it (to send) and possibly display it (if we ever read that conf via API). The imaginary client communicating with Airflow will simply pass whatever the User will tell it to do. And IMHO this is far easier if we do not have to convert it to object on the flight . I can imagine several use cases for that method: 1) User types - by hand - the whole "conf" object to pass to the trigger method. Likely typing JSON-string directly. That's a typical case for any kind of CLI I can imagine - where usually you pass JSON string for arbitrary complex objects. 2) The client-side implementation will define some limited set of parameters that could be used by the user and let the user enter it via a form. Based on that a "conf" object will be created following predefined structure (specific for this case). For example, the user chooses a date to run, and the form produces {"date: "${DATE_CHOSEN_BY_THE_USER}" } object. 3) Theoretically, it's possible that user enters arbitrary complex OBJECT via any kind of structured "generic" interface. That would be a nightmare, however (both from the user and developer point of view). So I disregard that option. In case of 1) we would have to parse the data ??? and turn it into object to serialize. Python can do that, but Java can't easily. And well - there is no point in doing it - we cannot do anything with conf as object, we have no idea if it is valid or not, we have to anyhow pass it to DAG so that DAG can access whatever fields are needed. I think we would be better to pass it as the very string user entered (After we check if this is a valid json - which we can do easily). In case of 2) it's also super easy to turn such pre-defined structure into JSON string. It's trivial in any language. There is virtually no benefit of passing it as object. - the "slightly" better readabilty maybe the only one Of course - as Ash and Kaxil mentioned, we could pass both - string or dict, but I think that is very wrong. Should we also allow array (this is a valid json structure)? Should we allow passing standalone bool, int .. objects (they are valid json). Also how about sending "STRING" as conf value (is it string or is it json-encoded object). This is a bad, bad idea. So summarizing - I am strongly for passing "string" rather than object. J. On Fri, May 29, 2020 at 1:58 AM Kaxil Naik <kaxiln...@gmail.com> wrote: > If the only problem is with Java and not any other popular languages, I > would say we go for Option (2). > > If not, supporting both is a good idea. > > Regards, > Kaxil > > On Fri, May 29, 2020 at 12:19 AM QP Hou <q...@scribd.com> wrote: > > > While I understand the difficulty of dealing with nested json without > > predefined schemas, I feel like returning it as a string only delays > > the problem to a later stage since the user will still need to parse > > that string into a strongly typed data structure in order to read the > > values. > > > > I don't have much experience in Java so I can't really comment on > > that. But I can confirm that it's pretty straightforward to deal with > > this in C/C++, Rust and Go. > > > > On Thu, May 28, 2020 at 2:57 PM Ash Berlin-Taylor <a...@apache.org> > wrote: > > > > > > Hi everyone, > > > > > > We're really close to getting the OpenAPI spec merged, just one last > > > question that's come up around how we should handle/represent > > > dagrun.conf to triggerDagRun. > > > > > > Which of the these two do people prefer? > > > > > > > > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id} > > > Content-Type: application/json > > > > > > { > > > "dag_run_id": "manual_2020-05-28T21:42:36Z", > > > "execution_date": "2020-05-28T21:42:36Z", > > > "conf": "{\"key\": \"value\" }" > > > } > > > > > > OR > > > > > > > > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id} > > > Content-Type: application/json > > > > > > { > > > "dag_run_id": "manual_2020-05-28T21:42:36Z", > > > "execution_date": "2020-05-28T21:42:36Z", > > > "conf": {"key": "value"} > > > } > > > > > > i.e. should the schema/type of conf be a (JSON-encoded) string, or an > > object. > > > > > > I favour the later, Kamil the former. His point is that staticly typed > > > languages, and Java in particular, would be hard to represent this. > > > (Please correct me if I've over-simplified or misunderstood your > > > argument Kamil) > > > > > > Mine was that it's easy enough in Go, for example trigger(dagRunId str, > > > executionDate *time.Time, conf interface{})`, and double json encoding > > > is always messy/a pain to drive manually on cURL etc. > > > > > > > > > (Using dagRun.conf is quite rare right now, doubly so via the API, so I > > > don't think we have any precendent to follow.) > > > > > > Or does anyone feel strongly that we should support both, and have this > > > in the python side > > > > > > if conf: > > > if isinstance(conf, dict): > > > run_conf = conf > > > else: > > > run_conf = json.loads(conf) > > > > > > > > > -ash > > > -- Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> [image: Polidea] <https://www.polidea.com/>