After giving it quite some time to try and think about it this morning, and
looking at consequences - I am in strong favour of passing string as conf
(Kamil's proposal).

I don't think the dictionary is good. And trying to accommodate both is I
think combining the worst of both worlds. Let me explain why.

*Root problem*

I think the root problem is not in the interface/API but the ambiguity of
the DagRun.conf field.  The way it is defined now, It can actually be an
arbitrary object, not even a dictionary. I could pass it an array,  or
bool, or pretty much any serializable object it seems. Except for a few
tests where both "string" and "dict" are accepted in the old experimental
API I do not see anywhere (please correct me if I am wrong) any kind of
specification for the conf object. That indeed makes it rather hard to
reason about it in statically typed languages. And while I understand why
we need it and why in Python environment we are fairly relaxed about this
(JINJA for example) - I do not think this should influence the complexity
of the API "structure".

*API Structure: *

I think JSON structure in the API should be fixed and well defined. I think
it is much better to be very explicit about the "conf" parameter that it is
"string JSON representation" than the arbitrary object. JSON is, of course,
dynamic but in case of the API, it is just a tool that is used to pass the
data, and there should be some well-defined and fixed logic about the
structure.Precisely to make it easier to parse and prepare.  I have a
feeling that putting that dynamic nature of JSON into API structure
definition is quite an abuse of that dynamic nature.

*Java case (and other static-typed languages)*

I haven't written Java for years, but I decided to give it a try. I tried
to see how complex it would be to write serialization/deserialization for
that using one of the most common parsers in Java world - Gson.

The string case is super simple:

        public JsonElement serialize(final DagRun dagRun, final Type type,
final JsonSerializationContext context) {
            JsonObject result = new JsonObject();
            result.add("dag_run_id", new
JsonPrimitive(dagRun.getDagRunId()));
            result.add("execution_date", new
JsonPrimitive(dagRun.getExecutionDate()));
            result.add("conf",new JsonPrimitive(dagRun.getConf()));
            return result;
        }

The dynamic object of arbitrary complexity I do not even know how to
approach. Eventually what I would have to do is to convert it to JSON
String anyway - because that's the only way you can keep arbitrary complex
structure in Java.

*Deferring problem?*

Also - I do not think we are deferring the problem for later. The thing is
that the only entity that cares about the "content" of the conf being
accessible as an object is the Python DAG reading the conf object (likely
with some JINJA templates).

We will likely never, ever have to parse,  de-jsonize the string on the
client-side. We will just have to prepare it (to send) and possibly display
it (if we ever read that conf via API).
The imaginary client communicating with Airflow will simply pass whatever
the User will tell it to do. And IMHO this is far easier if we do not have
to convert it to object on the flight .

I can imagine several use cases for that method:

1) User types - by hand - the whole "conf" object to pass to the trigger
method. Likely typing JSON-string directly. That's a typical case for any
kind of CLI I can imagine - where usually you pass JSON string for
arbitrary complex objects.

2) The client-side implementation will define some limited set of
parameters that could be used by the user and let the user enter it via a
form. Based on that a "conf" object will be created following predefined
structure (specific for this case). For example, the user chooses a date to
run, and the form produces  {"date: "${DATE_CHOSEN_BY_THE_USER}" } object.

3) Theoretically, it's possible that user enters arbitrary complex OBJECT
via any kind of structured "generic" interface. That would be a nightmare,
however (both from the user and developer point of view). So I disregard
that option.

In case of 1) we would have to parse the data ??? and turn it into
object to serialize. Python can do that, but Java can't easily. And well -
there is no point in doing it - we cannot do anything with conf as object,
we have no idea if it is valid or not, we have to anyhow pass it to DAG so
that DAG can access whatever fields are needed. I think we would be better
to pass it as the very string user entered (After we check if this is a
valid json - which we can do easily).

In case of 2) it's also super easy to turn such pre-defined structure into
JSON string. It's trivial in any language. There is virtually no benefit of
passing it as object. - the "slightly" better readabilty maybe the only one

Of course - as Ash and Kaxil mentioned, we could pass both - string or
dict, but I think that is very wrong. Should we also allow array (this is a
valid json structure)?  Should we allow passing standalone bool, int ..
objects (they are valid json).  Also how about sending "STRING" as conf
value (is it string or is it json-encoded object). This is a bad, bad idea.

So summarizing - I am strongly for passing "string" rather than object.

J.


On Fri, May 29, 2020 at 1:58 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> If the only problem is with Java and not any other popular languages, I
> would say we go for Option (2).
>
> If not, supporting both is a good idea.
>
> Regards,
> Kaxil
>
> On Fri, May 29, 2020 at 12:19 AM QP Hou <q...@scribd.com> wrote:
>
> > While I understand the difficulty of dealing with nested json without
> > predefined schemas, I feel like returning it as a string only delays
> > the problem to a later stage since the user will still need to parse
> > that string into a strongly typed data structure in order to read the
> > values.
> >
> > I don't have much experience in Java so I can't really comment on
> > that. But I can confirm that it's pretty straightforward to deal with
> > this in C/C++, Rust and Go.
> >
> > On Thu, May 28, 2020 at 2:57 PM Ash Berlin-Taylor <a...@apache.org>
> wrote:
> > >
> > > Hi everyone,
> > >
> > > We're really close to getting the OpenAPI spec merged, just one last
> > > question that's come up around how we should handle/represent
> > > dagrun.conf to triggerDagRun.
> > >
> > > Which of the these two do people prefer?
> > >
> > >
> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}
> > > Content-Type: application/json
> > >
> > > {
> > >   "dag_run_id": "manual_2020-05-28T21:42:36Z",
> > >   "execution_date": "2020-05-28T21:42:36Z",
> > >   "conf": "{\"key\": \"value\" }"
> > > }
> > >
> > > OR
> > >
> > >
> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}
> > > Content-Type: application/json
> > >
> > > {
> > >   "dag_run_id": "manual_2020-05-28T21:42:36Z",
> > >   "execution_date": "2020-05-28T21:42:36Z",
> > >   "conf": {"key": "value"}
> > > }
> > >
> > > i.e. should the schema/type of conf be a (JSON-encoded) string, or an
> > object.
> > >
> > > I favour the later, Kamil the former. His point is that staticly typed
> > > languages, and Java in particular, would be hard to represent this.
> > > (Please correct me if I've over-simplified or misunderstood your
> > > argument Kamil)
> > >
> > > Mine was that it's easy enough in Go, for example trigger(dagRunId str,
> > > executionDate *time.Time, conf interface{})`, and double json encoding
> > > is always messy/a pain to drive manually on cURL etc.
> > >
> > >
> > > (Using dagRun.conf is quite rare right now, doubly so via the API, so I
> > > don't think we have any precendent to follow.)
> > >
> > > Or does anyone feel strongly that we should support both, and have this
> > > in the python side
> > >
> > >     if conf:
> > >         if isinstance(conf, dict):
> > >             run_conf = conf
> > >         else:
> > >             run_conf = json.loads(conf)
> > >
> > >
> > > -ash
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Reply via email to