On May 29 2020, at 8:29 am, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> *Root problem*
>
> I think the root problem is not in the interface/API but the ambiguity of
> the DagRun.conf field. The way it is defined now, It can actually be an
> arbitrary object, not even a dictionary.

So that is a gap in the definition. It _needs_ to be a dictionary/JSON
*Object* specifically, else it will break when it is used.

> but in case of the API, it is just a tool that is used to pass the
> data, and there should be some well-defined and fixed logic about the
> structure

My view is that specifying it as a JSON object in the spec (but saying
nothing about the properties it might contain) is more precise than a
string. Compare these two snippets of OpenAPI spec:


        conf:
          type: string
          description: >
            JSON string (which must contain a top-level JSON object)
describing additional configuration parameters.

vs

        conf:
          type: object
          description: >
            JSON object describing additional configuration parameters.


The first one means technically, acording to the spec you can submit `{
"conf": "abc" }` -- but that will fail. Where as the second means that
both the OpenAPI client, and the automatic validation from Connexion on
the server will handle that for us.

(With the string case, we would have to JSON deserialize it and check
it's an JSON object/ python dict. Nothing else is allowed.)

> *Java case (and other static-typed languages)*

import com.google.gson.Gson;
import com.google.gson.JsonObject;

class Demo {

  public static class Employee
  {
     private Integer id;
     private String firstName;
     private String lastName;

     public Employee(Integer id, String firstName, String lastName){
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
     }
  }

  public static void main(String args[]) {

    Demo.Employee employee = new Demo.Employee(1, "Ash", "Berlin-Taylor");

    Gson gson = new Gson();

    JsonObject e = gson.toJsonTree(employee).getAsJsonObject();

    // Just for debugging/testing
    System.out.println(e.toString());

  }
}

which outputs

  {"id":1,"firstName":"Ash","lastName":"Berlin-Taylor"}


Entirely possible to create arbitrary JSON structures in Java. (You can
also just create a com.google.gson.JsonObject directly and call `.add()`)

To use your example DagRun class, the signature could become:

  public DagRun(String dagRunId, String executionDate, Object conf);

and then the client can do `gson.toJsonTree(conf).getAsJsonObject()` internally.


So I'm still +1 for dict directly. Even more so now I have written this
in Java.

JSON-encoded-string in a JSON api is a "code smell" to me.

-ash


On May 29 2020, at 8:29 am, Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> And one more comment to add - this is the DagRun class I wrote for the
> imaginary Java client. That's all (including the serializer from the
> previous mail) needed to be able to easily  build the POST method call in
> Java.
> I would really like to challenge someone more experienced with Java writes
> or provides some examples, either the class holding arbitrary complex conf
> object or (if we stick to String way of storing the String) - to
> serialize/deserialise the object from son.
> 
> I really believe it is far from trivial and by choosing "object" way of
> sending the conf, we significantly increase the complexity of clients
> accessing Airflow API (which should be our goal) at the expense of
> "slightly" less readable code.
> 
> J.
> 
> 
>    public static class DagRun {
>        private final String dagRunId;
>        private final String executionDate;
>        private final String conf;
> 
>        public String getDagRunId() {
>            return dagRunId;
>        }
> 
>        public String getExecutionDate() {
>            return executionDate;
>        }
> 
>        public String getConf() {
>            return conf;
>        }
> 
>        public DagRun(String dagRunId, String executionDate, String conf){
>            this.dagRunId = dagRunId;
>            this.executionDate =executionDate;
>            this.conf = conf;
>        }
>    }
> 
> J.
> 
> 
> On Fri, May 29, 2020 at 8:40 AM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
> 
>> After giving it quite some time to try and think about it this morning,
>> and looking at consequences - I am in strong favour of passing string as
>> conf (Kamil's proposal).
>> 
>> I don't think the dictionary is good. And trying to accommodate both
>> is I
>> think combining the worst of both worlds. Let me explain why.
>> 
>> *Root problem*
>> 
>> I think the root problem is not in the interface/API but the
>> ambiguity of
>> the DagRun.conf field.  The way it is defined now, It can actually be an
>> arbitrary object, not even a dictionary. I could pass it an array,  or
>> bool, or pretty much any serializable object it seems. Except for a few
>> tests where both "string" and "dict" are accepted in the old experimental
>> API I do not see anywhere (please correct me if I am wrong) any kind of
>> specification for the conf object. That indeed makes it rather hard to
>> reason about it in statically typed languages. And while I understand why
>> we need it and why in Python environment we are fairly relaxed about this
>> (JINJA for example) - I do not think this should influence the complexity
>> of the API "structure".
>> 
>> *API Structure: *
>> 
>> I think JSON structure in the API should be fixed and well defined. I
>> think it is much better to be very explicit about the "conf"
>> parameter that
>> it is "string JSON representation" than the arbitrary object. JSON
>> is, of
>> course, dynamic but in case of the API, it is just a tool that is
>> used to
>> pass the data, and there should be some well-defined and fixed logic about
>> the structure.Precisely to make it easier to parse and prepare.  I
>> have a
>> feeling that putting that dynamic nature of JSON into API structure
>> definition is quite an abuse of that dynamic nature.
>> 
>> *Java case (and other static-typed languages)*
>> 
>> I haven't written Java for years, but I decided to give it a try. I tried
>> to see how complex it would be to write serialization/deserialization for
>> that using one of the most common parsers in Java world - Gson.
>> 
>> The string case is super simple:
>> 
>>         public JsonElement serialize(final DagRun dagRun, final Type type,
>> final JsonSerializationContext context) {
>>             JsonObject result = new JsonObject();
>>             result.add("dag_run_id", new
>> JsonPrimitive(dagRun.getDagRunId()));
>>             result.add("execution_date", new
>> JsonPrimitive(dagRun.getExecutionDate()));
>>             result.add("conf",new JsonPrimitive(dagRun.getConf()));
>>             return result;
>>         }
>> 
>> The dynamic object of arbitrary complexity I do not even know how to
>> approach. Eventually what I would have to do is to convert it to JSON
>> String anyway - because that's the only way you can keep arbitrary complex
>> structure in Java.
>> 
>> *Deferring problem?*
>> 
>> Also - I do not think we are deferring the problem for later. The
>> thing is
>> that the only entity that cares about the "content" of the conf being
>> accessible as an object is the Python DAG reading the conf object (likely
>> with some JINJA templates).
>> 
>> We will likely never, ever have to parse,  de-jsonize the string on the
>> client-side. We will just have to prepare it (to send) and possibly display
>> it (if we ever read that conf via API).
>> The imaginary client communicating with Airflow will simply pass whatever
>> the User will tell it to do. And IMHO this is far easier if we do not have
>> to convert it to object on the flight .
>> 
>> I can imagine several use cases for that method:
>> 
>> 1) User types - by hand - the whole "conf" object to pass to the trigger
>> method. Likely typing JSON-string directly. That's a typical case for any
>> kind of CLI I can imagine - where usually you pass JSON string for
>> arbitrary complex objects.
>> 
>> 2) The client-side implementation will define some limited set of
>> parameters that could be used by the user and let the user enter it
>> via a
>> form. Based on that a "conf" object will be created following predefined
>> structure (specific for this case). For example, the user chooses a
>> date to
>> run, and the form produces  {"date: "${DATE_CHOSEN_BY_THE_USER}" } object.
>> 
>> 3) Theoretically, it's possible that user enters arbitrary complex OBJECT
>> via any kind of structured "generic" interface. That would be a nightmare,
>> however (both from the user and developer point of view). So I disregard
>> that option.
>> 
>> In case of 1) we would have to parse the data ??? and turn it into
>> object to serialize. Python can do that, but Java can't easily. And
>> well -
>> there is no point in doing it - we cannot do anything with conf as object,
>> we have no idea if it is valid or not, we have to anyhow pass it to
>> DAG so
>> that DAG can access whatever fields are needed. I think we would be better
>> to pass it as the very string user entered (After we check if this is a
>> valid json - which we can do easily).
>> 
>> In case of 2) it's also super easy to turn such pre-defined structure into
>> JSON string. It's trivial in any language. There is virtually no
>> benefit of
>> passing it as object. - the "slightly" better readabilty maybe the
>> only one
>> 
>> Of course - as Ash and Kaxil mentioned, we could pass both - string or
>> dict, but I think that is very wrong. Should we also allow array
>> (this is a
>> valid json structure)?  Should we allow passing standalone bool, int ..
>> objects (they are valid json).  Also how about sending "STRING" as conf
>> value (is it string or is it json-encoded object). This is a bad, bad idea.
>> 
>> So summarizing - I am strongly for passing "string" rather than object.
>> 
>> J.
>> 
>> 
>> On Fri, May 29, 2020 at 1:58 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> 
>>> If the only problem is with Java and not any other popular
>>> languages, I
>>> would say we go for Option (2).
>>> 
>>> If not, supporting both is a good idea.
>>> 
>>> Regards,
>>> Kaxil
>>> 
>>> On Fri, May 29, 2020 at 12:19 AM QP Hou <q...@scribd.com> wrote:
>>> 
>>> > While I understand the difficulty of dealing with nested json without
>>> > predefined schemas, I feel like returning it as a string only delays
>>> > the problem to a later stage since the user will still need to parse
>>> > that string into a strongly typed data structure in order to read the
>>> > values.
>>> >
>>> > I don't have much experience in Java so I can't really comment on
>>> > that. But I can confirm that it's pretty straightforward to deal with
>>> > this in C/C++, Rust and Go.
>>> >
>>> > On Thu, May 28, 2020 at 2:57 PM Ash Berlin-Taylor <a...@apache.org>
>>> wrote:
>>> > >
>>> > > Hi everyone,
>>> > >
>>> > > We're really close to getting the OpenAPI spec merged, just one last
>>> > > question that's come up around how we should handle/represent
>>> > > dagrun.conf to triggerDagRun.
>>> > >
>>> > > Which of the these two do people prefer?
>>> > >
>>> > >
>>> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}
>>> > > Content-Type: application/json
>>> > >
>>> > > {
>>> > >   "dag_run_id": "manual_2020-05-28T21:42:36Z",
>>> > >   "execution_date": "2020-05-28T21:42:36Z",
>>> > >   "conf": "{\"key\": \"value\" }"
>>> > > }
>>> > >
>>> > > OR
>>> > >
>>> > >
>>> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}
>>> > > Content-Type: application/json
>>> > >
>>> > > {
>>> > >   "dag_run_id": "manual_2020-05-28T21:42:36Z",
>>> > >   "execution_date": "2020-05-28T21:42:36Z",
>>> > >   "conf": {"key": "value"}
>>> > > }
>>> > >
>>> > > i.e. should the schema/type of conf be a (JSON-encoded) string,
>>> or an
>>> > object.
>>> > >
>>> > > I favour the later, Kamil the former. His point is that staticly typed
>>> > > languages, and Java in particular, would be hard to represent this.
>>> > > (Please correct me if I've over-simplified or misunderstood your
>>> > > argument Kamil)
>>> > >
>>> > > Mine was that it's easy enough in Go, for example trigger(dagRunId
>>> str,
>>> > > executionDate *time.Time, conf interface{})`, and double json encoding
>>> > > is always messy/a pain to drive manually on cURL etc.
>>> > >
>>> > >
>>> > > (Using dagRun.conf is quite rare right now, doubly so via the
>>> API, so
>>> I
>>> > > don't think we have any precendent to follow.)
>>> > >
>>> > > Or does anyone feel strongly that we should support both, and have
>>> this
>>> > > in the python side
>>> > >
>>> > >     if conf:
>>> > >         if isinstance(conf, dict):
>>> > >             run_conf = conf
>>> > >         else:
>>> > >             run_conf = json.loads(conf)
>>> > >
>>> > >
>>> > > -ash
>>> >
>>> 
>> 
>> 
>> --
>> 
>> Jarek Potiuk
>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>> 
>> M: +48 660 796 129 <+48660796129>
>> [image: Polidea] <https://www.polidea.com/>
>> 
>> 
> 
> -- 
> 
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
> 
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
> 

Reply via email to