+1 for second option (object).

> My view is that specifying it as a JSON object in the spec (but saying
> nothing about the properties it might contain) is more precise than a
> string.

I fully agree with that.

T.

On Fri, May 29, 2020 at 12:08 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> On May 29 2020, at 8:29 am, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > *Root problem*
> >
> > I think the root problem is not in the interface/API but the ambiguity of
> > the DagRun.conf field. The way it is defined now, It can actually be an
> > arbitrary object, not even a dictionary.
>
> So that is a gap in the definition. It _needs_ to be a dictionary/JSON
> *Object* specifically, else it will break when it is used.
>
> > but in case of the API, it is just a tool that is used to pass the
> > data, and there should be some well-defined and fixed logic about the
> > structure
>
> My view is that specifying it as a JSON object in the spec (but saying
> nothing about the properties it might contain) is more precise than a
> string. Compare these two snippets of OpenAPI spec:
>
>
>         conf:
>           type: string
>           description: >
>             JSON string (which must contain a top-level JSON object)
> describing additional configuration parameters.
>
> vs
>
>         conf:
>           type: object
>           description: >
>             JSON object describing additional configuration parameters.
>
>
> The first one means technically, acording to the spec you can submit `{
> "conf": "abc" }` -- but that will fail. Where as the second means that
> both the OpenAPI client, and the automatic validation from Connexion on
> the server will handle that for us.
>
> (With the string case, we would have to JSON deserialize it and check
> it's an JSON object/ python dict. Nothing else is allowed.)
>
> > *Java case (and other static-typed languages)*
>
> import com.google.gson.Gson;
> import com.google.gson.JsonObject;
>
> class Demo {
>
>   public static class Employee
>   {
>      private Integer id;
>      private String firstName;
>      private String lastName;
>
>      public Employee(Integer id, String firstName, String lastName){
>         this.id = id;
>         this.firstName = firstName;
>         this.lastName = lastName;
>      }
>   }
>
>   public static void main(String args[]) {
>
>     Demo.Employee employee = new Demo.Employee(1, "Ash", "Berlin-Taylor");
>
>     Gson gson = new Gson();
>
>     JsonObject e = gson.toJsonTree(employee).getAsJsonObject();
>
>     // Just for debugging/testing
>     System.out.println(e.toString());
>
>   }
> }
>
> which outputs
>
>   {"id":1,"firstName":"Ash","lastName":"Berlin-Taylor"}
>
>
> Entirely possible to create arbitrary JSON structures in Java. (You can
> also just create a com.google.gson.JsonObject directly and call `.add()`)
>
> To use your example DagRun class, the signature could become:
>
>   public DagRun(String dagRunId, String executionDate, Object conf);
>
> and then the client can do `gson.toJsonTree(conf).getAsJsonObject()`
> internally.
>
>
> So I'm still +1 for dict directly. Even more so now I have written this
> in Java.
>
> JSON-encoded-string in a JSON api is a "code smell" to me.
>
> -ash
>
>
> On May 29 2020, at 8:29 am, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
> > And one more comment to add - this is the DagRun class I wrote for the
> > imaginary Java client. That's all (including the serializer from the
> > previous mail) needed to be able to easily  build the POST method call in
> > Java.
> > I would really like to challenge someone more experienced with Java
> writes
> > or provides some examples, either the class holding arbitrary complex
> conf
> > object or (if we stick to String way of storing the String) - to
> > serialize/deserialise the object from son.
> >
> > I really believe it is far from trivial and by choosing "object" way of
> > sending the conf, we significantly increase the complexity of clients
> > accessing Airflow API (which should be our goal) at the expense of
> > "slightly" less readable code.
> >
> > J.
> >
> >
> >    public static class DagRun {
> >        private final String dagRunId;
> >        private final String executionDate;
> >        private final String conf;
> >
> >        public String getDagRunId() {
> >            return dagRunId;
> >        }
> >
> >        public String getExecutionDate() {
> >            return executionDate;
> >        }
> >
> >        public String getConf() {
> >            return conf;
> >        }
> >
> >        public DagRun(String dagRunId, String executionDate, String conf){
> >            this.dagRunId = dagRunId;
> >            this.executionDate =executionDate;
> >            this.conf = conf;
> >        }
> >    }
> >
> > J.
> >
> >
> > On Fri, May 29, 2020 at 8:40 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> >
> >> After giving it quite some time to try and think about it this morning,
> >> and looking at consequences - I am in strong favour of passing string as
> >> conf (Kamil's proposal).
> >>
> >> I don't think the dictionary is good. And trying to accommodate both
> >> is I
> >> think combining the worst of both worlds. Let me explain why.
> >>
> >> *Root problem*
> >>
> >> I think the root problem is not in the interface/API but the
> >> ambiguity of
> >> the DagRun.conf field.  The way it is defined now, It can actually be an
> >> arbitrary object, not even a dictionary. I could pass it an array,  or
> >> bool, or pretty much any serializable object it seems. Except for a few
> >> tests where both "string" and "dict" are accepted in the old
> experimental
> >> API I do not see anywhere (please correct me if I am wrong) any kind of
> >> specification for the conf object. That indeed makes it rather hard to
> >> reason about it in statically typed languages. And while I understand
> why
> >> we need it and why in Python environment we are fairly relaxed about
> this
> >> (JINJA for example) - I do not think this should influence the
> complexity
> >> of the API "structure".
> >>
> >> *API Structure: *
> >>
> >> I think JSON structure in the API should be fixed and well defined. I
> >> think it is much better to be very explicit about the "conf"
> >> parameter that
> >> it is "string JSON representation" than the arbitrary object. JSON
> >> is, of
> >> course, dynamic but in case of the API, it is just a tool that is
> >> used to
> >> pass the data, and there should be some well-defined and fixed logic
> about
> >> the structure.Precisely to make it easier to parse and prepare.  I
> >> have a
> >> feeling that putting that dynamic nature of JSON into API structure
> >> definition is quite an abuse of that dynamic nature.
> >>
> >> *Java case (and other static-typed languages)*
> >>
> >> I haven't written Java for years, but I decided to give it a try. I
> tried
> >> to see how complex it would be to write serialization/deserialization
> for
> >> that using one of the most common parsers in Java world - Gson.
> >>
> >> The string case is super simple:
> >>
> >>         public JsonElement serialize(final DagRun dagRun, final Type
> type,
> >> final JsonSerializationContext context) {
> >>             JsonObject result = new JsonObject();
> >>             result.add("dag_run_id", new
> >> JsonPrimitive(dagRun.getDagRunId()));
> >>             result.add("execution_date", new
> >> JsonPrimitive(dagRun.getExecutionDate()));
> >>             result.add("conf",new JsonPrimitive(dagRun.getConf()));
> >>             return result;
> >>         }
> >>
> >> The dynamic object of arbitrary complexity I do not even know how to
> >> approach. Eventually what I would have to do is to convert it to JSON
> >> String anyway - because that's the only way you can keep arbitrary
> complex
> >> structure in Java.
> >>
> >> *Deferring problem?*
> >>
> >> Also - I do not think we are deferring the problem for later. The
> >> thing is
> >> that the only entity that cares about the "content" of the conf being
> >> accessible as an object is the Python DAG reading the conf object
> (likely
> >> with some JINJA templates).
> >>
> >> We will likely never, ever have to parse,  de-jsonize the string on the
> >> client-side. We will just have to prepare it (to send) and possibly
> display
> >> it (if we ever read that conf via API).
> >> The imaginary client communicating with Airflow will simply pass
> whatever
> >> the User will tell it to do. And IMHO this is far easier if we do not
> have
> >> to convert it to object on the flight .
> >>
> >> I can imagine several use cases for that method:
> >>
> >> 1) User types - by hand - the whole "conf" object to pass to the trigger
> >> method. Likely typing JSON-string directly. That's a typical case for
> any
> >> kind of CLI I can imagine - where usually you pass JSON string for
> >> arbitrary complex objects.
> >>
> >> 2) The client-side implementation will define some limited set of
> >> parameters that could be used by the user and let the user enter it
> >> via a
> >> form. Based on that a "conf" object will be created following predefined
> >> structure (specific for this case). For example, the user chooses a
> >> date to
> >> run, and the form produces  {"date: "${DATE_CHOSEN_BY_THE_USER}" }
> object.
> >>
> >> 3) Theoretically, it's possible that user enters arbitrary complex
> OBJECT
> >> via any kind of structured "generic" interface. That would be a
> nightmare,
> >> however (both from the user and developer point of view). So I disregard
> >> that option.
> >>
> >> In case of 1) we would have to parse the data ??? and turn it into
> >> object to serialize. Python can do that, but Java can't easily. And
> >> well -
> >> there is no point in doing it - we cannot do anything with conf as
> object,
> >> we have no idea if it is valid or not, we have to anyhow pass it to
> >> DAG so
> >> that DAG can access whatever fields are needed. I think we would be
> better
> >> to pass it as the very string user entered (After we check if this is a
> >> valid json - which we can do easily).
> >>
> >> In case of 2) it's also super easy to turn such pre-defined structure
> into
> >> JSON string. It's trivial in any language. There is virtually no
> >> benefit of
> >> passing it as object. - the "slightly" better readabilty maybe the
> >> only one
> >>
> >> Of course - as Ash and Kaxil mentioned, we could pass both - string or
> >> dict, but I think that is very wrong. Should we also allow array
> >> (this is a
> >> valid json structure)?  Should we allow passing standalone bool, int ..
> >> objects (they are valid json).  Also how about sending "STRING" as conf
> >> value (is it string or is it json-encoded object). This is a bad, bad
> idea.
> >>
> >> So summarizing - I am strongly for passing "string" rather than object.
> >>
> >> J.
> >>
> >>
> >> On Fri, May 29, 2020 at 1:58 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >>
> >>> If the only problem is with Java and not any other popular
> >>> languages, I
> >>> would say we go for Option (2).
> >>>
> >>> If not, supporting both is a good idea.
> >>>
> >>> Regards,
> >>> Kaxil
> >>>
> >>> On Fri, May 29, 2020 at 12:19 AM QP Hou <q...@scribd.com> wrote:
> >>>
> >>> > While I understand the difficulty of dealing with nested json without
> >>> > predefined schemas, I feel like returning it as a string only delays
> >>> > the problem to a later stage since the user will still need to parse
> >>> > that string into a strongly typed data structure in order to read the
> >>> > values.
> >>> >
> >>> > I don't have much experience in Java so I can't really comment on
> >>> > that. But I can confirm that it's pretty straightforward to deal with
> >>> > this in C/C++, Rust and Go.
> >>> >
> >>> > On Thu, May 28, 2020 at 2:57 PM Ash Berlin-Taylor <a...@apache.org>
> >>> wrote:
> >>> > >
> >>> > > Hi everyone,
> >>> > >
> >>> > > We're really close to getting the OpenAPI spec merged, just one
> last
> >>> > > question that's come up around how we should handle/represent
> >>> > > dagrun.conf to triggerDagRun.
> >>> > >
> >>> > > Which of the these two do people prefer?
> >>> > >
> >>> > >
> >>> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}
> >>> > > Content-Type: application/json
> >>> > >
> >>> > > {
> >>> > >   "dag_run_id": "manual_2020-05-28T21:42:36Z",
> >>> > >   "execution_date": "2020-05-28T21:42:36Z",
> >>> > >   "conf": "{\"key\": \"value\" }"
> >>> > > }
> >>> > >
> >>> > > OR
> >>> > >
> >>> > >
> >>> > > POST /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}
> >>> > > Content-Type: application/json
> >>> > >
> >>> > > {
> >>> > >   "dag_run_id": "manual_2020-05-28T21:42:36Z",
> >>> > >   "execution_date": "2020-05-28T21:42:36Z",
> >>> > >   "conf": {"key": "value"}
> >>> > > }
> >>> > >
> >>> > > i.e. should the schema/type of conf be a (JSON-encoded) string,
> >>> or an
> >>> > object.
> >>> > >
> >>> > > I favour the later, Kamil the former. His point is that staticly
> typed
> >>> > > languages, and Java in particular, would be hard to represent this.
> >>> > > (Please correct me if I've over-simplified or misunderstood your
> >>> > > argument Kamil)
> >>> > >
> >>> > > Mine was that it's easy enough in Go, for example trigger(dagRunId
> >>> str,
> >>> > > executionDate *time.Time, conf interface{})`, and double json
> encoding
> >>> > > is always messy/a pain to drive manually on cURL etc.
> >>> > >
> >>> > >
> >>> > > (Using dagRun.conf is quite rare right now, doubly so via the
> >>> API, so
> >>> I
> >>> > > don't think we have any precendent to follow.)
> >>> > >
> >>> > > Or does anyone feel strongly that we should support both, and have
> >>> this
> >>> > > in the python side
> >>> > >
> >>> > >     if conf:
> >>> > >         if isinstance(conf, dict):
> >>> > >             run_conf = conf
> >>> > >         else:
> >>> > >             run_conf = json.loads(conf)
> >>> > >
> >>> > >
> >>> > > -ash
> >>> >
> >>>
> >>
> >>
> >> --
> >>
> >> Jarek Potiuk
> >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>
> >> M: +48 660 796 129 <+48660796129>
> >> [image: Polidea] <https://www.polidea.com/>
> >>
> >>
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>

Reply via email to