I can't really imagine a good way of supporting "dynamic python" via
external parameters (pickling AAAAAAARGH!). So still I think your idea
of deprecating non-serializable parameters is good.

True - YAML serialization does not support "all-python", but it does
support "all data structures".  But there is one difference vs
"json-serializable".

But in this case "non YAML serializable" as opposed to "non-json
serializable" check we could introduce it very fast - even at patch
level. We do not have to wait for Airflow 3 IMHO. We already know that
people are using "sets" and this basically means that we have to go
the "deprecation" route. On the other hand allowing arbitrary dynamic
python parameters and not data structures should be considered as
"accidental" (it was never an intention) and not really part of the
public API, so we can simply "fix it" in 2.2.3 by requiring the
parameters to be YAML-serializable.

J.



On Wed, Nov 10, 2021 at 11:11 PM Daniel Standish <[email protected]> wrote:
>
> OK but if you extend to yaml, you don't resolve the problem but only move the 
> goalpost a little bit.  Because then the question remains; enforce 
> yaml-serializable or arbitrary python (which is the soon-to-be status quo 
> after 2.2.2 is released).
>
> On Wed, Nov 10, 2021 at 1:26 PM Jarek Potiuk <[email protected]> wrote:
>>
>> Very good question.
>>
>> Maybe we can solve it differently. We could recommend JSON and the
>> basic datasets that JSON supports, but it should be possible for
>> Airflow to support YAML as input format for params.
>>
>> Every JSON is also valid YAML and you can use Yaml parser to parse
>> JSON. Yaml supports sets (though the syntax is cumbersome)
>> https://yaml.org/type/set.html . So if we support YAML for both API
>> and UI, if you REALLY want to use set - you will be able to .
>>
>> My proposal is that all the examples etc. could show JSON and we
>> should support it, but there should be an asterisk (*) that if you
>> want advanced features like YAML, you could use YAML. Then we would
>> not have to deal with deprecation which is problematic as we could
>> only remove this feature in Airflow 3.
>>
>> J.
>>
>> On Wed, Nov 10, 2021 at 7:14 PM Daniel Standish <[email protected]> wrote:
>> >
>> > Prior to 2.2.0, you could use non-json-serializable params in a dag. 
>> > Here's an example with `set`:
>> >
>> > @dag.task(params={"a": {1, 2, 3}, "b": [3, 4, 5]})
>> > def set_param(intersection):
>> >     print(intersection)
>> >
>> >
>> > set_param("{{ params.a.intersection(params.b).pop() }}")
>> >
>> > In 2.2.0 this was broken, and in 2.2.2rc1 it should be fixed.
>> >
>> > There's a problem though.  An important part of params is the ability to 
>> > override them when triggering a dag from the web UI or CLI.  But params 
>> > supplied through either mechanism should be json-serializable.
>> >
>> > So on one hand we allow arbitrary params, and on the other hand we require 
>> > them to be json serializable.  We can see how this causes a problem in the 
>> > above example: if you provide `a` as a `list` in the UI, it won't have the 
>> > method `intersection`.
>> >
>> > So the question: should we deprecate non-json-serializable params?
>> >
>> > My feeling is yes.  But I'm not sure how widely this flexibility is used 
>> > in the wild.
>> >

Reply via email to