Shouldn't serialization be left to each custom backend?
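
As a rough sketch of that idea (not Airflow code): the backend class below owns both the encoding and the storage, so core never has to pick a format. The class name, the in-memory dict standing in for a bucket, and the "fake-store://" prefix are all hypothetical. The method names follow the serialize/deserialize override mentioned further down the thread, but the signatures are simplified assumptions and should be checked against the Airflow docs.

```python
import json
from typing import Any, Dict

# Hypothetical in-memory stand-in for external storage (bucket, cache, ...).
_FAKE_STORE: Dict[str, bytes] = {}


class InMemoryXComBackend:
    """Illustrative stand-in for a custom XCom backend; does not import Airflow."""

    PREFIX = "fake-store://"  # hypothetical marker for stored references

    @staticmethod
    def serialize_value(value: Any) -> bytes:
        # The backend chooses its own encoding (JSON here, deliberately not pickle).
        payload = json.dumps(value).encode("utf-8")
        key = f"{InMemoryXComBackend.PREFIX}{len(_FAKE_STORE)}"
        _FAKE_STORE[key] = payload
        # Only the small reference would go into the metadata database.
        return key.encode("utf-8")

    @staticmethod
    def deserialize_value(stored: bytes) -> Any:
        # Resolve the reference and decode with the matching encoding.
        key = stored.decode("utf-8")
        return json.loads(_FAKE_STORE[key].decode("utf-8"))
```

A backend for pandas data could swap the two json calls for Avro or CSV without touching anything outside the class.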

On Wed, Dec 2, 2020, 8:11 AM Tomasz Urbaszek <[email protected]> wrote:

> Thanks Ry!
>
> > This will allow us to put scone forward as a strong feature rather than
> how it has been historically portrayed as flawed/limited.
>
> This is a good point and I agree that custom backends may increase
> Airflow's competitiveness.
>
> However, if we decide to include them in core, we need to answer the old
> question: what do we serialize, and how, in order to persist objects?
>
> I'm quite sure we don't want to use pickle, especially when the data is
> retrieved from external systems. Using JSON by default would kill the power
> of custom XComs. The only option that comes to my mind is an additional
> function (in airflow_local_settings) that users can define and that the
> custom XCom will use to serialize / deserialize an object. This way,
> users can reuse the "upload/download" part from the Airflow codebase but
> customize serialization methods for different data types (for example
> pandas -> Avro, pandas -> CSV).
>
> What do others think?
>
> Tomek
>
>
> On Wed, Dec 2, 2020 at 4:23 PM Ry Walker <[email protected]> wrote:
>
>> Ha “xcom” was autocorrected to “scone” on my phone, didn’t notice :)
>>
>> On Wed, Dec 2, 2020 at 10:22 AM Ry Walker <[email protected]> wrote:
>>
>>> I’m in favor of including a few backends in core, including some that
>>> can handle larger data, for the sake of Airflow usability and its
>>> competitive positioning.
>>>
>>> This will allow us to put scone forward as a strong feature rather than
>>> how it has been historically portrayed as flawed/limited.
>>>
>>>
>>> On Wed, Dec 2, 2020 at 9:49 AM Tomasz Urbaszek <[email protected]>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> The Airflow 2.0 release is getting closer. I would like to start a
>>>> discussion about custom XCom backends.
>>>>
>>>> First of all, in case you didn't know: since 1.10.12, users can use a
>>>> custom XCom class that overrides the serialize and deserialize
>>>> methods. Docs:
>>>> https://airflow.apache.org/docs/stable/concepts.html#custom-xcom-backend
>>>>
>>>> This feature allows users to:
>>>> - reduce boilerplate code responsible for downloading / uploading data
>>>> in operators (it's handled by the custom XCom)
>>>> - use different storage for XCom data (another database, buckets, a
>>>> cache, etc.)
>>>> - verify XCom data on read/write operations
>>>> - do anything else that may be feasible
>>>>
>>>> Some examples:
>>>> https://github.com/apache/airflow/pull/12733
>>>>
>>>> https://www.polidea.com/blog/airflow-2-0-dag-authoring-redesigned/#custom-xcom-backends-8560
>>>>
>>>> The point I want to raise (as I did in this PR
>>>> https://github.com/apache/airflow/pull/12733) is whether we as a
>>>> community want to have custom XComs in our codebase (core or
>>>> providers). I'm happy to hear what the community thinks about it.
>>>>
>>>> From my side, I'm leaning toward creating better documentation around
>>>> this feature (with examples and suggestions) instead of accepting
>>>> custom XComs into the codebase. My main concern is that custom XComs
>>>> are easy to write (using hooks, for example) and will work best when
>>>> they are built to suit users' exact needs. On the other hand, I see
>>>> some potential in "low level" XComs that just implement the logic of
>>>> storing and retrieving data from a particular storage. But anything
>>>> that gets too use-case / data-type specific should not be accepted.
>>>>
>>>> Cheers,
>>>> Tomek
>>>>
