> Shouldn't serialization be left to each custom backend?

In my opinion, yes. That's why I'm not 100% convinced that we should have
custom XComs in core/providers. But if we decide to have them, then I think
we also have to decide on a serialization mechanism.

On Wed, Dec 2, 2020 at 5:16 PM Daniel Standish <[email protected]> wrote:

> Shouldn't serialization be left to each custom backend?
>
> On Wed, Dec 2, 2020, 8:11 AM Tomasz Urbaszek <[email protected]> wrote:
>
>> Thanks Ry!
>>
>> > This will allow us to put scone forward as a strong feature rather than
>> > how it has been historically portrayed as flawed/limited.
>>
>> This is a good point and I agree that custom backends may increase
>> Airflow's competitiveness.
>>
>> However, if we decide to include them in core we need to answer the old
>> question: what do we serialize, and how, in order to persist objects?
>>
>> I'm quite sure we don't want to use pickle, especially when the data is
>> retrieved from external systems. Using JSON by default would kill the power
>> of custom XComs. The only option that comes to my mind is an additional
>> function (in airflow_local_settings) that can be defined by users and will
>> be used in the custom XCom to serialize / deserialize an object. This way
>> users can reuse the "upload/download" part from the Airflow codebase, but
>> can customize the serialization methods for different data types (for
>> example pandas > avro, pandas > csv).
>>
>> What do others think?
>>
>> Tomek
>>
>>
>> On Wed, Dec 2, 2020 at 4:23 PM Ry Walker <[email protected]> wrote:
>>
>>> Ha “xcom” was autocorrected to “scone” on my phone, didn’t notice :)
>>>
>>> On Wed, Dec 2, 2020 at 10:22 AM Ry Walker <[email protected]> wrote:
>>>
>>>> I’m in favor of including a few backends in core, including some that
>>>> can handle larger data, for the sake of Airflow usability and its
>>>> competitive positioning.
>>>>
>>>> This will allow us to put scone forward as a strong feature rather than
>>>> how it has been historically portrayed as flawed/limited.
>>>>
>>>>
>>>> On Wed, Dec 2, 2020 at 9:49 AM Tomasz Urbaszek <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> The Airflow 2.0 release is getting closer and closer. I would like to
>>>>> start a discussion about custom XCom backends.
>>>>>
>>>>> First of all, in case you don't know: since 1.10.12 users can use a
>>>>> custom XCom class that overrides the serialize and deserialize
>>>>> methods. Docs:
>>>>> https://airflow.apache.org/docs/stable/concepts.html#custom-xcom-backend
>>>>>
>>>>> This feature allows users to:
>>>>> - reduce boilerplate code responsible for downloading / uploading data
>>>>> in operators (it's handled by the custom XCom)
>>>>> - use different storage for XCom data (another database, buckets, a
>>>>> cache etc.)
>>>>> - verify XCom data on read/write operations
>>>>> - do anything else that may be feasible
>>>>>
>>>>> Some examples:
>>>>> https://github.com/apache/airflow/pull/12733
>>>>>
>>>>> https://www.polidea.com/blog/airflow-2-0-dag-authoring-redesigned/#custom-xcom-backends-8560
>>>>>
>>>>> The point I want to raise (as I did in this PR
>>>>> https://github.com/apache/airflow/pull/12733) is whether we as a
>>>>> community want to have custom XComs in our codebase (core or
>>>>> providers). I'd be happy to hear what the community thinks about it.
>>>>>
>>>>> From my side, I'm leaning toward creating better documentation around
>>>>> this feature (with examples and suggestions) instead of accepting
>>>>> custom XComs into the codebase. My main concern is that custom XComs
>>>>> are easy to write (using hooks, for example) and will work best when
>>>>> they are built to suit users' exact needs. On the other hand, I see
>>>>> some potential in "low level" XComs that just implement the logic for
>>>>> storing and retrieving data from a particular storage. But anything
>>>>> that gets too use-case / data-type specific should not be accepted.
>>>>>
>>>>> Cheers,
>>>>> Tomek
>>>>>
>>>> --
>>>> Sent from Gmail Mobile
>>>>
>>> --
>>> Sent from Gmail Mobile
>>>
>>
