> Shouldn't serialization be left to each custom backend?

In my opinion - yes. That's why I'm not 100% convinced that we should have custom XComs in core/providers. But if we decide to have them, then I think we have to decide on a serialization mechanism.
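
To make the question concrete, here is a rough sketch of the user-defined hook idea from my earlier message (quoted below). Everything in it is an assumption for illustration: the `serialize` / `deserialize` function names, the one-line format tag, and the `SketchXComBackend` class are hypothetical and not part of Airflow. A real backend would subclass Airflow's XCom class and upload the bytes to external storage. The sketch uses only the standard library (csv for tabular rows, json for everything else) and avoids pickle.

```python
import csv
import io
import json
from typing import Any


def serialize(value: Any) -> bytes:
    """Hypothetical user hook: choose a format per data type, never pickle."""
    is_table = (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(row, dict) for row in value)
    )
    if is_table:
        # List of row dicts -> CSV, prefixed with a one-line format tag.
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(value[0].keys()))
        writer.writeheader()
        writer.writerows(value)
        return b"csv\n" + buf.getvalue().encode("utf-8")
    # Everything else falls back to JSON.
    return b"json\n" + json.dumps(value).encode("utf-8")


def deserialize(payload: bytes) -> Any:
    """Hypothetical inverse hook: dispatch on the format tag."""
    tag, _, body = payload.partition(b"\n")
    text = body.decode("utf-8")
    if tag == b"csv":
        # CSV carries no type information, so every cell comes back as str.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if tag == b"json":
        return json.loads(text)
    raise ValueError(f"unknown format tag: {tag!r}")


class SketchXComBackend:
    """Stand-in for a custom XCom backend (not Airflow's class)."""

    @staticmethod
    def serialize_value(value: Any) -> bytes:
        # The storage ("upload") part would go here; only the hook is shown.
        return serialize(value)

    @staticmethod
    def deserialize_value(payload: bytes) -> Any:
        return deserialize(payload)
```

The trade-off is visible even in this toy version: the CSV path loses type information, so whoever defines the hooks has to own that choice, which is part of why I think the mechanism needs to be agreed on up front.
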
On Wed, Dec 2, 2020 at 5:16 PM Daniel Standish <[email protected]> wrote:

> Shouldn't serialization be left to each custom backend?
>
> On Wed, Dec 2, 2020, 8:11 AM Tomasz Urbaszek <[email protected]> wrote:
>
>> Thanks Ry!
>>
>> > This will allow us to put scone forward as a strong feature rather than
>> > how it has been historically portrayed as flawed/limited.
>>
>> This is a good point and I agree that custom backends may increase
>> Airflow competitiveness.
>>
>> However, if we decide to include them in core we need to answer the old
>> question - what and how do we serialize objects to persist them?
>>
>> I'm quite sure we don't want to use pickle, especially when the data is
>> retrieved from external systems. Using json by default would kill the power
>> of custom XComs. The only option that comes to my mind is an additional
>> function (in airflow_local_settings) that can be defined by users and will
>> be used in custom XCom to serialize / deserialize an object. In this way
>> users can reuse the "upload/download" part from Airflow codebase, but can
>> customize serialization methods for different data types (for example
>> pandas > avro, pandas > csv).
>>
>> What do others think?
>>
>> Tomek
>>
>>
>> On Wed, Dec 2, 2020 at 4:23 PM Ry Walker <[email protected]> wrote:
>>
>>> Ha “xcom” was autocorrected to “scone” on my phone, didn’t notice :)
>>>
>>> On Wed, Dec 2, 2020 at 10:22 AM Ry Walker <[email protected]> wrote:
>>>
>>>> I’m in favor of including a few backends in core, including some that
>>>> can handle larger data, for the sake of Airflow usability and its
>>>> competitive positioning.
>>>>
>>>> This will allow us to put scone forward as a strong feature rather than
>>>> how it has been historically portrayed as flawed/limited.
>>>>
>>>>
>>>> On Wed, Dec 2, 2020 at 9:49 AM Tomasz Urbaszek <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> Airflow 2.0 release is sooner and sooner. I would like to start a
>>>>> discussion about custom XCom backends.
>>>>>
>>>>> First of all, if you don't know it - since 1.10.12 users can use a
>>>>> custom XCom class that will override serialize and deserialize
>>>>> methods. Docs:
>>>>> https://airflow.apache.org/docs/stable/concepts.html#custom-xcom-backend
>>>>>
>>>>> This feature allows users the following things:
>>>>> - reduce boilerplate code responsible for downloading / uploading data
>>>>> in operators (it's handled by custom XCom)
>>>>> - use different storage for XCom data (other database, buckets, cache
>>>>> etc.)
>>>>> - verifying XCom data on read/write operations
>>>>> - and anything else that may be feasible
>>>>>
>>>>> Some examples:
>>>>> https://github.com/apache/airflow/pull/12733
>>>>>
>>>>> https://www.polidea.com/blog/airflow-2-0-dag-authoring-redesigned/#custom-xcom-backends-8560
>>>>>
>>>>> The point I want to raise (as I did in this PR
>>>>> https://github.com/apache/airflow/pull/12733) is to discuss if we as a
>>>>> community want to have custom XComs in our codebase (core or
>>>>> providers). I'm happy to hear what the community thinks about it?
>>>>>
>>>>> From my side, I'm leaning toward creating better documentation around
>>>>> this feature (with examples and suggestions) instead of accepting
>>>>> XComs to code base. My main concern is that custom XComs are easy to
>>>>> write (using for example hooks) and will work best when they are built
>>>>> to suit exact users' needs. On the other hand, I see some potential in
>>>>> "low level" XComs that just implement logic of storing and retrieving
>>>>> data from particular storage. But anything that gets too use-case /
>>>>> data type specific should not be accepted.
>>>>>
>>>>> Cheers,
>>>>> Tomek
>>>>>
>>>> --
>>>> Sent from Gmail Mobile
>>>>
>>> --
>>> Sent from Gmail Mobile
>>>
>>
