Thanks Ry!

> This will allow us to put scone forward as a strong feature rather than
> how it has been historically portrayed as flawed/limited.
This is a good point, and I agree that custom backends may increase Airflow's
competitiveness. However, if we decide to include them in core, we need to
answer the old question: what do we serialize, and how do we persist objects?
I'm quite sure we don't want to use pickle, especially when the data is
retrieved from external systems. Using JSON by default would kill the power
of custom XComs. The only option that comes to my mind is an additional
function (in airflow_local_settings) that users can define and that custom
XComs would use to serialize / deserialize an object. In this way users can
reuse the "upload/download" part from the Airflow codebase, but can customize
serialization methods for different data types (for example pandas -> avro,
pandas -> csv).

What do others think?

Tomek

On Wed, Dec 2, 2020 at 4:23 PM Ry Walker <[email protected]> wrote:

> Ha, “xcom” was autocorrected to “scone” on my phone, didn’t notice :)
>
> On Wed, Dec 2, 2020 at 10:22 AM Ry Walker <[email protected]> wrote:
>
>> I’m in favor of including a few backends in core, including some that can
>> handle larger data, for the sake of Airflow's usability and its
>> competitive positioning.
>>
>> This will allow us to put scone forward as a strong feature rather than
>> how it has been historically portrayed as flawed/limited.
>>
>> On Wed, Dec 2, 2020 at 9:49 AM Tomasz Urbaszek <[email protected]>
>> wrote:
>>
>>> Hello all,
>>>
>>> The Airflow 2.0 release is getting closer and closer. I would like to
>>> start a discussion about custom XCom backends.
>>>
>>> First of all, in case you don't know it: since 1.10.12, users can use a
>>> custom XCom class that overrides the serialize and deserialize methods.
>>> Docs:
>>> https://airflow.apache.org/docs/stable/concepts.html#custom-xcom-backend
>>>
>>> This feature lets users do the following things:
>>> - reduce the boilerplate code responsible for downloading / uploading
>>> data in operators (it's handled by the custom XCom)
>>> - use different storage for XCom data (another database, buckets,
>>> cache, etc.)
>>> - verify XCom data on read/write operations
>>> - and anything else that may be feasible
>>>
>>> Some examples:
>>> https://github.com/apache/airflow/pull/12733
>>> https://www.polidea.com/blog/airflow-2-0-dag-authoring-redesigned/#custom-xcom-backends-8560
>>>
>>> The point I want to raise (as I did in this PR:
>>> https://github.com/apache/airflow/pull/12733) is to discuss whether we
>>> as a community want to have custom XComs in our codebase (core or
>>> providers). I'm happy to hear what the community thinks about it.
>>>
>>> From my side, I'm leaning toward creating better documentation around
>>> this feature (with examples and suggestions) instead of accepting XComs
>>> into the codebase. My main concern is that custom XComs are easy to
>>> write (using, for example, hooks) and work best when they are built to
>>> suit users' exact needs. On the other hand, I see some potential in
>>> "low-level" XComs that just implement the logic of storing and
>>> retrieving data from particular storage. But anything that gets too
>>> use-case- or data-type-specific should not be accepted.
>>>
>>> Cheers,
>>> Tomek
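For readers new to the feature discussed in this thread, here is a minimal
sketch of what a custom XCom backend looks like in practice, in the spirit of
the examples linked above. It assumes the amazon provider is installed and
AWS credentials are configured; the class name, bucket name, and key prefix
are illustrative user choices, not part of Airflow:

```python
# Sketch only: offload pandas DataFrames to S3 and keep a reference string
# in the metadata database. Bucket name, key prefix, and class name are
# assumptions managed by the user.
import io
import uuid
from typing import Any

import pandas as pd

from airflow.models.xcom import BaseXCom
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


class S3XComBackend(BaseXCom):
    PREFIX = "xcom_s3://"           # marks values that are really S3 references
    BUCKET_NAME = "my-xcom-bucket"  # assumption: a bucket the user manages

    @staticmethod
    def serialize_value(value: Any):
        # Upload large values to S3 and store only a reference in the database.
        if isinstance(value, pd.DataFrame):
            key = f"xcom/{uuid.uuid4()}.csv"
            S3Hook().load_string(
                value.to_csv(index=False),
                key=key,
                bucket_name=S3XComBackend.BUCKET_NAME,
                replace=True,
            )
            value = S3XComBackend.PREFIX + key
        return BaseXCom.serialize_value(value)

    @staticmethod
    def deserialize_value(result) -> Any:
        # Resolve reference strings back into the original object on read.
        value = BaseXCom.deserialize_value(result)
        if isinstance(value, str) and value.startswith(S3XComBackend.PREFIX):
            key = value[len(S3XComBackend.PREFIX):]
            data = S3Hook().read_key(key=key, bucket_name=S3XComBackend.BUCKET_NAME)
            value = pd.read_csv(io.StringIO(data))
        return value
```

Such a backend is enabled via the xcom_backend option in the [core] section
of airflow.cfg, e.g. xcom_backend = my_package.S3XComBackend.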
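And a purely hypothetical sketch of the airflow_local_settings idea proposed
at the top of the thread. No such hook exists in Airflow today; the names
xcom_serialize / xcom_deserialize and the dispatch-by-type approach are only
assumptions about what the proposal could look like:

```python
# airflow_local_settings.py -- hypothetical user-defined (de)serializers.
# Airflow has no xcom_serialize / xcom_deserialize hook today; this only
# illustrates the proposal above.
import io
import json
from typing import Any, Tuple

import pandas as pd


def xcom_serialize(value: Any) -> Tuple[bytes, str]:
    """Value -> bytes, customizable per data type (e.g. pandas -> csv).

    Returns a type tag alongside the bytes so deserialization knows
    which branch to take.
    """
    if isinstance(value, pd.DataFrame):
        return value.to_csv(index=False).encode("utf-8"), "pandas.csv"
    return json.dumps(value).encode("utf-8"), "json"


def xcom_deserialize(data: bytes, value_type: str) -> Any:
    """Bytes -> value, the inverse of xcom_serialize."""
    if value_type == "pandas.csv":
        return pd.read_csv(io.BytesIO(data))
    return json.loads(data.decode("utf-8"))
```

The built-in backend would call these two functions around its own
upload/download logic, so users customize only the value <-> bytes step while
reusing Airflow's storage code.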
