Shouldn't serialization be left to each custom backend?

On Wed, Dec 2, 2020, 8:11 AM Tomasz Urbaszek <[email protected]> wrote:
> Thanks Ry!
>
> > This will allow us to put scone forward as a strong feature rather than
> > how it has been historically portrayed as flawed/limited.
>
> This is a good point and I agree that custom backends may increase Airflow
> competitiveness.
>
> However, if we decide to include them in core we need to answer the old
> question - what and how do we serialize objects to persist them?
>
> I'm quite sure we don't want to use pickle, especially when the data is
> retrieved from external systems. Using json by default would kill the power
> of custom XComs. The only option that comes to my mind is an additional
> function (in airflow_local_settings) that can be defined by users and will
> be used in custom XCom to serialize / deserialize an object. In this way
> users can reuse the "upload/download" part from the Airflow codebase, but can
> customize serialization methods for different data types (for example
> pandas > avro, pandas > csv).
>
> What do others think?
>
> Tomek
>
>
> On Wed, Dec 2, 2020 at 4:23 PM Ry Walker <[email protected]> wrote:
>
>> Ha “xcom” was autocorrected to “scone” on my phone, didn’t notice :)
>>
>> On Wed, Dec 2, 2020 at 10:22 AM Ry Walker <[email protected]> wrote:
>>
>>> I’m in favor of including a few backends in core, including some that
>>> can handle larger data, for the sake of Airflow usability and its
>>> competitive positioning.
>>>
>>> This will allow us to put scone forward as a strong feature rather than
>>> how it has been historically portrayed as flawed/limited.
>>>
>>>
>>> On Wed, Dec 2, 2020 at 9:49 AM Tomasz Urbaszek <[email protected]>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> The Airflow 2.0 release is getting closer and closer. I would like to
>>>> start a discussion about custom XCom backends.
>>>>
>>>> First of all, in case you don't know: since 1.10.12 users can use a
>>>> custom XCom class that will override serialize and deserialize
>>>> methods. Docs:
>>>> https://airflow.apache.org/docs/stable/concepts.html#custom-xcom-backend
>>>>
>>>> This feature allows users to do the following:
>>>> - reduce boilerplate code responsible for downloading / uploading data
>>>> in operators (it's handled by custom XCom)
>>>> - use different storage for XCom data (other database, buckets, cache
>>>> etc.)
>>>> - verify XCom data on read/write operations
>>>> - and anything else that may be feasible
>>>>
>>>> Some examples:
>>>> https://github.com/apache/airflow/pull/12733
>>>>
>>>> https://www.polidea.com/blog/airflow-2-0-dag-authoring-redesigned/#custom-xcom-backends-8560
>>>>
>>>> The point I want to raise (as I did in this PR
>>>> https://github.com/apache/airflow/pull/12733) is to discuss if we as a
>>>> community want to have custom XComs in our codebase (core or
>>>> providers). I'm happy to hear what the community thinks about it.
>>>>
>>>> From my side, I'm leaning toward creating better documentation around
>>>> this feature (with examples and suggestions) instead of accepting
>>>> custom XComs into the codebase. My main concern is that custom XComs
>>>> are easy to write (using for example hooks) and will work best when
>>>> they are built to suit exact users' needs. On the other hand, I see
>>>> some potential in "low level" XComs that just implement the logic of
>>>> storing and retrieving data from a particular storage. But anything
>>>> that gets too use-case / data type specific should not be accepted.
>>>>
>>>> Cheers,
>>>> Tomek
>>>>
>>> --
>>> Sent from Gmail Mobile
>>>
>> --
>> Sent from Gmail Mobile
>>
>
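
For illustration only, the split discussed in this thread (a generic "upload/download" part plus user-supplied serialization per data type) could look roughly like the sketch below. It deliberately does not import Airflow: `SERIALIZERS`, `serialize`, `deserialize`, and `InMemoryXComBackend` are hypothetical names invented for this example, not part of the Airflow API, and the in-memory dict merely stands in for a bucket or external database. In a real custom backend the same logic would presumably sit behind the serialize and deserialize methods that the custom XCom class overrides.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical user-defined registry (the kind of thing that might live in
# airflow_local_settings): format tag -> (Python type, encoder, decoder).
Serializer = Tuple[type, Callable[[Any], bytes], Callable[[bytes], Any]]

SERIALIZERS: Dict[str, Serializer] = {
    # JSON cannot represent a set directly, so store it as a sorted list.
    "set": (
        set,
        lambda value: json.dumps(sorted(value)).encode("utf-8"),
        lambda body: set(json.loads(body.decode("utf-8"))),
    ),
}


def serialize(value: Any) -> bytes:
    """Prefix the payload with a format tag so it can be decoded later."""
    for tag, (expected_type, encode, _decode) in SERIALIZERS.items():
        if isinstance(value, expected_type):
            return tag.encode("utf-8") + b"\n" + encode(value)
    # Fall back to plain JSON for everything without a registered serializer.
    return b"json\n" + json.dumps(value).encode("utf-8")


def deserialize(payload: bytes) -> Any:
    """Read the format tag and dispatch to the matching decoder."""
    tag, _sep, body = payload.partition(b"\n")
    name = tag.decode("utf-8")
    if name == "json":
        return json.loads(body.decode("utf-8"))
    return SERIALIZERS[name][2](body)


class InMemoryXComBackend:
    """Stand-in for the storage half; a dict replaces the external store."""

    _store: Dict[str, bytes] = {}

    @classmethod
    def upload(cls, key: str, value: Any) -> None:
        cls._store[key] = serialize(value)

    @classmethod
    def download(cls, key: str) -> Any:
        return deserialize(cls._store[key])


if __name__ == "__main__":
    InMemoryXComBackend.upload("ids", {3, 1, 2})
    print(InMemoryXComBackend.download("ids"))
```

Whether the registry belongs to each backend or to a shared settings hook is exactly the open question in the thread; the sketch works either way, because the storage class only ever sees bytes.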
