Hi folks!

Reading the conversation, I agree with Tomek.
At the same time I see value in adding some options out of the box for
serialization and storage.

I see a pattern here: we can decouple the storage service (Redis, S3, GCS,
Airflow DB...) from the serialization format (pandas to csv, pickling,
json...). If we decouple them, we can ship a few implementations of each in
core and let users pick them through configuration.

Something like:

[xcom]
storage_layer = [airflow.xcom.GCS]
serialization = [airflow.xcom.pandas2csv, airflow.xcom.jsondump]

Here the XCom layer would use GCS for storage
(gs://bucket-name/{dag_id}/{dagrun_id}/{ti_id}/{key}). To serialize, it
would try pandas2csv first (if the value is a pandas object) and fall back
to a JSON dump otherwise. This could be extended to serialize however
people prefer (even using GCS/S3 and loading again), and it would let us
add some options to core while keeping things extensible.
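
To make the idea a bit more concrete, here is a rough Python sketch of the
decoupling. None of these classes exist in Airflow; XComLayer,
InMemoryStorage, Pandas2Csv and JsonDump are hypothetical names, the
in-memory dict stands in for GCS/S3/Redis, and prefixing each blob with the
serializer name is just one assumed way to know how to load it back.

```python
import io
import json
from typing import Any, Dict, List


class Pandas2Csv:
    """Hypothetical serializer: handles pandas objects that expose to_csv."""
    name = "pandas2csv"

    def can_handle(self, value: Any) -> bool:
        # Check by module name so pandas is only needed when actually used.
        return type(value).__module__.startswith("pandas") and hasattr(value, "to_csv")

    def dumps(self, value: Any) -> bytes:
        return value.to_csv(index=False).encode("utf-8")

    def loads(self, data: bytes) -> Any:
        import pandas as pd  # imported lazily, only for pandas payloads
        return pd.read_csv(io.BytesIO(data))


class JsonDump:
    """Hypothetical fallback serializer: accepts anything json can encode."""
    name = "jsondump"

    def can_handle(self, value: Any) -> bool:
        return True

    def dumps(self, value: Any) -> bytes:
        return json.dumps(value).encode("utf-8")

    def loads(self, data: bytes) -> Any:
        return json.loads(data.decode("utf-8"))


class InMemoryStorage:
    """Stand-in for a real storage service (GCS, S3, Redis, Airflow DB...)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def read(self, path: str) -> bytes:
        return self._blobs[path]


class XComLayer:
    """Combines one storage backend with an ordered list of serializers."""

    def __init__(self, storage: Any, serializers: List[Any]) -> None:
        self.storage = storage
        self.serializers = serializers
        self._by_name = {s.name: s for s in serializers}

    @staticmethod
    def _path(dag_id: str, dagrun_id: str, ti_id: str, key: str) -> str:
        return f"{dag_id}/{dagrun_id}/{ti_id}/{key}"

    def push(self, dag_id: str, dagrun_id: str, ti_id: str, key: str, value: Any) -> str:
        path = self._path(dag_id, dagrun_id, ti_id, key)
        # First serializer that accepts the value wins, in configured order.
        for serializer in self.serializers:
            if serializer.can_handle(value):
                blob = serializer.name.encode("utf-8") + b"\n" + serializer.dumps(value)
                self.storage.write(path, blob)
                return path
        raise TypeError(f"no serializer configured for {type(value)!r}")

    def pull(self, dag_id: str, dagrun_id: str, ti_id: str, key: str) -> Any:
        blob = self.storage.read(self._path(dag_id, dagrun_id, ti_id, key))
        # The first line of the blob records which serializer wrote it.
        name, _, payload = blob.partition(b"\n")
        return self._by_name[name.decode("utf-8")].loads(payload)
```

With XComLayer(InMemoryStorage(), [Pandas2Csv(), JsonDump()]), pushing a
plain dict skips pandas2csv and goes through the JSON fallback, while a
DataFrame would be written as CSV. Swapping the storage class would not
touch the serializers, which is the point of keeping the two apart.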

What do you all think?


Gerard Casas Saez
Twitter | Cortex | @casassaez <http://twitter.com/casassaez>


On Wed, Dec 2, 2020 at 10:25 AM Tomasz Urbaszek <[email protected]>
wrote:

> > Then you could have XComBackendSerializationBackend
>
> That's definitely something we should avoid... :D
>
> On Wed, Dec 2, 2020 at 6:18 PM Daniel Standish <[email protected]>
> wrote:
> >
> > You could add xcom serialization utils in airflow.utils
> >
> > Then you could have XComBackendSerializationBackend ;)
>
