Thank you for the thoughts. I will create a VOTE thread at some point today or early tomorrow.
Regards,
Kaxil

On Mon, Sep 21, 2020 at 6:11 PM Jarek Potiuk <[email protected]> wrote:

> Happy to kill pickling for 2.0. While starting to review the HA, I see
> that more and more we rely on Serialization and there are some rather
> weird-looking left-overs in Ash's change that are only there because of
> pickling.
>
> I think we already know that Serialization becomes a first-class citizen
> in Airflow 2.0. We know the first versions of serialization had some
> teething problems, most of which have already been addressed (the most
> interesting one was a few orders of magnitude increase in outbound
> traffic from Airflow to the DB - but it's already fixed I believe).
> If we think that what pickling was used for can be handled entirely by
> serialization, I am all for killing pickling and instead focusing
> 100% on serialization improvements, testing, and making it rock solid.
>
> J.
>
> On Fri, Sep 18, 2020 at 11:59 PM Daniel Imberman <
> [email protected]> wrote:
>
>> Are there any use-cases that REQUIRE pickle? Do we have any sense of what
>> % of the Airflow community depends on Pickle? I'm all for killing it if
>> possible but I want to make sure we're not setting up a major hurdle for
>> migration.
>>
>> On Fri, Sep 18, 2020 at 2:50 PM, Maxime Beauchemin <
>> [email protected]> wrote:
>> I'm getting bad flashbacks of fighting with pickles early on in the
>> history of the project. I've learned since then to stay away. Almost all
>> solutions that involve pickles are bad solutions. Beyond, but related to,
>> the security implications are the issues of pickle entanglement, not
>> really knowing what's in the pickle and how big it might get, and how it
>> may affect the environment it's deserialized into.
>>
>> 2.0 is a great time to kill pickles with fire.
>>
>> On Fri, Sep 18, 2020 at 5:01 AM Kaxil Naik <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > In the Airflow 2.0 Dev call this Monday, we briefly discussed how
>> > pickling is currently used in the Airflow codebase and whether or not
>> > we should remove it for 2.0.
>> >
>> > Currently, AFAIK only *CeleryExecutor* supports pickling (code
>> > <https://github.com/apache/airflow/blob/master/airflow/executors/executor_loader.py#L122-L126>).
>> > We also have a flag on the *airflow scheduler
>> > <https://airflow.readthedocs.io/en/latest/cli-ref.html#scheduler>* CLI
>> > command (*--do-pickle*) and "*--ship-dag*" on the *airflow tasks run
>> > <https://airflow.readthedocs.io/en/latest/cli-ref.html#run>* command.
>> >
>> > If we want to remove pickling, I think Airflow 2.0 is the right time.
>> >
>> > We have also deprecated the use of pickling in XComs.
>> >
>> > https://docs.python.org/3/library/pickle.html -- lists some items on
>> > the security implications of pickle and comparisons with JSON.
>> >
>> > Another alternative is using *cloudpickle
>> > <https://github.com/cloudpipe/cloudpickle>* (used by PySpark) instead
>> > of *pickle*. It suffers from the same security issues as *pickle* but
>> > does have some more features compared to pickle.
>> >
>> > What do you all think?
>> >
>> > Regards,
>> > Kaxil
>> >
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
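
On the security point raised in the thread (the Python pickle documentation linked above), here is a minimal sketch of why unpickling untrusted data is risky while JSON parsing only yields plain data. It is an illustration using the standard library only, not Airflow code: the class name `Innocent` and the sample JSON document are hypothetical, and the `eval` call stands in for whatever an attacker might choose to run.

```python
import json
import pickle


class Innocent:
    """Hypothetical class, used only to illustrate the risk."""

    def __reduce__(self):
        # pickle stores "call this callable with these arguments".
        # On load, the callable is executed; here it is a harmless
        # eval, but it could be any importable callable.
        return (eval, ("6 * 7",))


# Serializing records the (callable, args) pair, not the object's state.
payload = pickle.dumps(Innocent())

# Deserializing runs eval("6 * 7") instead of rebuilding an Innocent.
result = pickle.loads(payload)
print(type(result).__name__, result)

# JSON, by contrast, only ever produces dicts, lists, strings,
# numbers, booleans and None; nothing is executed while parsing.
document = json.loads('{"dag_id": "example", "retries": 2}')
print(document)
```

The value returned by `pickle.loads` is whatever the embedded callable returns, which is why the receiving side cannot know what a pickle contains without executing it.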
