I'm getting bad flashbacks of fighting with pickles early on in the history
of the project. I've learned since then to stay away. Almost all solutions
that involve pickles are bad solutions. Beyond the security implications,
and related to them, are the issues of pickle entanglement: not really
knowing what's in the pickle, how big it might get, or how it may affect
the environment it's deserialized into.
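
To make that last point concrete, here is a minimal sketch (not Airflow
code) of how merely loading a pickle can change the process it is loaded
into. The class name Payload and the variable name PICKLE_DEMO are made up
for illustration; the only real API used is the standard library. JSON is
shown next to it for contrast, since parsing JSON only ever yields data.

```python
import json
import os
import pickle

ENV_KEY = "PICKLE_DEMO"  # hypothetical name, used only for this demo


class Payload:
    """Hypothetical object whose pickle runs code when it is loaded."""

    def __reduce__(self):
        # pickle stores "call exec(code, {})" and performs that call at
        # load time. Here the code only sets an environment variable.
        code = f"import os; os.environ[{ENV_KEY!r}] = 'set during unpickling'"
        return (exec, (code, {}))


os.environ.pop(ENV_KEY, None)  # start from a clean state

blob = pickle.dumps(Payload())
restored = pickle.loads(blob)  # the side effect happens on this line

side_effect = os.environ.get(ENV_KEY)
print("object returned by pickle.loads:", restored)
print("environment after unpickling:", side_effect)

# Parsing JSON, by contrast, only builds plain data structures.
parsed = json.loads('{"dag_id": "example_dag"}')
print("parsed JSON:", parsed)
```

Nothing in the bytes of blob tells the receiver what will run, which is
the "not really knowing what's in the pickle" problem in practice.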

2.0 is a great time to kill pickles with fire.

On Fri, Sep 18, 2020 at 5:01 AM Kaxil Naik <kaxiln...@apache.org> wrote:

> Hi all,
>
> In the Airflow 2.0 Dev call this Monday, we briefly discussed how pickling
> is currently used in the Airflow codebase and whether we should remove it
> for 2.0.
>
> Currently, AFAIK only *CeleryExecutor* supports pickling (code:
> https://github.com/apache/airflow/blob/master/airflow/executors/executor_loader.py#L122-L126
> ).
> We also have a flag on the *airflow scheduler* CLI command (*--do-pickle*)
> <https://airflow.readthedocs.io/en/latest/cli-ref.html#scheduler> and
> "*--ship-dag*" on the *airflow tasks run* command
> <https://airflow.readthedocs.io/en/latest/cli-ref.html#run>.
>
> If we want to remove pickling, I think Airflow 2.0 is the right time.
>
> We have also deprecated the use of pickling in XComs.
>
> https://docs.python.org/3/library/pickle.html -- lists some items on the
> security implications of pickle and comparisons with JSON.
>
> Another alternative is using *cloudpickle*
> <https://github.com/cloudpipe/cloudpickle> (used by PySpark) instead of
> *pickle*. It suffers from the same security issues as *pickle* but does
> have some more features.
>
> What do you all think?
>
> Regards,
> Kaxil
>
