Thank You for the thoughts. I will create a VOTE thread at some point today
/ early tomorrow.

Regards,
Kaxil

On Mon, Sep 21, 2020 at 6:11 PM Jarek Potiuk <[email protected]>
wrote:

> Happy to kill pickling for 2.0. While starting to review the HA, I see
> that more and more we rely on Serialization and there are some rather
> weird-looking left-overs in Ash's change that are only there because of
> pickling.
>
> I think we already know that Serialization becomes a first-class citizen
> in Airflow 2.0. And while we know the first versions of serialization had
> some teething problems - most of which have been already addressed (the
> most interesting one was few orders of magnitude increase in outbound
> traffic from the Airflow to the DB - but it's already fixed I believe).
> If we think that what pickling was used for can be handled entirely by
> serialization, I am all for killing pickling and rather than that focus
> 100% on serialization improvements, testing, and making it rock solid.
>
> J.
>
> On Fri, Sep 18, 2020 at 11:59 PM Daniel Imberman <
> [email protected]> wrote:
>
>> Are there any use-cases that REQUIRE pickle? Do we have any sense of what
>> % of the Airflow community depends on Pickle? I’m all for killing it if
>> possible but I want to make sure we’re not setting up a major hurdle for
>> migration.
>>
>> via Newton Mail [
>> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.15.6&source=email_footer_2
>> ]
>> On Fri, Sep 18, 2020 at 2:50 PM, Maxime Beauchemin <
>> [email protected]> wrote:
>> I'm getting bad flashbacks of fighting with pickles early on in the
>> history
>> of the project. I've learned since then to stay away. Almost all solutions
>> that involve pickles are bad solutions. Beyond but related to the security
>> implication are the issues of pickle entanglement, not really knowing
>> what's in the pickle and how big it might get, and how it may affect the
>> environment it's deserialized into.
>>
>> 2.0 is a great time to kill pickles with fire.
>>
>> On Fri, Sep 18, 2020 at 5:01 AM Kaxil Naik <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > We briefly discussed how pickling is currently used in Airflow codebase
>> and
>> > whether or not we should remove it for 2.0 in the Airflow 2.0 Dev call
>> this
>> > Monday.
>> >
>> > Currently, AFAIK only *CeleryExecutor* supports pickling (code
>> > <
>> >
>> https://github.com/apache/airflow/blob/master/airflow/executors/executor_loader.py#L122-L126
>> > >).
>> > We also have a flag on *airflow scheduler
>> > <https://airflow.readthedocs.io/en/latest/cli-ref.html#scheduler> *CLI
>> > command (*--do-pickle*) and "*--ship-dag*" on *airflow tasks run
>> > <https://airflow.readthedocs.io/en/latest/cli-ref.html#run>* command.
>> >
>> > If we want to remove pickling, I think Airflow 2.0 is the right time.
>> >
>> > We have also deprecated the use of pickling in XComs.
>> >
>> > https://docs.python.org/3/library/pickle.html -- lists some items on
>> the
>> > security implications of pickle and comparisons with JSON.
>> >
>> > Another alternative is using *cloudpickle
>> > <https://github.com/cloudpipe/cloudpickle> *(used by PySpark) instead
>> > of *pickle,
>> > *it suffers from the same security issues like *pickle *but does have
>> some
>> > more features compared to pickle.
>> >
>> > What do you all think?
>> >
>> > Regards,
>> > Kaxil
>> >
>
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>

Reply via email to