Re: [Python SDK] PyArrow Critical Vulnerability

2023-11-12 Thread Wiśniowski Piotr

Hi Valentyn,

Thank you for the information and details. It all makes sense! I think we can
wait for the 2.53.0 release and apply the hotfix in the meantime.


Best

Wiśniowski Piotr

On 10.11.2023 20:27, Valentyn Tymofieiev via user wrote:

From https://pypi.org/project/pyarrow-hotfix/ :

pyarrow_hotfix must be imported in your application or library code 
for it to take effect.

Just installing the package is not sufficient:

For Beam users, that means that the pipeline code running on the 
workers would need to import this module on every worker, for example 
by adding this line to DoFn.setup or in main session (if pipeline is 
composed only from one file AND uses dill pickler with 
--save_main_session flag).


We will continue addressing this in 
https://github.com/apache/beam/issues/29392.


On Fri, Nov 10, 2023 at 10:23 AM Valentyn Tymofieiev 
 wrote:


Hi Piotr, thanks for bringing this to the list.

There is a FR to support pyarrow
https://github.com/apache/beam/issues/28410 . I looked into it
briefly in https://github.com/apache/beam/pull/28437 but saw some
test failures and it has been on back burner. Given the news about
vulnerability it would make sense to prioritize this.

I think we could decouple this from 2.52.0 release since:
  1) there is a workaround
  2) new versions of pyarrow haven't been fully tested with Beam
  3) Beam 2.52.0 fixes some other issues that are known to
affecting users, e.g. https://github.com/apache/beam/issues/28246

From

https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
:
  > If you cannot upgrade to PyArrow 14.0.1, you can use the
pyarrow-hotfix package to disable the vulnerability on older
versions of PyArrow. However, this is not a permanent solution,
and you should upgrade to PyArrow 14.0.1 as soon as possible. We
could consider adding pyarrow-hotfix to the containers for 2.52.0
release. CC: @Danny McCormick
 (release manager).

Beam users can also install this additional dependency via one of
the ways described in
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
.



On Fri, Nov 10, 2023 at 4:42 AM Wiśniowski Piotr
 wrote:

Hi,

Few days ago this one was detected:

https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/

I do see that beam 2.51.0 does have `pyarrow<=12.0.0` in
requirements.

1. Is there a reason for not allowing newer versions of pyarrow?

2. Is there any planned effort on updating this to `14.0.1`?
Is it
possible to push the update to `2.52.0` beam release? I know
the beam
release is almost there.

Best

Wiśniowski Piotr


Re: [Python SDK] PyArrow Critical Vulnerability

2023-11-10 Thread Valentyn Tymofieiev via user
From https://pypi.org/project/pyarrow-hotfix/ :

pyarrow_hotfix must be imported in your application or library code for it
to take effect.
Just installing the package is not sufficient:

For Beam users, that means the pipeline code must import this module on
every worker, for example by adding the import to DoFn.setup, or to the
main session (if the pipeline consists of a single file AND uses the dill
pickler with the --save_main_session flag).
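
A minimal sketch of the DoFn.setup approach described above, not an official
Beam recipe. The names apply_pyarrow_hotfix and ImportHotfixDoFn are
hypothetical, and the fallback base class exists only so the snippet loads on
a machine where Beam is not installed. It assumes pyarrow-hotfix has already
been made available on the workers:

```python
import importlib


def apply_pyarrow_hotfix():
    """Import pyarrow_hotfix if it is installed; return True on success.

    Importing the module is what activates the patch, so this has to run
    in every worker process, not only where the pipeline is submitted.
    """
    try:
        importlib.import_module("pyarrow_hotfix")
    except ImportError:
        return False
    return True


try:
    import apache_beam as beam
    _DoFnBase = beam.DoFn
except ImportError:
    # Assumption for illustration only: fall back to a plain class so the
    # sketch can be loaded without Beam installed.
    _DoFnBase = object


class ImportHotfixDoFn(_DoFnBase):
    """Hypothetical DoFn that imports the hotfix before processing."""

    def setup(self):
        # Beam calls setup() on the worker before elements are processed.
        self.hotfix_active = apply_pyarrow_hotfix()

    def process(self, element):
        # Pass elements through unchanged; real pyarrow work would go here.
        yield element
```

In a real pipeline it may be safer to raise an error in setup() instead of
recording False, so that a worker missing the package fails loudly.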

We will continue addressing this in
https://github.com/apache/beam/issues/29392.

On Fri, Nov 10, 2023 at 10:23 AM Valentyn Tymofieiev 
wrote:

> Hi Piotr, thanks for bringing this to the list.
>
> There is a FR to support pyarrow
> https://github.com/apache/beam/issues/28410 . I looked into it briefly in
> https://github.com/apache/beam/pull/28437 but saw some test failures and
> it has been on back burner. Given the news about vulnerability it would
> make sense to prioritize this.
>
> I think we could decouple this from 2.52.0 release since:
>   1) there is a workaround
>   2) new versions of pyarrow haven't been fully tested with Beam
>   3) Beam 2.52.0 fixes some other issues that are known to affecting
> users, e.g. https://github.com/apache/beam/issues/28246
>
> From
> https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
> :
>   > If you cannot upgrade to PyArrow 14.0.1, you can use the
> pyarrow-hotfix package to disable the vulnerability on older versions of
> PyArrow. However, this is not a permanent solution, and you should upgrade
> to PyArrow 14.0.1 as soon as possible. We could consider adding
> pyarrow-hotfix to the containers for 2.52.0 release. CC: @Danny McCormick
>  (release manager).
>
> Beam users can also install this additional dependency via one of the ways
> described in
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/ .
>
>
>
> On Fri, Nov 10, 2023 at 4:42 AM Wiśniowski Piotr <
> contact.wisniowskipi...@gmail.com> wrote:
>
>> Hi,
>>
>> Few days ago this one was detected:
>>
>> https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
>>
>> I do see that beam 2.51.0 does have `pyarrow<=12.0.0` in requirements.
>>
>> 1. Is there a reason for not allowing newer versions of pyarrow?
>>
>> 2. Is there any planned effort on updating this to `14.0.1`? Is it
>> possible to push the update to `2.52.0` beam release? I know the beam
>> release is almost there.
>>
>> Best
>>
>> Wiśniowski Piotr
>>
>>


Re: [Python SDK] PyArrow Critical Vulnerability

2023-11-10 Thread Valentyn Tymofieiev via user
Hi Piotr, thanks for bringing this to the list.

There is an FR to support pyarrow: https://github.com/apache/beam/issues/28410
. I looked into it briefly in https://github.com/apache/beam/pull/28437 but
saw some test failures, and it has been on the back burner. Given the news
about the vulnerability, it would make sense to prioritize this.

I think we could decouple this from the 2.52.0 release since:
  1) there is a workaround
  2) new versions of pyarrow haven't been fully tested with Beam
  3) Beam 2.52.0 fixes some other issues that are known to affect users,
e.g. https://github.com/apache/beam/issues/28246

From
https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
:
  > If you cannot upgrade to PyArrow 14.0.1, you can use the pyarrow-hotfix
  > package to disable the vulnerability on older versions of PyArrow. However,
  > this is not a permanent solution, and you should upgrade to PyArrow 14.0.1
  > as soon as possible.

We could consider adding pyarrow-hotfix to the containers for 2.52.0
release. CC: @Danny McCormick (release manager).

Beam users can also install this additional dependency via one of the ways
described in
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/ .



On Fri, Nov 10, 2023 at 4:42 AM Wiśniowski Piotr <
contact.wisniowskipi...@gmail.com> wrote:

> Hi,
>
> Few days ago this one was detected:
>
> https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
>
> I do see that beam 2.51.0 does have `pyarrow<=12.0.0` in requirements.
>
> 1. Is there a reason for not allowing newer versions of pyarrow?
>
> 2. Is there any planned effort on updating this to `14.0.1`? Is it
> possible to push the update to `2.52.0` beam release? I know the beam
> release is almost there.
>
> Best
>
> Wiśniowski Piotr
>
>


Fwd: [Python SDK] PyArrow Critical Vulnerability

2023-11-10 Thread Wiśniowski Piotr

Hi,

A few days ago this vulnerability was reported:
https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/


I see that Beam 2.51.0 has `pyarrow<=12.0.0` in its requirements.

1. Is there a reason for not allowing newer versions of pyarrow?

2. Is there any planned effort to update this to `14.0.1`? Would it be
possible to include the update in the Beam `2.52.0` release? I know the
release is almost out.


Best

Wiśniowski Piotr