[ 
https://issues.apache.org/jira/browse/BEAM-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788810#comment-16788810
 ] 

Ryan Williams commented on BEAM-6765:
-------------------------------------

I'm seeing this while trying to run [this TF estimator 
example|https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt2#read_raw_training_data]
([notebook|https://github.com/GoogleCloudPlatform/tf-estimator-tutorials/blob/7af539a0f4d6113986dde65abe96c9e1c7701ae0/00_Miscellaneous/tf_transform/tft-01%20-%20Babyweight%20preprocessing%20with%20tf.Transform.ipynb])
with recent versions of TensorFlow Transform (0.12.0 and 0.13.0, which depend 
on Beam 2.10.0 and 2.11.0, respectively, both of which depend on pyarrow 0.11.1).

Running a Beam+Dataflow job that uses TFT requires staging [source 
artifacts|https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/runners/portability/stager.py#L423]
for TFT and therefore for pyarrow 0.11.1, but no source distribution of pyarrow 
0.11.1 exists. Of the neighboring releases, only pyarrow 0.11.0 and 0.12.1 have 
published sources.
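
One way to confirm which releases are affected is to look for source distributions in PyPI's release metadata. The sketch below assumes the `releases` mapping returned by PyPI's JSON API (`https://pypi.org/pypi/<package>/json`), in which each file record carries a `packagetype` field. The function names are illustrative only and are not part of Beam or pyarrow.

```python
import json
from urllib.request import urlopen


def versions_with_sdist(releases):
    """Return the versions that have at least one source distribution.

    `releases` maps a version string to a list of file records, each a
    dict with a "packagetype" key ("sdist", "bdist_wheel", ...).
    The result is sorted as plain strings, not as parsed version numbers.
    """
    return sorted(
        version
        for version, files in releases.items()
        if any(f.get("packagetype") == "sdist" for f in files)
    )


def fetch_releases(package):
    """Fetch the "releases" mapping for `package` from PyPI (needs network)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        return json.load(response)["releases"]
```

Calling `versions_with_sdist(fetch_releases("pyarrow"))` lists the versions that a source-only install could resolve; a pinned version missing from that list would fail in the way described in this issue.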

Possible solutions:
 * pyarrow publishes sources for 0.11.1
 * Beam depends on a wider range of pyarrow versions (including 0.12.1? Too late 
for Beam 2.10.0 / 2.11.0)

I'm curious why you closed this [~barrywhart]; it seems like an ongoing problem 
to me.

 

> Beam 2.10.0 for Python requires pyarrow 0.11.1, which is not installable in 
> Google Cloud DataFlow
> -------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-6765
>                 URL: https://issues.apache.org/jira/browse/BEAM-6765
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.10.0
>            Reporter: Barry Hart
>            Priority: Major
>             Fix For: 2.10.0
>
>
> When trying to run a Beam 2.10.0 job in Google Cloud DataFlow, I get the 
> following error:
> {noformat}
> Collecting pyarrow==0.11.1 (from -r requirements.txt (line 51))
> Could not find a version that satisfies the requirement pyarrow==0.11.1 (from 
> -r requirements.txt (line 51)) (from versions: 0.9.0, 0.10.0, 0.11.0, 0.12.1)
> No matching distribution found for pyarrow==0.11.1 (from -r requirements.txt 
> (line 51))
> {noformat}
> This version, while it exists, cannot be installed in Google Cloud DataFlow, 
> because it is only available on PyPI as a wheel, and DataFlow does not allow 
> installing binary packages, only source packages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
