[ https://issues.apache.org/jira/browse/BEAM-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788810#comment-16788810 ]
Ryan Williams commented on BEAM-6765:
-------------------------------------

I'm seeing this while trying to run [this TF estimator example|https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt2#read_raw_training_data] ([notebook|https://github.com/GoogleCloudPlatform/tf-estimator-tutorials/blob/7af539a0f4d6113986dde65abe96c9e1c7701ae0/00_Miscellaneous/tf_transform/tft-01%20-%20Babyweight%20preprocessing%20with%20tf.Transform.ipynb]) with any recent version of TensorFlow Transform (0.12.0 and 0.13.0, which depend on Beam 2.10.0 and 2.11.0, respectively, both of which depend on pyarrow 0.11.1).

Running a Beam+Dataflow job that uses TFT requires staging [source artifacts|https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/runners/portability/stager.py#L423] for TFT, and therefore for pyarrow 0.11.1, but no source distribution of pyarrow 0.11.1 exists. Only pyarrow 0.11.0 and 0.12.1 have published sources.

Possible solutions:
* pyarrow publishes sources for 0.11.1
* Beam depends on a wider range of pyarrow versions (0.12.1? Too late for Beam 2.10.0 / 2.11.0)

I'm curious why you closed this, [~barrywhart]; it seems like an ongoing problem to me.
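To check which pyarrow releases actually ship a source distribution, something like the following might help. It is only a sketch: it assumes the PyPI JSON API at `https://pypi.org/pypi/<package>/<version>/json` returns a `urls` list whose entries carry a `packagetype` field (`sdist` or `bdist_wheel`), and the helper names `has_sdist` and `fetch_release_files` are made up for illustration.

```python
import json
from urllib.request import urlopen


def has_sdist(release_files):
    """Return True if any file entry of a release is a source distribution.

    `release_files` is assumed to be the "urls" list from the PyPI JSON API,
    where each entry is a dict with a "packagetype" key.
    """
    return any(f.get("packagetype") == "sdist" for f in release_files)


def fetch_release_files(package, version):
    """Fetch the file list for one release from PyPI (requires network access)."""
    url = "https://pypi.org/pypi/{}/{}/json".format(package, version)
    with urlopen(url) as resp:
        return json.load(resp)["urls"]


# Example usage (hits the network, so it is left commented out):
# for version in ("0.11.0", "0.11.1", "0.12.1"):
#     print(version, has_sdist(fetch_release_files("pyarrow", version)))
```

A release that reports only wheels here would be expected to fail staging in the same way, since the staging step asks pip for source packages only.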
> Beam 2.10.0 for Python requires pyarrow 0.11.1, which is not installable in
> Google Cloud DataFlow
> -------------------------------------------------------------------------------------------------
>
> Key: BEAM-6765
> URL: https://issues.apache.org/jira/browse/BEAM-6765
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.10.0
> Reporter: Barry Hart
> Priority: Major
> Fix For: 2.10.0
>
>
> When trying to run a Beam 2.10.0 job in Google Cloud DataFlow, I get the
> following error:
> {noformat}
> Collecting pyarrow==0.11.1 (from -r requirements.txt (line 51))
> Could not find a version that satisfies the requirement pyarrow==0.11.1 (from
> -r requirements.txt (line 51)) (from versions: 0.9.0, 0.10.0, 0.11.0, 0.12.1)
> No matching distribution found for pyarrow==0.11.1 (from -r requirements.txt
> (line 51))
> {noformat}
> This version, while it exists, cannot be installed in Google Cloud DataFlow,
> because it is only available on PyPI as a wheel, and DataFlow does not allow
> installing binary packages, only source packages.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)