[ https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614047#comment-16614047 ]
Scott Jungwirth commented on BEAM-3106:
---------------------------------------

I just ran into this issue using Google's Cloud Composer (managed Airflow) after adding the 2.6.0 (current latest) Beam SDK PyPI package (apache-beam[gcp]>=2.6.0). Looking at the build log, it looks like apache-beam[gcp] caused a downgrade of some other google-cloud packages:

    ...
    Installing collected packages: pydot, fastavro, pytz, google-cloud-core, google-cloud-bigquery, apache-beam, pysftp, google-cloud-firestore, msgpack, cachecontrol, firebase-admin, webob, bugsnag
      Found existing installation: pytz 2018.5
        Uninstalling pytz-2018.5:
          Successfully uninstalled pytz-2018.5
      Found existing installation: google-cloud-core 0.28.1
        Uninstalling google-cloud-core-0.28.1:
          Successfully uninstalled google-cloud-core-0.28.1
      Found existing installation: google-cloud-bigquery 1.5.0
        Uninstalling google-cloud-bigquery-1.5.0:
          Successfully uninstalled google-cloud-bigquery-1.5.0
      Found existing installation: apache-beam 2.5.0
        Uninstalling apache-beam-2.5.0:
          Successfully uninstalled apache-beam-2.5.0
    Successfully installed apache-beam-2.6.0 bugsnag-3.4.3 cachecontrol-0.12.5 fastavro-0.19.7 firebase-admin-2.13.0 google-cloud-bigquery-0.25.0 google-cloud-core-0.25.0 google-cloud-firestore-0.29.0 msgpack-0.5.6 pydot-1.2.4 pysftp-0.2.9 pytz-2018.4 webob-1.8.2

I tracked this down to the pinned requirement for bigquery: {{google-cloud-bigquery==0.25.0}}

[https://github.com/apache/beam/blob/v2.6.0/sdks/python/setup.py#L140]

This pin led to the following warnings from pipdeptree:

    $ pipdeptree --warn
    Warning!!! Possibly conflicting dependencies found:
    * google-cloud-storage==1.10.0
     - google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
    * google-cloud-firestore==0.29.0
     - google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
    * pandas-gbq==0.6.0
     - google-cloud-bigquery [required: >=0.32.0, installed: 0.25.0]
    * google-cloud-dataflow==2.5.0
     - apache-beam [required: ==2.5.0, installed: 2.6.0]
    * google-cloud-logging==1.6.0
     - google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]

The exception I was getting came from another module, google cloud storage:

      File "/usr/local/lib/python2.7/site-packages/google/cloud/storage/blob.py", line 535, in download_to_file
        ...
      File "/usr/local/lib/python2.7/site-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
        response = func()
      File "/usr/local/lib/python2.7/site-packages/google_auth_httplib2.py", line 198, in request
        uri, method, body=body, headers=request_headers, **kwargs)
    TypeError: request() got an unexpected keyword argument 'data'

I was able to work around this issue by explicitly installing the desired versions of the google-cloud-core>=0.28.0 and google-cloud-bigquery>=1.5.0 modules after the apache-beam[gcp]>=2.6.0 module.

> Consider not pinning all python dependencies, or moving them to
> requirements.txt
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-3106
>                 URL: https://issues.apache.org/jira/browse/BEAM-3106
>             Project: Beam
>          Issue Type: Wish
>          Components: build-system
>    Affects Versions: 2.1.0
>         Environment: python
>            Reporter: Maximilian Roos
>            Priority: Major
>
> Currently all Python dependencies are [pinned or
> capped|https://github.com/apache/beam/blob/master/sdks/python/setup.py#L97].
> While there's a good argument for supplying a `requirements.txt` with well
> tested dependencies, having them specified in `setup.py` forces them to an
> exact state on each install of Beam. This makes using Beam in any environment
> with other libraries nigh on impossible.
> This is particularly severe for the `gcp` dependencies, where we have
> libraries that won't work with an older version (but Beam _does_ work with a
> newer version). We have to do a bunch of gymnastics to get the correct
> versions installed because of this. Unfortunately, Airflow repeats this
> practice and conflicts on a number of dependencies, adding further
> complication (but, again, there is no real conflict).
> I haven't seen this practice outside of the Apache & Google ecosystem - for
> example, no libraries in numerical Python do this. Here's a [discussion on
> SO|https://stackoverflow.com/questions/28509481/should-i-pin-my-python-dependencies-versions]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
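
The conflicts that pipdeptree reports in the comment can be reproduced with a short check of installed versions against each requirement range. This is only a rough sketch, not how pip or pipdeptree actually resolve versions: the helper names `parse_version`, `satisfies`, and `find_conflicts` are hypothetical, and the version parsing is deliberately simplified (numeric dot-separated components only, so a suffix such as `dev` is dropped rather than ordered per PEP 440). The installed versions and requirement ranges are copied from the pipdeptree output quoted in the comment.

```python
import operator
import re

# Comparison operators supported by this simplified checker.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}


def parse_version(text):
    """Turn '0.29dev' into (0, 29). Non-numeric suffixes are dropped,
    which is a simplification and not full PEP 440 ordering."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, spec):
    """Check a version against a comma-separated spec such as '<0.29dev,>=0.28.0'."""
    for clause in spec.split(","):
        match = re.match(r"\s*(==|!=|>=|<=|>|<)\s*(\S+)\s*$", clause)
        if match is None:
            raise ValueError("unsupported requirement clause: %r" % clause)
        op, wanted = match.groups()
        a, b = parse_version(installed), parse_version(wanted)
        # Pad with zeros so that '0.28' and '0.28.0' compare as equal.
        width = max(len(a), len(b))
        a += (0,) * (width - len(a))
        b += (0,) * (width - len(b))
        if not OPS[op](a, b):
            return False
    return True


def find_conflicts(installed, required):
    """Return (parent, dependency, spec, installed_version) for each unmet requirement."""
    return [
        (parent, dep, spec, installed[dep])
        for parent, dep, spec in required
        if not satisfies(installed[dep], spec)
    ]


# Versions left installed after apache-beam[gcp] 2.6.0 was added (from the comment).
INSTALLED = {
    "google-cloud-core": "0.25.0",
    "google-cloud-bigquery": "0.25.0",
    "apache-beam": "2.6.0",
}

# (parent package, dependency, required range), as listed by pipdeptree.
REQUIRED = [
    ("google-cloud-storage==1.10.0", "google-cloud-core", "<0.29dev,>=0.28.0"),
    ("google-cloud-firestore==0.29.0", "google-cloud-core", "<0.29dev,>=0.28.0"),
    ("pandas-gbq==0.6.0", "google-cloud-bigquery", ">=0.32.0"),
    ("google-cloud-dataflow==2.5.0", "apache-beam", "==2.5.0"),
    ("google-cloud-logging==1.6.0", "google-cloud-core", "<0.29dev,>=0.28.0"),
]

if __name__ == "__main__":
    for parent, dep, spec, version in find_conflicts(INSTALLED, REQUIRED):
        print("%s needs %s %s, installed: %s" % (parent, dep, spec, version))
```

Re-running `find_conflicts` with google-cloud-core and google-cloud-bigquery set back to the versions the build log shows before the downgrade (0.28.1 and 1.5.0, used here only as an illustration of the workaround) clears the google-cloud-core and google-cloud-bigquery entries in this sketch. The google-cloud-dataflow entry remains because that package requires apache-beam==2.5.0 exactly.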