[ 
https://issues.apache.org/jira/browse/BEAM-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614047#comment-16614047
 ] 

Scott Jungwirth commented on BEAM-3106:
---------------------------------------

I just ran into this issue using Google's Cloud Composer (managed Airflow)
after adding the 2.6.0 (current latest) Beam SDK PyPI package
(apache-beam[gcp]>=2.6.0). According to the build log, installing
apache-beam[gcp] downgraded several other google-cloud packages:
...
Installing collected packages: pydot, fastavro, pytz, google-cloud-core, 
google-cloud-bigquery, apache-beam, pysftp, google-cloud-firestore, msgpack, 
cachecontrol, firebase-admin, webob, bugsnag
Found existing installation: pytz 2018.5
Uninstalling pytz-2018.5:
Successfully uninstalled pytz-2018.5
Found existing installation: google-cloud-core 0.28.1
Uninstalling google-cloud-core-0.28.1:
Successfully uninstalled google-cloud-core-0.28.1
Found existing installation: google-cloud-bigquery 1.5.0
Uninstalling google-cloud-bigquery-1.5.0:
Successfully uninstalled google-cloud-bigquery-1.5.0
Found existing installation: apache-beam 2.5.0
Uninstalling apache-beam-2.5.0:
Successfully uninstalled apache-beam-2.5.0
Successfully installed apache-beam-2.6.0 bugsnag-3.4.3 cachecontrol-0.12.5 
fastavro-0.19.7 firebase-admin-2.13.0 google-cloud-bigquery-0.25.0 
google-cloud-core-0.25.0 google-cloud-firestore-0.29.0 msgpack-0.5.6 
pydot-1.2.4 pysftp-0.2.9 pytz-2018.4 webob-1.8.2
I tracked this down to the pinned requirement for bigquery: 
{{google-cloud-bigquery==0.25.0}}  
[https://github.com/apache/beam/blob/v2.6.0/sdks/python/setup.py#L140]

This led to the following pipdeptree warnings:
$ pipdeptree --warn
Warning!!! Possibly conflicting dependencies found:
* google-cloud-storage==1.10.0
- google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
* google-cloud-firestore==0.29.0
- google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
* pandas-gbq==0.6.0
- google-cloud-bigquery [required: >=0.32.0, installed: 0.25.0]
* google-cloud-dataflow==2.5.0
- apache-beam [required: ==2.5.0, installed: 2.6.0]
* google-cloud-logging==1.6.0
- google-cloud-core [required: <0.29dev,>=0.28.0, installed: 0.25.0]
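The warnings above can be reproduced by hand. The following is a minimal sketch, not pipdeptree's actual implementation: the helper names `parse_version` and `satisfies` are made up for illustration, and the version comparison is deliberately simplified (it keeps only the leading numeric components, so `0.29dev` is treated as `0.29`, which is not full PEP 440 behaviour). The installed versions and requirement specifiers are copied from the logs in this comment.

```python
import operator
import re

# Comparison operators supported by this simplified checker.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}


def parse_version(text):
    """Keep only leading numeric components: '0.29dev' -> (0, 29)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, spec):
    """Return True if `installed` meets every comma-separated clause in `spec`."""
    have = parse_version(installed)
    for clause in spec.split(","):
        match = re.match(r"\s*(==|!=|>=|<=|>|<)\s*(\S+)", clause)
        if match is None:
            raise ValueError("unsupported clause: %r" % clause)
        op, wanted = match.groups()
        want = parse_version(wanted)
        # Pad with zeros so that (0, 29) compares cleanly against (0, 25, 0).
        width = max(len(have), len(want))
        a = have + (0,) * (width - len(have))
        b = want + (0,) * (width - len(want))
        if not OPS[op](a, b):
            return False
    return True


# Versions left behind by the apache-beam[gcp] 2.6.0 install (from the build log).
INSTALLED = {
    "google-cloud-core": "0.25.0",
    "google-cloud-bigquery": "0.25.0",
    "apache-beam": "2.6.0",
}

# (dependent package, dependency, required specifier) from the pipdeptree output.
REQUIREMENTS = [
    ("google-cloud-storage==1.10.0", "google-cloud-core", "<0.29dev,>=0.28.0"),
    ("google-cloud-firestore==0.29.0", "google-cloud-core", "<0.29dev,>=0.28.0"),
    ("pandas-gbq==0.6.0", "google-cloud-bigquery", ">=0.32.0"),
    ("google-cloud-dataflow==2.5.0", "apache-beam", "==2.5.0"),
    ("google-cloud-logging==1.6.0", "google-cloud-core", "<0.29dev,>=0.28.0"),
]

conflicts = [
    (parent, dep, spec, INSTALLED[dep])
    for parent, dep, spec in REQUIREMENTS
    if not satisfies(INSTALLED[dep], spec)
]

for parent, dep, spec, have in conflicts:
    print("%s needs %s %s but %s is installed" % (parent, dep, spec, have))
```

With the downgraded versions, every listed requirement is reported as a conflict, which matches the pipdeptree warnings.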
The exception I was getting came from a different package, google-cloud-storage:
File "/usr/local/lib/python2.7/site-packages/google/cloud/storage/blob.py", line 535, in download_to_file
  ...
File "/usr/local/lib/python2.7/site-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
  response = func()
File "/usr/local/lib/python2.7/site-packages/google_auth_httplib2.py", line 198, in request
  uri, method, body=body, headers=request_headers, **kwargs)
TypeError: request() got an unexpected keyword argument 'data'
 

I was able to work around this issue by explicitly installing the desired
versions of google-cloud-core (>=0.28.0) and google-cloud-bigquery (>=1.5.0)
after installing apache-beam[gcp]>=2.6.0.
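The install order described above can be sketched as two separate pip invocations. This is only an illustration under stated assumptions: the helper names `build_install_commands` and `run_install` are hypothetical, and whether a given environment (Cloud Composer included) lets you control install order this way is not something this comment establishes. The requirement strings are the ones quoted above.

```python
import subprocess
import sys

# Requirement strings taken from the workaround described above.
BEAM = "apache-beam[gcp]>=2.6.0"
OVERRIDES = ["google-cloud-core>=0.28.0", "google-cloud-bigquery>=1.5.0"]


def build_install_commands(python=sys.executable):
    """Return the two pip invocations in the order they should run."""
    pip = [python, "-m", "pip", "install"]
    # Beam goes first so that its pinned google-cloud versions get installed,
    # then the second command replaces them with the newer versions.
    return [pip + [BEAM], pip + OVERRIDES]


def run_install(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.check_call(command)
```

Calling `run_install(build_install_commands())` only prints the commands; pass `dry_run=False` to actually run them. pip may still report that apache-beam's pinned requirements are not satisfied after the second step.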

> Consider not pinning all python dependencies, or moving them to 
> requirements.txt
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-3106
>                 URL: https://issues.apache.org/jira/browse/BEAM-3106
>             Project: Beam
>          Issue Type: Wish
>          Components: build-system
>    Affects Versions: 2.1.0
>         Environment: python
>            Reporter: Maximilian Roos
>            Priority: Major
>
> Currently all python dependencies are [pinned or 
> capped|https://github.com/apache/beam/blob/master/sdks/python/setup.py#L97]
> While there's a good argument for supplying a `requirements.txt` with well 
> tested dependencies, having them specified in `setup.py` forces them to an 
> exact state on each install of Beam. This makes using Beam in any environment 
> with other libraries nigh on impossible. 
> This is particularly severe for the `gcp` dependencies, where we have 
> libraries that won't work with an older version (but Beam _does_ work with a
> newer version). We have to do a bunch of gymnastics to get the correct
> versions installed because of this. Unfortunately, Airflow repeats this
> practice and conflicts on a number of dependencies, adding further 
> complication (but, again there is no real conflict).
> I haven't seen this practice outside of the Apache & Google ecosystem - for 
> example no libraries in numerical python do this. Here's a [discussion on 
> SO|https://stackoverflow.com/questions/28509481/should-i-pin-my-python-dependencies-versions]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
