[ https://issues.apache.org/jira/browse/BEAM-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271227#comment-16271227 ]

ASF GitHub Bot commented on BEAM-3041:
--------------------------------------

asfgit closed pull request #4192: [BEAM-3041] preinstall various packages for better startup performance and reliability
URL: https://github.com/apache/beam/pull/4192

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/python/container/Dockerfile b/sdks/python/container/Dockerfile
index 826e36c13ee..7cbe6c2632b 100644
--- a/sdks/python/container/Dockerfile
+++ b/sdks/python/container/Dockerfile
@@ -19,8 +19,66 @@
 FROM python:2
 MAINTAINER "Apache Beam <d...@beam.apache.org>"
 
-# TODO(herohde): preinstall various packages for better startup
-# performance and reliability.
+# Install native bindings required for dependencies.
+RUN apt-get update && \
+    apt-get install -y \
+       # These packages are needed for "pip install python-snappy" below.
+       libsnappy-dev \
+       # This package is needed for "pip install pyyaml" below to have c bindings.
+       libyaml-dev \
+       && \
+    rm -rf /var/lib/apt/lists/*
+
+# Install packages required by the Python SDK.
+#
+# These packages should be kept in sync with the dependencies at
+# sdks/python/setup.py.  If not installed, sdk harness will install these at
+# runtime, but we would like to avoid doing this so that we do not depend on
+# PyPI at runtime whenever possible.
+#
+# Also install cython, numpy, pandas and scipy as well as their dependencies.
+# These are standard Python packages, likely to be used by python SDK customers,
+# and their dependencies.
+#
+RUN \
+    # These are packages needed by the Python SDK.
+    pip install "avro == 1.8.2" && \
+    pip install "crcmod == 1.7" && \
+    pip install "dill == 0.2.6" && \
+    pip install "grpcio == 1.3.0" && \
+    pip install "httplib2 == 0.9.2" && \
+    pip install "mock == 2.0.0" && \
+    pip install "oauth2client == 3.0.0" && \
+    pip install "protobuf == 3.3.0" && \
+    pip install "pyyaml == 3.12" && \
+    pip install "pyvcf == 0.6.8" && \
+    pip install "six == 1.10.0" && \
+    pip install "typing == 3.6.1" && \
+    pip install "futures == 3.1.1" && \
+    # Setup packages
+    pip install "nose == 1.3.7" && \
+    # GCP extra features
+    pip install "google-apitools == 0.5.11" && \
+    pip install "proto-google-cloud-datastore-v1 == 0.90.4" && \
+    pip install "googledatastore == 7.0.1" && \
+    pip install "google-cloud-pubsub == 0.26.0" && \
+    pip install "google-cloud-bigquery == 0.25.0" && \
+    # Optional packages
+    pip install "cython == 0.27.2" && \
+    pip install "guppy == 0.1.10" && \
+    pip install "python-snappy == 0.5.1" && \
+    # These are additional packages likely to be used by customers.
+    pip install "numpy == 1.13.3" --no-binary=:all: && \
+    pip install "pandas == 0.18.1" && \
+    pip install "scipy == 1.0.0" && \
+    pip install "protobuf == 3.3.0" && \
+    pip install "tensorflow == 1.4.0" && \
+    pip install "protorpc == 0.11.1" && \
+    pip install "python-gflags == 3.0.6" && \
+    # Remove pip cache.
+    rm -rf /root/.cache/pip && \
+    # Check that the fast implementation of protobuf is used.
+    python -c "from google.protobuf.internal import api_implementation; assert api_implementation._default_implementation_type == 'cpp'; print 'Verified fast protobuf used.'"
 
 ADD target/linux_amd64/boot /opt/apache/beam/
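
The Dockerfile comment in the diff above asks that the pinned packages be kept in sync with sdks/python/setup.py. As a rough illustration only (not part of the PR), the pins could be pulled out of the Dockerfile text and checked for packages that are listed more than once; in the diff, "protobuf == 3.3.0" appears twice. The helper names extract_pins and duplicate_packages and the SAMPLE excerpt are hypothetical, and the regular expression assumes every pin is written in the `pip install "name == version"` form used above.

```python
import re
from collections import Counter

# Assumption: each pin is written as  pip install "name == version"
# (the form used in the Dockerfile diff above).
PIN_RE = re.compile(r'pip install "([A-Za-z0-9_.\-]+)\s*==\s*([^"\s]+)"')


def extract_pins(dockerfile_text):
    """Return (package, version) tuples in order of appearance."""
    return PIN_RE.findall(dockerfile_text)


def duplicate_packages(pins):
    """Return the sorted names of packages that are pinned more than once."""
    counts = Counter(name.lower() for name, _ in pins)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical excerpt, copied from a few lines of the diff for illustration.
SAMPLE = r'''
    pip install "grpcio == 1.3.0" && \
    pip install "protobuf == 3.3.0" && \
    pip install "numpy == 1.13.3" --no-binary=:all: && \
    pip install "protobuf == 3.3.0" && \
'''

if __name__ == "__main__":
    pins = extract_pins(SAMPLE)
    for name, version in pins:
        print("%s pinned to %s" % (name, version))
    print("pinned more than once: %s" % duplicate_packages(pins))
```

The same list of (package, version) tuples could in principle be compared against the requirements declared in setup.py, but how those are laid out is not shown in this message, so that step is left out of the sketch.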
 


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add portable Python SDK container setup support
> -----------------------------------------------
>
>                 Key: BEAM-3041
>                 URL: https://issues.apache.org/jira/browse/BEAM-3041
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-harness
>            Reporter: Henning Rohde
>            Assignee: Ahmet Altay
>              Labels: portability
>
> The minimal python container setup should be brought up to par with SDK 
> features:
>  - requirements.txt
>  - main session
>  - extra packages
> The name of the SDK package in boot.go should also not be hardcoded.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
