I recommend to put all top-level dependencies for your pipeline in setup.py
install_requires section, and autogenerate the requirements.txt, which
would then include all transitive dependencies and ensure reproducible
builds.
For approaches to generate the requirements.txt file from top level
requirements specified in the setup.py file, see:
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies#optional-update-the-dependencies-in-the-requirements-file-and-rebuild-the-docker-images
.
Valentyn
On Thu, Jun 13, 2024 at 9:52 PM Sofia’s World wrote:
> Many thanks Hu, worked like a charm
>
> few qq
> so in my reqs.txt i should put all beam requirements PLUS my own?
>
> and in the setup.py, shall i just declare
>
> "apache-beam[gcp]==2.54.0", # Must match the version in `Dockerfile``.
>
> thanks and kind regards
> Marco
>
>
>
>
>
>
> On Wed, Jun 12, 2024 at 1:48 PM XQ Hu wrote:
>
>> Any reason to use this?
>>
>> RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.30.0
>> pandas-datareader==0.9.0
>>
>> It is typically recommended to use the latest Beam and build the docker
>> image using the requirements released for each Beam, for example,
>> https://github.com/apache/beam/blob/release-2.56.0/sdks/python/container/py311/base_image_requirements.txt
>>
>> On Wed, Jun 12, 2024 at 1:31 AM Sofia’s World
>> wrote:
>>
>>> Sure, apologies, it crossed my mind it would have been useful to refert
>>> to it
>>>
>>> so this is the docker file
>>>
>>>
>>> https://github.com/mmistroni/GCP_Experiments/edit/master/dataflow/shareloader/Dockerfile_tester
>>>
>>> I was using a setup.py as well, but then i commented out the usage in
>>> the dockerfile after checking some flex templates which said it is not
>>> needed
>>>
>>>
>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/setup_dftester.py
>>>
>>> thanks in advance
>>> Marco
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jun 11, 2024 at 10:54 PM XQ Hu wrote:
>>>
Can you share your Dockerfile?
On Tue, Jun 11, 2024 at 4:43 PM Sofia’s World
wrote:
> thanks all, it seemed to work but now i am getting a different
> problem, having issues in building pyarrow...
>
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
> :36: DeprecationWarning: pkg_resources is deprecated as an API.
> See https://setuptools.pypa.io/en/latest/pkg_resources.html
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
> WARNING setuptools_scm.pyproject_reading toml section missing
> 'pyproject.toml does not contain a tool.setuptools_scm section'
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
> Traceback (most recent call last):
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
> File
> "/tmp/pip-build-env-meihcxsp/overlay/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py",
> line 36, in read_pyproject
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
> section = defn.get("tool", {})[tool_name]
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
> ^^^
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
> KeyError: 'setuptools_scm'
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
> running bdist_wheel
>
>
>
>
> It is somehow getting messed up with a toml ?
>
>
> Could anyone advise?
>
> thanks
>
> Marco
>
>
>
>
>
> On Tue, Jun 11, 2024 at 1:00 AM XQ Hu via user
> wrote:
>
>>
>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>> is a great example.
>>
>> On Mon, Jun 10, 2024 at 4:28 PM Valentyn Tymofieiev via user <
>> user@beam.apache.org> wrote:
>>
>>> In this case the Python version will be defined by the Python
>>> version installed in the docker image of your flex template. So, you'd
>>> have to build your flex template from a base image with Python 3.11.
>>>
>>> On Mon, Jun 10, 2024 at 12:50 PM Sofia’s World
>>> wrote:
>>>
Hello
no i am running my pipelien on GCP directly via a flex template,
configured using a Docker file
Any chances to do something in the Dockerfile to force the version
at runtime?
Thanks
On Mon, Jun 10, 2024 at 7:24 PM Anand Inguva via user <
user@beam.apache.org> wrote:
> Hello,
>
> Are you running your pipeline from the python 3.11 environment?
> If you are running from a python 3.11 environment and don't use a
> custom
> docker c