Apache Beam on GCP / Forcing to use py 3.11

2024-06-10 Thread Sofia’s World
Hello,
Sorry for the partially off-topic question.
I am running a pipeline in which one of the dependencies needs to run on Python 3.11, but I don't see any option that lets me force the Python version to be used.

Could anyone help?
Kind regards,
Marco


Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-10 Thread Ahmet Altay via user
If you can use Python 3.11 locally, you will get Python 3.11 in your cloud environment as well. Is that not happening?

When you run Apache Beam on GCP, the Python version of your local virtual environment is also used in the cloud environment. I believe this is true for non-GCP environments as well.




Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-10 Thread Anand Inguva via user
Hello,

Are you running your pipeline from a Python 3.11 environment? If you are running from a Python 3.11 environment and don't use a custom Docker container image, the DataflowRunner (assuming "Apache Beam on GCP" means Apache Beam on the DataflowRunner) will use Python 3.11.

Thanks,
Anand


Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-10 Thread Sofia’s World
Hello,
No, I am running my pipeline on GCP directly via a flex template, configured using a Dockerfile.
Is there anything I can do in the Dockerfile to force the version at runtime?
Thanks



Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-10 Thread Valentyn Tymofieiev via user
In this case the Python version is defined by the Python version installed in the Docker image of your flex template, so you'd have to build your flex template from a base image with Python 3.11.
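For illustration, a flex-template Dockerfile built on a Python 3.11 base might look like this (a sketch only: the base image name and FLEX_TEMPLATE_PYTHON_* variables follow Google's published flex-template conventions, and the file names are placeholders):

```dockerfile
# Sketch: start from Google's Python 3.11 flex-template launcher base image,
# so the template runs on Python 3.11 at runtime.
FROM gcr.io/dataflow-templates-base/python311-template-launcher-base

ARG WORKDIR=/template
WORKDIR ${WORKDIR}

# Placeholder file names; adjust to the actual project layout.
COPY requirements.txt setup.py main.py ./

RUN pip install --no-cache-dir -r requirements.txt

# Tell the launcher where the pipeline entry point and dependency files live.
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
```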



Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-10 Thread XQ Hu via user
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
is a great example.



Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-11 Thread Sofia’s World
Thanks all, it seemed to work, but now I am getting a different problem: issues building pyarrow...

Step #0 - "build-shareloader-template": Step #4 - "dftester-image": :36: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
Step #0 - "build-shareloader-template": Step #4 - "dftester-image": WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
Step #0 - "build-shareloader-template": Step #4 - "dftester-image": Traceback (most recent call last):
Step #0 - "build-shareloader-template": Step #4 - "dftester-image":   File "/tmp/pip-build-env-meihcxsp/overlay/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject
Step #0 - "build-shareloader-template": Step #4 - "dftester-image":     section = defn.get("tool", {})[tool_name]
Step #0 - "build-shareloader-template": Step #4 - "dftester-image": KeyError: 'setuptools_scm'
Step #0 - "build-shareloader-template": Step #4 - "dftester-image": running bdist_wheel




Is it somehow getting mixed up with a TOML file?


Could anyone advise?

thanks

 Marco







Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-11 Thread XQ Hu via user
Can you share your Dockerfile?



Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-12 Thread XQ Hu via user
Any reason to use this?

RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.30.0 pandas-datareader==0.9.0

It is typically recommended to use the latest Beam and to build the Docker image using the requirements released for each Beam version, for example:
https://github.com/apache/beam/blob/release-2.56.0/sdks/python/container/py311/base_image_requirements.txt
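Following that recommendation, the install step in the Dockerfile could pin against Beam's released requirement set (a sketch: base_image_requirements.txt is assumed to be a local copy of the file linked above, and project_requirements.txt is a placeholder for the project's own dependencies):

```dockerfile
# Sketch: install Beam's released, mutually compatible pins first,
# then the project's own dependencies on top.
COPY base_image_requirements.txt project_requirements.txt ./
RUN pip install --no-cache-dir -r base_image_requirements.txt \
 && pip install --no-cache-dir -r project_requirements.txt
```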



Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-13 Thread Sofia’s World
Many thanks, Hu, that worked like a charm.

A few quick questions:
In my requirements.txt, should I put all the Beam requirements plus my own?

And in setup.py, shall I just declare

"apache-beam[gcp]==2.54.0",  # Must match the version in `Dockerfile`.

Thanks and kind regards,
Marco








Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-14 Thread Valentyn Tymofieiev via user
I recommend putting all top-level dependencies for your pipeline in the setup.py install_requires section and autogenerating requirements.txt, which will then include all transitive dependencies and ensure reproducible builds.

For approaches to generating the requirements.txt file from the top-level requirements specified in setup.py, see:
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies#optional-update-the-dependencies-in-the-requirements-file-and-rebuild-the-docker-images

Valentyn
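A setup.py following that advice might look like the sketch below (package and dependency names are illustrative; pip-tools' pip-compile is just one common way to generate the pinned requirements.txt):

```python
# setup.py (sketch): only top-level dependencies are declared here.
# A fully pinned requirements.txt with all transitive dependencies can then
# be generated from this file, e.g.:  pip-compile setup.py -o requirements.txt
import setuptools

setuptools.setup(
    name="my-pipeline",              # illustrative name
    version="0.1.0",
    packages=setuptools.find_packages(),
    install_requires=[
        "apache-beam[gcp]==2.56.0",  # keep in sync with the Beam version in the Dockerfile
        "pandas-datareader",         # illustrative top-level dependency
    ],
)
```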


Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-15 Thread Sofia’s World
Sorry, I celebrated too early.
I can successfully build the image; however, at runtime the code always fails with this exception, and I cannot figure out why.

I mimicked the sample directory structure:


mypackage/
    __init__.py
    dftester.py
    obb_utils.py

dataflow_tester_main.py

This is the content of my dataflow_tester_main.py:

import logging

from mypackage import dftester

if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    dftester.run()


And this is my Dockerfile:

https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/Dockerfile_tester

My exception is at the bottom of this email. I am puzzled about where the error is coming from, as I have almost exactly copied this sample:
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/main.py

thanks and regards
 Marco











Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
    _load_main_session(semi_persistent_directory)
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
    return desired_pickle_lib.load_session(file_path)
  File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
    module = unpickler.load()
  File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 462, in find_class
    return StockUnpickler.find_class(self, module, name)
ModuleNotFoundError: No module named 'modules'








Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-15 Thread Valentyn Tymofieiev via user
Your pipeline launcher refers to a package named 'modules', but this
package is not available in the runtime environment.
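The failure can be reproduced outside Beam with plain pickle, which (like the dill-based save_main_session) stores functions by reference to their module, so that module must be importable again wherever unpickling happens. The module name "modules" below is synthetic, standing in for the missing local package:

```python
import pickle
import sys
import types

# Sketch of why the worker crashes: pickle saves a function from the main
# session by reference (module name + attribute name), so unpickling on the
# worker re-imports that module. If the launcher imported a local package
# that is not shipped to the runtime image, the load fails.
mod = types.ModuleType("modules")          # stand-in for the missing local package
exec("def run(): return 'ok'", mod.__dict__)
sys.modules["modules"] = mod               # importable at submission time

payload = pickle.dumps(mod.run)            # stored as a reference to modules.run

del sys.modules["modules"]                 # simulate a worker without the package
try:
    pickle.loads(payload)
    caught = None
except ModuleNotFoundError as exc:
    caught = exc

print(caught)  # No module named 'modules'
```

In the flex-template setup this usually means the local package must actually be installed into (or copied onto the import path of) the worker image, e.g. shipped via setup.py, rather than merely sitting next to the launcher script at build time.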


Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-16 Thread Sofia’s World
Valentyn, many thanks. I actually spotted the reference in the setup file.
However, after correcting it, I am still at square one: somehow my runtime environment does not see the package. So I added some debugging to my Dockerfile to check whether I had forgotten to copy something, and below is the output, where I can see that mypackage has been copied.

here's my directory structure

 mypackage
__init__.py
obbutils.py
launcher.py
__init__.py
dataflow_tester.py
setup_dftester.py (copied to setup.py)

I can see that the directory structure has been maintained when I copy my
files into the Docker image, as I added some debug output to my Dockerfile:

Step #0 - "dftester-image": Removing intermediate container 4c4e763289d2
Step #0 - "dftester-image":  ---> cda378f70a9e
Step #0 - "dftester-image": Step 6/23 : COPY requirements.txt .
Step #0 - "dftester-image":  ---> 9a43da08b013
Step #0 - "dftester-image": Step 7/23 : COPY setup_dftester.py setup.py
Step #0 - "dftester-image":  ---> 5a6bf71df052
Step #0 - "dftester-image": Step 8/23 : COPY dataflow_tester.py .
Step #0 - "dftester-image":  ---> 82cfe1f1f9ed
Step #0 - "dftester-image": Step 9/23 : COPY mypackage mypackage
Step #0 - "dftester-image":  ---> d86497b791d0
Step #0 - "dftester-image": Step 10/23 : COPY __init__.py
${WORKDIR}/__init__.py
Step #0 - "dftester-image":  ---> 337d149d64c7
Step #0 - "dftester-image": Step 11/23 : RUN echo '- listing workdir'
Step #0 - "dftester-image":  ---> Running in 9d97d8a64319
Step #0 - "dftester-image": - listing workdir
Step #0 - "dftester-image": Removing intermediate container 9d97d8a64319
Step #0 - "dftester-image":  ---> bc9a6a2aa462
Step #0 - "dftester-image": Step 12/23 : RUN ls -la ${WORKDIR}
Step #0 - "dftester-image":  ---> Running in cf164108f9d6
Step #0 - "dftester-image": total 24
Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 .
Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
Step #0 - "dftester-image": -rw-r--r-- 1 root root0 Jun 16 08:57
__init__.py
Step #0 - "dftester-image": -rw-r--r-- 1 root root  135 Jun 16 08:57
dataflow_tester.py
Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59
mypackage
Step #0 - "dftester-image": -rw-r--r-- 1 root root   64 Jun 16 08:57
requirements.txt
Step #0 - "dftester-image": -rw-r--r-- 1 root root  736 Jun 16 08:57
setup.py
Step #0 - "dftester-image": Removing intermediate container cf164108f9d6
Step #0 - "dftester-image":  ---> eb1a080b7948
Step #0 - "dftester-image": Step 13/23 : RUN echo '--- listing modules
-'
Step #0 - "dftester-image":  ---> Running in 884f03dd81d6
Step #0 - "dftester-image": --- listing modules -
Step #0 - "dftester-image": Removing intermediate container 884f03dd81d6
Step #0 - "dftester-image":  ---> 9f6f7e27bd2f
Step #0 - "dftester-image": Step 14/23 : RUN ls -la  ${WORKDIR}/mypackage
Step #0 - "dftester-image":  ---> Running in bd74ade37010
Step #0 - "dftester-image": total 16
Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 .
Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
Step #0 - "dftester-image": -rw-r--r-- 1 root root0 Jun 16 08:57
__init__.py
Step #0 - "dftester-image": -rw-r--r-- 1 root root 1442 Jun 16 08:57
launcher.py
Step #0 - "dftester-image": -rw-r--r-- 1 root root  607 Jun 16 08:57
obb_utils.py
Step #0 - "dftester-image": Removing intermediate container bd74ade37010


I have this in my setup.py:

REQUIRED_PACKAGES = [
    'openbb',
    'apache-beam[gcp]',  # Must match the version in `Dockerfile`.
    'sendgrid',
    'pandas_datareader',
    'vaderSentiment',
    'numpy',
    'bs4',
    'lxml',
    'beautifulsoup4',
    'xlrd',
    'openpyxl',
]


setuptools.setup(
    name='mypackage',
    version='0.0.1',
    description='Shares Runner Package.',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)
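As a quick sanity check (a minimal sketch, not part of the original thread): `setuptools.find_packages()` only discovers directories that contain an `__init__.py`, which is why the layout above matters. The throwaway layout below mirrors the one described in this thread:

```python
import os
import tempfile

import setuptools

# Build a throwaway project layout mirroring the one in this thread:
#   <root>/mypackage/__init__.py
#   <root>/dataflow_tester.py
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "mypackage"))
open(os.path.join(root, "mypackage", "__init__.py"), "w").close()
open(os.path.join(root, "dataflow_tester.py"), "w").close()

# find_packages() only returns directories containing an __init__.py,
# so 'mypackage' is discovered while top-level scripts are ignored.
packages = setuptools.find_packages(where=root)
print(packages)  # ['mypackage']
```

If `mypackage` does not show up here, it will not be installed by `pip install .` inside the image either.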


and this is my dataflow_tester.py:

from mypackage import launcher
import logging
if __name__ == '__main__':
  logging.getLogger().setLevel(logging.INFO)
  launcher.run()



I have compared my setup with
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
and everything looks the same (apart from my copying the __init__.py from the
directory where the main file, dataflow_tester.py, resides).

Would you know how else I can debug what is going on, and why my mypackage
subdirectory is not being seen?

Kind regards
 Marco




On Sat, Jun 15, 2024 at 7:27 PM Valentyn Tymofieiev via user <
user@beam.apache.org> wrote:

> Your pipeline launcher refers to a package named 'modules', but this
> package is not available in the runtime environment.
>
> On Sat, Jun 15, 2024 at 11:17 AM Sofia’s World 
> wrote:
>
>> Sorry, i cheered up too early
>> i can successfully build the image however, at runtime the code fails
>> always with this exception and i cannot figure out why
>>
>> i mimicked the sample directory structure
>>
>>
>>  mypackage

Re: Apache Bean on GCP / Forcing to use py 3.11

2024-06-16 Thread XQ Hu via user
What is the error message now?
You can easily get a shell inside your Docker container and check that
everything is installed correctly:
docker run --rm -it --entrypoint=/bin/bash $CUSTOM_CONTAINER_IMAGE


Re: Apache Bean on GCP / Forcing to use py 3.11

2024-06-16 Thread Sofia’s World
The error is the same; see the bottom of this message.
I have tried to shell into the container, and the directory is set up as
expected, so I am not quite sure where the issue is.
I will try to start from the pipeline-with-dependencies sample and work it
out from there without bothering the list.

Thanks again for following up
 Marco

Could not load main session. Inspect which external dependencies are used
in the main module of your pipeline. Verify that corresponding packages are
installed in the pipeline runtime environment and their installed versions
match the versions used in pipeline submission environment. For more
information, see:
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
    _load_main_session(semi_persistent_directory)
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
    return desired_pickle_lib.load_session(file_path)
  File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
    module = unpickler.load()
  File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 827, in _import_module
    return getattr(__import__(module, None, None, [obj]), obj)
ModuleNotFoundError: No module named 'mypackage'
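For what it's worth, this failure mode can be reproduced with plain `pickle`, independently of Beam (a minimal sketch): objects in the main session are pickled by module reference at submission time, and unpickling on the worker fails when that module is not importable there. `mypackage_demo` below is a stand-in for the real package:

```python
import pickle
import sys
import types

# Create a throwaway module standing in for the user package; class
# instances are pickled by reference (module name + qualname), not by value.
mod = types.ModuleType("mypackage_demo")
exec("class Job:\n    pass", mod.__dict__)
sys.modules["mypackage_demo"] = mod

blob = pickle.dumps(mod.Job())  # works: the module is importable here

# Simulate the worker environment, where the module is absent.
del sys.modules["mypackage_demo"]
err_msg = None
try:
    pickle.loads(blob)
except ModuleNotFoundError as err:
    err_msg = str(err)
print("worker-side failure:", err_msg)
```

This is why the package has to be installed (or importable) in the runtime container, not just present in the submission environment.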




Re: Apache Bean on GCP / Forcing to use py 3.11

2024-06-16 Thread Utkarsh Parekh
You have “mypackage” incorrectly built. Please check and confirm that.

Utkarsh


Re: Apache Bean on GCP / Forcing to use py 3.11

2024-06-16 Thread Sofia’s World
Thanks. It appears that I did not read the documentation fully, and I missed
this in my `dataflow flex-template run` invocation:

, '--parameters'
  , 'sdk_container_image=$_SDK_CONTAINER_IMAGE'

All my other jobs use a dodgy Dockerfile which does not require the parameter
above...
I should be fine for the time being; at least my pipeline is no longer
plagued by import errors.
Thanks all for helping out

kind regards
Marco



