Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-14 Thread Valentyn Tymofieiev via user
I recommend putting all top-level dependencies for your pipeline in the
install_requires section of setup.py and autogenerating requirements.txt,
which will then include all transitive dependencies and ensure
reproducible builds.
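
A minimal sketch of what such a setup.py could look like (the project
name and version here are hypothetical; only the Beam pin is taken from
this thread):

```python
# setup.py -- top-level dependencies only; transitive pins go into
# the generated requirements.txt.
import setuptools

setuptools.setup(
    name="shareloader-pipeline",  # hypothetical name
    version="0.1.0",              # hypothetical version
    packages=setuptools.find_packages(),
    install_requires=[
        # Must match the Beam version used in the Dockerfile.
        "apache-beam[gcp]==2.54.0",
    ],
)
```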

For approaches to generate the requirements.txt file from top level
requirements specified in the setup.py file, see:
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies#optional-update-the-dependencies-in-the-requirements-file-and-rebuild-the-docker-images
.
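
One way to autogenerate the pinned file, sketched under the assumption
that the project installs cleanly with pip (the virtualenv path is
arbitrary):

```shell
# Create a clean virtualenv, install the project into it (commented out
# here, since it assumes a setup.py in the current directory), and
# freeze the fully resolved dependency set into requirements.txt.
python3 -m venv /tmp/reqs-env
# /tmp/reqs-env/bin/pip install .
/tmp/reqs-env/bin/pip freeze > requirements.txt
```

Regenerating the file in a clean environment is what keeps the pins
reproducible rather than reflecting whatever happens to be installed
locally.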

Valentyn

On Thu, Jun 13, 2024 at 9:52 PM Sofia’s World  wrote:

> Many thanks Hu, worked like a charm
>
> few qq
> so in my reqs.txt i should put all beam requirements PLUS my own?
>
> and in the setup.py, shall i just declare
>
> "apache-beam[gcp]==2.54.0",  # Must match the version in the `Dockerfile`.
>
> thanks and kind regards
> Marco
>
>
>
>
>
>
> On Wed, Jun 12, 2024 at 1:48 PM XQ Hu  wrote:
>
>> Any reason to use this?
>>
>> RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.30.0 pandas-datareader==0.9.0
>>
>> It is typically recommended to use the latest Beam release and to build
>> the Docker image using the requirements file published with each Beam
>> release, for example:
>> https://github.com/apache/beam/blob/release-2.56.0/sdks/python/container/py311/base_image_requirements.txt
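
As a rough sketch of that approach (the base image, paths, and file
names below are assumptions for illustration, not taken from the linked
repo):

```dockerfile
# Launcher base image for Python 3.11 flex templates.
FROM gcr.io/dataflow-templates-base/python311-template-launcher-base

WORKDIR /template
COPY . /template

# base_image_requirements.txt is copied from the matching Beam release
# tag, so every transitive dependency is pinned to what that Beam
# release was tested with.
RUN pip install --no-cache-dir -r base_image_requirements.txt \
    && pip install --no-cache-dir .

ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE=/template/setup.py
ENV FLEX_TEMPLATE_PYTHON_PY_FILE=/template/main.py
```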
>>
>> On Wed, Jun 12, 2024 at 1:31 AM Sofia’s World 
>> wrote:
>>
>>> Sure, apologies, it crossed my mind it would have been useful to refer
>>> to it.
>>>
>>> so this is the docker file
>>>
>>>
>>> https://github.com/mmistroni/GCP_Experiments/edit/master/dataflow/shareloader/Dockerfile_tester
>>>
>>> I was using a setup.py as well, but then I commented out its usage in
>>> the Dockerfile after checking some flex templates which said it is not
>>> needed.
>>>
>>>
>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/setup_dftester.py
>>>
>>> thanks in advance
>>>  Marco
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jun 11, 2024 at 10:54 PM XQ Hu  wrote:
>>>
 Can you share your Dockerfile?

 On Tue, Jun 11, 2024 at 4:43 PM Sofia’s World 
 wrote:

> thanks all,  it seemed to work but now i am getting a different
> problem, having issues in building pyarrow...
>
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": :36: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": Traceback (most recent call last):
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":   File "/tmp/pip-build-env-meihcxsp/overlay/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":     section = defn.get("tool", {})[tool_name]
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":     ^^^
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": KeyError: 'setuptools_scm'
> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": running bdist_wheel
>
>
>
>
> It is somehow getting messed up with a toml file?
>
>
> Could anyone advise?
>
> thanks
>
>  Marco
>
>
>
>
>
> On Tue, Jun 11, 2024 at 1:00 AM XQ Hu via user 
> wrote:
>
>>
>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>> is a great example.
>>
>> On Mon, Jun 10, 2024 at 4:28 PM Valentyn Tymofieiev via user <
>> user@beam.apache.org> wrote:
>>
>>> In this case the Python version will be defined by the Python
>>> version installed in the docker image of your flex template. So, you'd
>>> have to build your flex template from a base image with Python 3.11.
>>>
>>> On Mon, Jun 10, 2024 at 12:50 PM Sofia’s World 
>>> wrote:
>>>
 Hello
 no, I am running my pipeline on GCP directly via a flex template,
 configured using a Dockerfile.
 Any chance to do something in the Dockerfile to force the version
 at runtime?
 Thanks

 On Mon, Jun 10, 2024 at 7:24 PM Anand Inguva via user <
 user@beam.apache.org> wrote:

> Hello,
>
> Are you running your pipeline from the python 3.11 environment?
> If you are running from a python 3.11 environment and don't use a 
> custom
> docker c

Re: How windowing is implemented on Flink runner

2024-06-14 Thread Wiśniowski Piotr

Hi,

Wanted to follow up as I did have similar case.

So does this mean it is OK for Beam to use a sliding window of 1 day with
a 1-second period (using a trigger other than after-watermark to avoid
outputting data from every window), and that there is no additional
performance penalty (duplicating input messages in storage, or CPU for
resolving windows)? This is interesting from both the Flink and Dataflow
perspectives (both Python and Java).
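
For intuition on why the duplication question matters, a
back-of-the-envelope sketch (plain Python, not the Beam API):

```python
# A 1-day sliding window with a 1-second period assigns every element
# to size/period overlapping windows; an eager implementation would
# duplicate each element that many times.
SIZE_SEC = 24 * 60 * 60  # window size: 1 day
PERIOD_SEC = 1           # slide period: 1 second

windows_per_element = SIZE_SEC // PERIOD_SEC
print(windows_per_element)  # 86400
```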


I ended up implementing the logic with Beam state and timers (which is 
quite performant and readable), but I am also interested in other
possibilities.
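
A runner-agnostic sketch of the state-and-timers idea in plain Python
(this is an illustration of the pattern, not Beam's actual state/timer
API; class and method names are made up):

```python
from collections import deque

class RollingDayCount:
    """Keep one buffer of (timestamp, value) per key and evict entries
    older than the window on each update, instead of assigning every
    element to 86400 overlapping sliding windows."""

    WINDOW_SEC = 24 * 60 * 60  # 1-day rolling window

    def __init__(self):
        self._state = {}  # key -> deque of (event_time_sec, value)

    def add(self, key, event_time_sec, value):
        buf = self._state.setdefault(key, deque())
        buf.append((event_time_sec, value))
        # Timer-like eviction: drop contributions older than one day.
        while buf and buf[0][0] <= event_time_sec - self.WINDOW_SEC:
            buf.popleft()
        return sum(v for _, v in buf)  # current 1-day aggregate

counts = RollingDayCount()
counts.add("k", 0, 1)
print(counts.add("k", 10, 2))      # 3: both events within a day
print(counts.add("k", 90_000, 5))  # 5: the first two events expired
```

In Beam terms the deque plays the role of per-key state and the
eviction step plays the role of a timer callback.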


Best

Wiśniowski Piotr

On 12.06.2024 21:50, Ruben Vargas wrote:

I imagined it but wasn't sure!

Thanks for the clarification!

On Wed, Jun 12, 2024 at 1:42 PM Robert Bradshaw via user
 wrote:

Beam implements Windowing itself (via state and timers) rather than
deferring to Flink's implementation.

On Wed, Jun 12, 2024 at 11:55 AM Ruben Vargas  wrote:

Hello guys

May be a silly question,

But in the Flink runner, does the window implementation use Flink's
windowing? Does that mean the runner will have performance issues like
Flink itself? See this:
https://issues.apache.org/jira/browse/FLINK-7001

I'm asking because the issue mentions concepts that Beam already handles
at the API level. So my suspicion is that the Beam model handles
windowing a little differently from a pure Flink app. But I'm not sure..


Regards.