Re: Apache Beam on GCP / Forcing to use py 3.11

2024-06-10 Thread Anand Inguva via user
Hello,

Are you running your pipeline from a Python 3.11 environment? If you launch
it from a Python 3.11 environment and don't use a custom Docker container
image, DataflowRunner (assuming "Apache Beam on GCP" means Apache Beam on
DataflowRunner) will use Python 3.11.
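
As a quick sanity check before submitting, a minimal sketch like the one
below prints which interpreter is launching the pipeline. The function names
are hypothetical helpers for illustration and are not part of Beam.

```python
import sys


def launch_python_version() -> str:
    """Return the major.minor version of the interpreter running this script."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def is_expected_python(expected: str = "3.11") -> bool:
    """True when the launching interpreter matches the version you expect."""
    return launch_python_version() == expected


if __name__ == "__main__":
    print("Launching with Python", launch_python_version())
    if not is_expected_python("3.11"):
        print("Warning: this pipeline is not being launched from Python 3.11")
```

Running this in the same environment that submits the job shows whether the
launch environment is really Python 3.11.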

Thanks,
Anand


Re: Dataflow not able to find a module specified using extra_package

2023-12-19 Thread Anand Inguva via user
Can you try passing `extra_packages` instead of `extra_package` when
passing pipeline options as a dict?
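
A minimal sketch of that suggestion, under the assumption that the dict is
later unpacked into `PipelineOptions(**options)` and that `extra_packages`
takes a list of local paths. The helper name and the file name are
illustrative only.

```python
import os


def build_dataflow_options(job_name: str, package_path: str) -> dict:
    """Build an options dict that uses the plural key `extra_packages`."""
    return {
        "runner": "DataflowRunner",
        "job_name": job_name,
        # Plural key with a list value (assumption): one entry per tarball.
        "extra_packages": [os.path.abspath(package_path)],
    }


options = build_dataflow_options("example-job", "uplight-telemetry-1.0.0.tar.gz")

# Assumed usage, not executed here because it requires apache-beam:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   pipeline_options = PipelineOptions(**options)
```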

On Tue, Dec 19, 2023 at 12:26 PM Sumit Desai via user wrote:

> Hi all,
> I have created a Dataflow pipeline in batch mode using the Apache Beam Python
> SDK. I am using one non-public dependency, 'uplight-telemetry'. I have
> specified it using the parameter extra_package while creating the
> pipeline_options object. However, pipeline loading fails with the error *No
> module named 'uplight_telemetry'*.
> The code to create pipeline_options is as follows:
>
> def __create_pipeline_options_dataflow(job_name):
>     # Set up the Dataflow runner options
>     gcp_project_id = os.environ.get(GCP_PROJECT_ID)
>     current_dir = os.path.dirname(os.path.abspath(__file__))
>     print("current_dir=", current_dir)
>     setup_file_path = os.path.join(current_dir, '..', '..', 'setup.py')
>     print("Set-up file path=", setup_file_path)
>     # TODO: Move file to proper location
>     uplight_telemetry_tar_file_path = os.path.join(
>         current_dir, '..', '..', '..', 'non-public-dependencies',
>         'uplight-telemetry-1.0.0.tar.gz')
>     # TODO: Move to environmental variables
>     pipeline_options = {
>         'project': gcp_project_id,
>         'region': "us-east1",
>         'job_name': job_name,  # Provide a unique job name
>         'temp_location':
>             f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>         'staging_location':
>             f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>         'runner': 'DataflowRunner',
>         'save_main_session': True,
>         'service_account_email': os.environ.get(SERVICE_ACCOUNT),
>         # 'network': f'projects/{gcp_project_id}/global/networks/default',
>         'subnetwork': os.environ.get(SUBNETWORK_URL),
>         'setup_file': setup_file_path,
>         'extra_package': uplight_telemetry_tar_file_path
>         # 'template_location': 'gcr.io/dataflow-templates-base/python310-template-launcher-base'
>     }
>     print("Pipeline created for job-name", job_name)
>     logger.debug(f"pipeline_options created as {pipeline_options}")
>     return pipeline_options
>
> Why is it not trying to install this package from extra_package?
>


Update on Protobuf and GCP packages for Apache Beam Python SDK

2023-03-15 Thread Anand Inguva via user
Hi,

For the Apache Beam Python SDK, we updated the protobuf requirement from
'protobuf>3.12.2,<4' to 'protobuf>=4.21.1,<4.23.0', because Protobuf had a
major upgrade in May 2022 (https://protobuf.dev/news/2022-05-06/). This will
take effect in the Beam 2.47.0 release.

A tighter bound was placed on protobuf to make sure there is no unintended
behavior even for the minor releases of protobuf.

We also updated the GCP dependencies defined at
https://github.com/apache/beam/blob/14ca840f3462aa9ba3ebbe773b498d4a914d50aa/sdks/python/setup.py#L292
to support the latest versions.

Just a heads up: If you have Apache Beam as a dependency and you have
protobuf<4 as a requirement in your package/project, please update protobuf
to 4.x.x and resolve any breaking changes/blockers related to it.
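
To see whether a given protobuf version falls inside the new bound, a rough
sketch such as the following may help. It only compares numeric
major.minor.patch components and ignores pre-release tags; the function
names are hypothetical and not part of Beam or protobuf.

```python
import re

# Matches "major.minor" with an optional ".patch" at the start of the string.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(version: str) -> tuple:
    """Parse '4.21.12' into (4, 21, 12); a missing patch defaults to 0."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def in_beam_protobuf_range(version: str) -> bool:
    """Check a version against the bound 'protobuf>=4.21.1,<4.23.0'."""
    return (4, 21, 1) <= parse_version(version) < (4, 23, 0)
```

The installed version can be read with
`importlib.metadata.version("protobuf")` and passed to
`in_beam_protobuf_range`.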

Thanks,
Anand


Re: Timeline of support for Python 3.10?

2022-11-29 Thread Anand Inguva via user
Hi,
Circling back on this.

Python 3.10 is available with Apache Beam 2.43.0[1].

[1] https://beam.apache.org/blog/beam-2.43.0/

Thanks

On Thu, Jul 21, 2022 at 5:48 PM Lina Mårtensson wrote:

> Great, thanks!
>
> On Thu, Jul 21, 2022 at 1:07 PM Anand Inguva wrote:
> >
> > Hi,
> >
> > Yes, we are in the middle of adding support for Python 3.10 to the Beam
> > SDK. The ideal deadline would be to support it by the end of September.
> > There are some blockers on type hints[1] that we are working on as of now.
> >
> > You can track the Python 3.10 issue here:
> > https://github.com/apache/beam/issues/21585 and also WIP PR here
> > https://github.com/apache/beam/pull/17700.
> >
> > Anand
> >
> > [1] https://github.com/apache/beam/issues/21671
> >
> >
> > On Fri, Jul 22, 2022 at 1:12 AM Lina Mårtensson via user <user@beam.apache.org> wrote:
> >>
> >> Hi Beam,
> >>
> >> We've successfully introduced Beam at our company and transitioned
> >> some of our jobs from running for a week to running for a few hours.
> >> But, our current situation is a mess where we've hacked Blaze to
> >> support both Python 3.9 (for Beam) and 3.10 (everything else), and
> >> various obstacles keep coming up over time. Once Beam works with 3.10
> >> we can go back to sanity with a single version in our repository, and
> >> accept that everything else can fall behind the latest Python version,
> >> but until then, we have a mess. We're even considering downgrading
> >> everything else in our repo to 3.9 which would probably make our
> >> current non-Beam-users unhappy (we're working on converting them,
> >> eventually! ;).
> >>
> >> So - is there any estimate on a timeline for when Beam might support
> >> Python 3.10? In the next month or two? In a year? Having some sort of
> >> estimate would make it a lot easier for us to decide what kind of
> >> effort might be worthwhile on our part.
> >>
> >> Thanks!
> >> -Lina
>


Benchmark tests for the Beam RunInference API

2022-08-16 Thread Anand Inguva via user
Hi,

I created a doc [1] which outlines the plan for the RunInference API [2]
benchmark/performance tests. I would appreciate feedback on the following:

   - Models used for the benchmark tests.
   - Metrics calculated as part of the benchmark tests.


If you have any input or suggestions on additional metrics or models that
would be helpful for the Beam ML community as part of the benchmark tests,
please let us know.

[1] https://docs.google.com/document/d/1xmh9D_904H-6X19Mi0-tDACwCCMvP4_MFA9QT0TOym8/edit#
[2] https://github.com/apache/beam/blob/67cb87ecc2d01b88f8620ed6821bcf71376d9849/sdks/python/apache_beam/ml/inference/base.py#L269

Thanks,
Anand


Re: Timeline of support for Python 3.10?

2022-07-21 Thread Anand Inguva via user
Hi,

Yes, we are in the middle of adding support for Python 3.10 to the Beam
SDK. The ideal deadline would be to support it by the end of September.
There are some blockers on type hints[1] that we are working on as of now.

You can track the Python 3.10 issue here:
https://github.com/apache/beam/issues/21585 and also WIP PR here
https://github.com/apache/beam/pull/17700.

Anand

[1] https://github.com/apache/beam/issues/21671


On Fri, Jul 22, 2022 at 1:12 AM Lina Mårtensson via user <user@beam.apache.org> wrote:

> Hi Beam,
>
> We've successfully introduced Beam at our company and transitioned
> some of our jobs from running for a week to running for a few hours.
> But, our current situation is a mess where we've hacked Blaze to
> support both Python 3.9 (for Beam) and 3.10 (everything else), and
> various obstacles keep coming up over time. Once Beam works with 3.10
> we can go back to sanity with a single version in our repository, and
> accept that everything else can fall behind the latest Python version,
> but until then, we have a mess. We're even considering downgrading
> everything else in our repo to 3.9 which would probably make our
> current non-Beam-users unhappy (we're working on converting them,
> eventually! ;).
>
> So - is there any estimate on a timeline for when Beam might support
> Python 3.10? In the next month or two? In a year? Having some sort of
> estimate would make it a lot easier for us to decide what kind of
> effort might be worthwhile on our part.
>
> Thanks!
> -Lina
>