[DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-23 Thread Valentyn Tymofieiev via dev
Hi everyone,

Recently, several issues [1-3] have highlighted outage risks and developer
inconveniences due to dependency management practices in Beam Python.

With dependabot and other tooling that we have integrated with Beam, one
of the missing pieces seems to be a clear guideline on how we should
specify requirements for our dependencies, and when and how we should
update them, so that the process is sustainable.

As a conversation starter, I put together a retrospective [4] covering a
recent incident and would like to get community opinions on the open
questions.

In particular, if you have experience managing dependencies for other
Python libraries with rich dependency chains, knowledge of available
tooling, or first-hand experience dealing with other dependency issues in
Beam, your input would be greatly appreciated.

Thanks,
Valentyn

[1] https://github.com/apache/beam/issues/22218
[2] https://github.com/apache/beam/pull/22550#issuecomment-1217348455
[3] https://github.com/apache/beam/issues/22533
[4] https://docs.google.com/document/d/1gxQF8mciRYgACNpCy1wlR7TBa8zN-Tl6PebW-U8QvBk/edit


Re: 2.41.0 Release PMC Finalization

2022-08-23 Thread Chamikara Jayalath via dev
Should be done.

Thanks,
Cham

On Tue, Aug 23, 2022 at 4:53 PM Chamikara Jayalath 
wrote:

> I can look into it.
>
> On Tue, Aug 23, 2022 at 2:02 PM Kiley Sok via dev 
> wrote:
>
>> Could a PMC member help with the PMC-only finalization steps for 2.41.0
>> [1]? Specifically:
>>
>> - Deploy source release to dist.apache.org
>> - Recordkeeping with ASF
>>
>> Once those steps are done all that's left is to promote the release [2].
>>
>> Thank you!
>> Kiley
>>
>> [1] https://beam.apache.org/contribute/release-guide/#pmc-only-finalization
>> [2] https://beam.apache.org/contribute/release-guide/#12-promote-the-release
>>
>


Re: 2.41.0 Release PMC Finalization

2022-08-23 Thread Chamikara Jayalath via dev
I can look into it.

On Tue, Aug 23, 2022 at 2:02 PM Kiley Sok via dev 
wrote:

> Could a PMC member help with the PMC-only finalization steps for 2.41.0
> [1]? Specifically:
>
> - Deploy source release to dist.apache.org
> - Recordkeeping with ASF
>
> Once those steps are done all that's left is to promote the release [2].
>
> Thank you!
> Kiley
>
> [1] https://beam.apache.org/contribute/release-guide/#pmc-only-finalization
> [2] https://beam.apache.org/contribute/release-guide/#12-promote-the-release
>


2.41.0 Release PMC Finalization

2022-08-23 Thread Kiley Sok via dev
Could a PMC member help with the PMC-only finalization steps for 2.41.0
[1]? Specifically:

- Deploy source release to dist.apache.org
- Recordkeeping with ASF

Once those steps are done all that's left is to promote the release [2].

Thank you!
Kiley

[1] https://beam.apache.org/contribute/release-guide/#pmc-only-finalization
[2] https://beam.apache.org/contribute/release-guide/#12-promote-the-release


Re: Java & Python Project Starter / Example Using Terraform to build Dataflow Custom Templates

2022-08-23 Thread Kenneth Knowles
Nice!

On Tue, Aug 23, 2022 at 11:00 AM Ahmet Altay via dev 
wrote:

> Thank you for sharing Damon.
>
> On Tue, Aug 23, 2022 at 9:34 AM Damon Douglas via dev 
> wrote:
>
>> Hello Everyone,
>>
>> You can ignore this email if you do not use the Dataflow runner.
>>
>> We just published an example / project starter that shows how to build
>> Dataflow Custom Templates provisioned using Terraform on Cloud Build. The
>> README also links to an "open in cloud shell walkthrough" that lets you
>> apply the Terraform modules without installing or downloading any code
>> on your local machine.
>>
>>
>> https://github.com/GoogleCloudPlatform/professional-services/tree/main/examples/dataflow-custom-templates
>>
>> Best,
>>
>> Damon
>>
>> --
>>
>>
>> *Damon Douglas*
>>
>> Strategic Cloud Engineer, Data & Analytics, Google Cloud
>>
>> damondoug...@google.com
>>
>


Re: Representation of logical type beam:logical_type:datetime:v1

2022-08-23 Thread Yi Hu via dev
Hi,

It now appears that if we want a clean solution, we have to add a
fixed-size primitive type to the Beam atomic types. Otherwise, we are left
with a millis_instant logical type that has no to_language_type and
to_representation_type implementations. Any suggestions are welcome!

Best,
Yi

On Thu, Aug 18, 2022 at 10:07 AM Yi Hu  wrote:

>
>
> On Wed, Aug 17, 2022 at 5:14 PM Chamikara Jayalath 
> wrote:
>
>>
>> I think this is fine (even though it would add a small perf hit to
>> JdbcIO.Read). We also probably should make this conversion a utility method
>> that can be used elsewhere when we need to encode datetime fields.
>> We should also document that "beam:logical_type:datetime:v1" is not
>> portable (till we fix the incompatibility).
>>
>>
> +1 for utility method and documentation.
> If we were to change JDBC instead of making millis_instant compatible with
> InstantCoder, this would only fix JDBC cross-language timestamps. I expect
> this is still a problem for other IO connectors, which is why I would
> like to take a generic approach. In general, inside each SDK we would like
> to follow the language-specific convention of that SDK. I remember a
> related discussion about the timestamp types:
> https://github.com/apache/beam/pull/17380#discussion_r852422314 which
> reached the conclusion that we follow the language convention for timestamp
> values, e.g. use milli precision (long backed) Instant in Java; micro
> precision (float backed) timestamp in Python.
>
> Best,
> Yi
>
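
The precision mismatch discussed in this thread can be illustrated with a
minimal sketch. It assumes that a millisecond-precision instant is carried as
a signed 64-bit big-endian integer (a fixed-size 8-byte value) and that the
Python side holds microseconds since the epoch. The function names are
hypothetical, and this is not Beam's actual coder or logical type code.

```python
import struct

MICROS_PER_MILLI = 1000


def micros_to_millis(micros: int) -> int:
    """Truncate a microsecond timestamp to millisecond precision (floor)."""
    return micros // MICROS_PER_MILLI


def encode_millis_fixed(millis: int) -> bytes:
    """Encode millis since the epoch as a fixed-size 8-byte big-endian int.

    Assumption for illustration only: a signed 64-bit layout.
    """
    return struct.pack(">q", millis)


def decode_millis_fixed(data: bytes) -> int:
    """Decode the 8-byte big-endian representation back to milliseconds."""
    (millis,) = struct.unpack(">q", data)
    return millis


# Hypothetical microsecond timestamp, as a micro-precision SDK might hold it.
micros = 1_660_842_420_123_456

millis = micros_to_millis(micros)
encoded = encode_millis_fixed(millis)
round_tripped = decode_millis_fixed(encoded)

# Sub-millisecond remainder that does not survive the round trip.
lost_micros = micros - round_tripped * MICROS_PER_MILLI
```

In this sketch the encoded value is always 8 bytes long, and the
sub-millisecond remainder (lost_micros) is dropped on the round trip. That
loss is the kind of incompatibility a shared conversion utility and a
documented caveat would need to cover.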


[GitHub] [beam-site] kileys merged pull request #633: Publish 2.41.0 release

2022-08-23 Thread GitBox


kileys merged PR #633:
URL: https://github.com/apache/beam-site/pull/633





Fwd: New mailing list queued for creation: stackoverf...@beam.apache.org

2022-08-23 Thread Kenneth Knowles
FYI

-- Forwarded message -
From: ASF Self-Service Platform 
Date: Tue, Aug 23, 2022 at 11:40 AM
Subject: New mailing list queued for creation: stackoverf...@beam.apache.org
To: , ASF Infrastructure 



Hi there,
As requested by k...@apache.org, a new mailing list has been queued for
creation:
stackoverf...@beam.apache.org


This request will automatically be processed within 24 hours.


With regards,
ASF Self-Service Platform, https://selfserve.apache.org
For inquiries, please contact: us...@infra.apache.org


Re: Java & Python Project Starter / Example Using Terraform to build Dataflow Custom Templates

2022-08-23 Thread Ahmet Altay via dev
Thank you for sharing Damon.

On Tue, Aug 23, 2022 at 9:34 AM Damon Douglas via dev 
wrote:

> Hello Everyone,
>
> You can ignore this email if you do not use the Dataflow runner.
>
> We just published an example / project starter that shows how to build
> Dataflow Custom Templates provisioned using Terraform on Cloud Build. The
> README also links to an "open in cloud shell walkthrough" that lets you
> apply the Terraform modules without installing or downloading any code
> on your local machine.
>
>
> https://github.com/GoogleCloudPlatform/professional-services/tree/main/examples/dataflow-custom-templates
>
> Best,
>
> Damon
>
> --
>
>
> *Damon Douglas*
>
> Strategic Cloud Engineer, Data & Analytics, Google Cloud
>
> damondoug...@google.com
>


[RESULT] [VOTE] Release 2.41.0, release candidate #2

2022-08-23 Thread Kiley Sok via dev
I'm happy to announce that we have unanimously approved this release.

There are 5 approving votes, 3 of which are binding:
* Ahmet Altay
* Alexey Romanenko
* Chamikara Jayalath

There are no disapproving votes.

Thanks everyone!


Java & Python Project Starter / Example Using Terraform to build Dataflow Custom Templates

2022-08-23 Thread Damon Douglas via dev
Hello Everyone,

You can ignore this email if you do not use the Dataflow runner.

We just published an example / project starter that shows how to build
Dataflow Custom Templates provisioned using Terraform on Cloud Build. The
README also links to an "open in cloud shell walkthrough" that lets you
apply the Terraform modules without installing or downloading any code
on your local machine.

https://github.com/GoogleCloudPlatform/professional-services/tree/main/examples/dataflow-custom-templates

Best,

Damon

-- 


*Damon Douglas*

Strategic Cloud Engineer, Data & Analytics, Google Cloud

damondoug...@google.com


Re: Benchmark tests for the Beam RunInference API

2022-08-23 Thread Andy Ye via dev
Appreciate the super clear summary of the different benchmark experiments!
This will add lots of value to potential users, especially when we
integrate GPU benchmarks. Thanks Anand!

Best,
Andy

On Thu, Aug 18, 2022 at 10:22 AM Danny McCormick via dev <
dev@beam.apache.org> wrote:

> I left a few comments, but overall this sounds like a good plan to me -
> thanks for the writeup!
>
> On Tue, Aug 16, 2022 at 9:36 AM Anand Inguva via dev 
> wrote:
>
>> Hi,
>>
>> I created a doc [1] which outlines the plan for the RunInference API [2]
>> benchmark/performance tests. I would appreciate feedback on the following:
>>
>>    - Models used for the benchmark tests.
>>    - Metrics calculated as part of the benchmark tests.
>>
>>
>> If you have any inputs or any suggestions on additional metrics/models
>> that would be helpful for the Beam ML community as part of the benchmark
>> tests, please let us know.
>>
>> [1] https://docs.google.com/document/d/1xmh9D_904H-6X19Mi0-tDACwCCMvP4_MFA9QT0TOym8/edit#
>> [2] https://github.com/apache/beam/blob/67cb87ecc2d01b88f8620ed6821bcf71376d9849/sdks/python/apache_beam/ml/inference/base.py#L269
>>
>> Thanks,
>> Anand
>>
>


Beam High Priority Issue Report (72)

2022-08-23 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/22808 [Bug]: Dataflow job stuckness related to big query storage api
https://github.com/apache/beam/issues/22779 [Bug]: SpannerIO.readChangeStream() stops forwarding change records and starts continuously throwing (large number) of Operation ongoing errors
https://github.com/apache/beam/issues/22773 [Bug]: ElasticsearchIO.Write fails when calling outputWithTimestamp()
https://github.com/apache/beam/issues/22749 [Bug]: Bytebuddy version update causes Invisible parameter type error
https://github.com/apache/beam/issues/22743 [Bug]: Test flake: org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest.testInsertWithinRowCountLimits
https://github.com/apache/beam/issues/22642 [Bug]: Dataflow fails to drain a job when using BigQuery (java sdk v.2.38)
https://github.com/apache/beam/issues/22440 [Bug]: Python Batch Dataflow SideInput LoadTests failing
https://github.com/apache/beam/issues/22321 PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing on jenkins
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at getConnection() in WriteFn
https://github.com/apache/beam/issues/22283 [Bug]: Python Lots of fn runner test items cost exactly 5 seconds to run
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 failures parent bug
https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 and V2
https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam PostCommit Java V1
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 --dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure : java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.runners.core.construction.SerializablePipelineOptions
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21468 beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in beam_PostCommit_Java_DataflowV2
https://github.com/apache/beam/issues/21270 org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21268 Race between member variable being accessed due to leaking uninitialized state via OutboundObserverFactory
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21265 apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked Dataflow Pipeline
https://github.com/apache/beam/issues/21262 Python AfterAny,