Re: Extending 2.41.0 Java snapshot TTL

2022-07-21 Thread Evan Galpin
Admittedly this is potentially self-serving, but I feel there could be
mutual benefit.

I have a similar situation where I want to use a pre-release version of
beam-sdks-java-io-google-cloud-platform. Though I've been having trouble
doing so, a possible alternative to using the nightly snapshots might be
building beam-sdks-java-io-google-cloud-platform from source and including
the resulting jar as part of the pipeline deployment. I've successfully done
this with the direct runner, but not with the Dataflow runner.
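
In case it's useful, the kind of build.gradle change I have in mind looks
roughly like this (a sketch only; the path and version are placeholders, not
from a real setup):

// Hypothetical build.gradle fragment for the pipeline project: replace the
// published artifact with a locally built jar. Note that a plain file
// dependency carries no POM metadata, so the module's transitive
// dependencies still need to be declared separately.
dependencies {
    // implementation "org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.40.0"
    implementation files("../beam/sdks/java/io/google-cloud-platform/build/libs/" +
        "beam-sdks-java-io-google-cloud-platform-2.41.0-SNAPSHOT.jar")
}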

Perhaps some others on the thread might be able to shed light on this
technique (only if applicable to solving the original problem, as I don’t
intend to thread-hijack).

- Evan

On Thu, Jul 21, 2022 at 19:45 Byron Ellis via dev 
wrote:

> I think you could change the TTL on the Jenkins side (does that sound right
> to you, Danny?), but I'm not sure we could preserve a specific snapshot without
> keeping all of them...
>
> On Thu, Jul 21, 2022 at 4:16 PM Ahmet Altay  wrote:
>
>> Thank you for the email Daniel.
>>
>> Adding people who could help: @Kenneth Knowles  @Danny
>> McCormick  @Chamikara Jayalath
>>  @John Casey  @Byron Ellis
>> 
>>
>> On Thu, Jul 21, 2022, 4:14 PM Daniel Thevessen via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi all,
>>>
>>> The Java Firestore connector had a bug recently that needed to be fixed.
>>> Until the fix can be released as part of 2.41.0, we need one of the daily
>>> snapshot builds as a safe version to use. This is the one from Jul 15
>>> (2.41.0-20220715.201105-31), which we have verified works correctly and
>>> which some users have already switched to.
>>> Unfortunately, it looks like these builds get cleared out after a while
>>> for storage reasons. Would it be possible to extend the TTL on just that
>>> build, at least until 2.41.0 is released? I'm guessing this would just be a
>>> settings change for whoever owns the Snapshots repository.
>>> The change is in beam-sdks-java-io-google-cloud-platform, but I'm
>>> fairly certain the other Java packages need to be kept as well for
>>> compatibility.
>>>
>>> This is relatively urgent: it looks like the TTL might be weekly, so that
>>> build will be deleted on Saturday.
>>>
>>> Thanks,
>>> Daniel Thevessen
>>>
>>>
>>>


Re: Extending 2.41.0 Java snapshot TTL

2022-07-21 Thread Byron Ellis via dev
I think you could change the TTL on the Jenkins side (does that sound right
to you, Danny?), but I'm not sure we could preserve a specific snapshot without
keeping all of them...

On Thu, Jul 21, 2022 at 4:16 PM Ahmet Altay  wrote:

> Thank you for the email Daniel.
>
> Adding people who could help: @Kenneth Knowles  @Danny
> McCormick  @Chamikara Jayalath
>  @John Casey  @Byron Ellis
> 
>
> On Thu, Jul 21, 2022, 4:14 PM Daniel Thevessen via dev <
> dev@beam.apache.org> wrote:
>
>> Hi all,
>>
>> The Java Firestore connector had a bug recently that needed to be fixed.
>> Until the fix can be released as part of 2.41.0, we need one of the daily
>> snapshot builds as a safe version to use. This is the one from Jul 15
>> (2.41.0-20220715.201105-31), which we have verified works correctly and
>> which some users have already switched to.
>> Unfortunately, it looks like these builds get cleared out after a while
>> for storage reasons. Would it be possible to extend the TTL on just that
>> build, at least until 2.41.0 is released? I'm guessing this would just be a
>> settings change for whoever owns the Snapshots repository.
>> The change is in beam-sdks-java-io-google-cloud-platform, but I'm fairly
>> certain the other Java packages need to be kept as well for compatibility.
>>
>> This is relatively urgent: it looks like the TTL might be weekly, so that
>> build will be deleted on Saturday.
>>
>> Thanks,
>> Daniel Thevessen
>>
>>
>>


Re: Extending 2.41.0 Java snapshot TTL

2022-07-21 Thread Ahmet Altay via dev
Thank you for the email Daniel.

Adding people who could help: @Kenneth Knowles  @Danny
McCormick  @Chamikara Jayalath
 @John Casey  @Byron Ellis


On Thu, Jul 21, 2022, 4:14 PM Daniel Thevessen via dev 
wrote:

> Hi all,
>
> The Java Firestore connector had a bug recently that needed to be fixed.
> Until the fix can be released as part of 2.41.0, we need one of the daily
> snapshot builds as a safe version to use. This is the one from Jul 15
> (2.41.0-20220715.201105-31), which we have verified works correctly and
> which some users have already switched to.
> Unfortunately, it looks like these builds get cleared out after a while for
> storage reasons. Would it be possible to extend the TTL on just that build,
> at least until 2.41.0 is released? I'm guessing this would just be a
> settings change for whoever owns the Snapshots repository.
> The change is in beam-sdks-java-io-google-cloud-platform, but I'm fairly
> certain the other Java packages need to be kept as well for compatibility.
>
> This is relatively urgent: it looks like the TTL might be weekly, so that
> build will be deleted on Saturday.
>
> Thanks,
> Daniel Thevessen
>
>
>


Extending 2.41.0 Java snapshot TTL

2022-07-21 Thread Daniel Thevessen via dev
Hi all,

The Java Firestore connector had a bug recently that needed to be fixed.
Until the fix can be released as part of 2.41.0, we need one of the daily
snapshot builds as a safe version to use. This is the one from Jul 15
(2.41.0-20220715.201105-31), which we have verified works correctly and
which some users have already switched to.
Unfortunately, it looks like these builds get cleared out after a while for
storage reasons. Would it be possible to extend the TTL on just that build,
at least until 2.41.0 is released? I'm guessing this would just be a
settings change for whoever owns the Snapshots repository.
The change is in beam-sdks-java-io-google-cloud-platform, but I'm fairly
certain the other Java packages need to be kept as well for compatibility.

This is relatively urgent: it looks like the TTL might be weekly, so that
build will be deleted on Saturday.
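
For anyone affected, a consumer would typically pin that snapshot with
roughly the following build.gradle fragment (a sketch; the snapshots
repository URL is an assumption on my part, not confirmed here):

// Hypothetical consumer-side build.gradle fragment pinning the Jul 15
// snapshot until 2.41.0 is released.
repositories {
    maven { url "https://repository.apache.org/content/repositories/snapshots/" }
    mavenCentral()
}
dependencies {
    implementation "org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.41.0-20220715.201105-31"
}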

Thanks,
Daniel Thevessen


Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

2022-07-21 Thread Evan Galpin
Thanks Tomo, I'll check that out too as a good safeguard!  Are you familiar
with any process for building pre-release artifacts?  I suppose what I'm
really after is building a pre-release version of PubsubIO to validate in
Dataflow.

- Evan


On Thu, Jul 21, 2022 at 4:21 PM Tomo Suzuki via dev 
wrote:

> I don't have a solution (I'm not familiar with the method
> you're using). However, I often use "getProtectionDomain()"
> https://stackoverflow.com/a/56000383/975074 to find the JAR file from a
> class. This ensures the class you modified is actually used.
>
> On Thu, Jul 21, 2022 at 3:35 PM Evan Galpin  wrote:
>
>> Spoke too soon... I still can't seem to get the new behaviour to appear in
>> Dataflow; possibly something is being overridden?
>>
>> On Thu, Jul 21, 2022 at 3:15 PM Evan Galpin  wrote:
>>
>>> Making a shadowJar from "beam-sdks-java-io-google-cloud-platform" looks
>>> to be working. Added `  id 'com.github.johnrengelman.shadow'` to
>>> `build.gradle` for "beam-sdks-java-io-google-cloud-platform" in the beam
>>> source and used the resulting jar as a dependency replacement when
>>> deploying the job to dataflow.  Looks ok.
>>>
>>> On Thu, Jul 21, 2022 at 3:02 PM Evan Galpin  wrote:
>>>
 I believe I have the dependencySubstitution working, but it seems as
 though the substitution is removing transitive deps of
 "beam-sdks-java-io-google-cloud-platform", hmm...

 On Thu, Jul 21, 2022 at 1:15 PM Evan Galpin  wrote:

> Hi all,
>
> I'm trying to test a change I've made locally by validating it on
> Dataflow.  It works locally, but I want to validate on Dataflow as well.  I've
> tried a few different attempts at module substitution in the build.gradle
> config file for the pipeline I'm trying to deploy, but I haven't had any
> success yet.
>
> How might I be able to replace the
> beam-sdks-java-io-google-cloud-platform module usually installed from 
> maven
> with a local jar generated from running:
>
> "./gradlew :sdk:java:io:google-cloud-platform:jar"
>
> Thanks,
> Evan
>

>
> --
> Regards,
> Tomo
>


Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

2022-07-21 Thread Tomo Suzuki via dev
I don't have a solution (I'm not familiar with the method
you're using). However, I often use "getProtectionDomain()"
https://stackoverflow.com/a/56000383/975074 to find the JAR file from a
class. This ensures the class you modified is actually used.
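
A tiny sketch of that check (the class below is just an example from the
module in question; swap in whichever class you modified):

// Minimal sanity check: print which jar a class was actually loaded from,
// to confirm the locally built artifact is the one on the classpath.
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;

public class WhichJar {
  public static void main(String[] args) {
    System.out.println(
        PubsubIO.class.getProtectionDomain().getCodeSource().getLocation());
  }
}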

On Thu, Jul 21, 2022 at 3:35 PM Evan Galpin  wrote:

> Spoke too soon... I still can't seem to get the new behaviour to appear in
> Dataflow; possibly something is being overridden?
>
> On Thu, Jul 21, 2022 at 3:15 PM Evan Galpin  wrote:
>
>> Making a shadowJar from "beam-sdks-java-io-google-cloud-platform" looks
>> to be working. Added `  id 'com.github.johnrengelman.shadow'` to
>> `build.gradle` for "beam-sdks-java-io-google-cloud-platform" in the beam
>> source and used the resulting jar as a dependency replacement when
>> deploying the job to dataflow.  Looks ok.
>>
>> On Thu, Jul 21, 2022 at 3:02 PM Evan Galpin  wrote:
>>
>>> I believe I have the dependencySubstitution working, but it seems as
>>> though the substitution is removing transitive deps of
>>> "beam-sdks-java-io-google-cloud-platform", hmm...
>>>
>>> On Thu, Jul 21, 2022 at 1:15 PM Evan Galpin  wrote:
>>>
 Hi all,

 I'm trying to test a change I've made locally by validating it on
 Dataflow.  It works locally, but I want to validate on Dataflow as well.  I've
 tried a few different attempts at module substitution in the build.gradle
 config file for the pipeline I'm trying to deploy, but I haven't had any
 success yet.

 How might I be able to replace the
 beam-sdks-java-io-google-cloud-platform module usually installed from maven
 with a local jar generated from running:

 "./gradlew :sdk:java:io:google-cloud-platform:jar"

 Thanks,
 Evan

>>>

-- 
Regards,
Tomo


Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

2022-07-21 Thread Evan Galpin
Spoke too soon... I still can't seem to get the new behaviour to appear in
Dataflow; possibly something is being overridden?

On Thu, Jul 21, 2022 at 3:15 PM Evan Galpin  wrote:

> Making a shadowJar from "beam-sdks-java-io-google-cloud-platform" looks to
> be working. Added `  id 'com.github.johnrengelman.shadow'` to
> `build.gradle` for "beam-sdks-java-io-google-cloud-platform" in the beam
> source and used the resulting jar as a dependency replacement when
> deploying the job to dataflow.  Looks ok.
>
> On Thu, Jul 21, 2022 at 3:02 PM Evan Galpin  wrote:
>
>> I believe I have the dependencySubstitution working, but it seems as
>> though the substitution is removing transitive deps of
>> "beam-sdks-java-io-google-cloud-platform", hmm...
>>
>> On Thu, Jul 21, 2022 at 1:15 PM Evan Galpin  wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to test a change I've made locally by validating it on
>>> Dataflow.  It works locally, but I want to validate on Dataflow as well.  I've
>>> tried a few different attempts at module substitution in the build.gradle
>>> config file for the pipeline I'm trying to deploy, but I haven't had any
>>> success yet.
>>>
>>> How might I be able to replace the
>>> beam-sdks-java-io-google-cloud-platform module usually installed from maven
>>> with a local jar generated from running:
>>>
>>> "./gradlew :sdk:java:io:google-cloud-platform:jar"
>>>
>>> Thanks,
>>> Evan
>>>
>>


Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

2022-07-21 Thread Evan Galpin
Making a shadowJar from "beam-sdks-java-io-google-cloud-platform" looks to
be working. Added `  id 'com.github.johnrengelman.shadow'` to
`build.gradle` for "beam-sdks-java-io-google-cloud-platform" in the beam
source and used the resulting jar as a dependency replacement when
deploying the job to dataflow.  Looks ok.
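
For anyone wanting to reproduce this, the change amounts to roughly the
following (a sketch from memory, not the exact diff; the Gradle task path is
an assumption):

// Sketch of the edit to the module's build.gradle in the Beam source tree:
// apply the Shadow plugin so a fat jar bundling the module's dependencies
// can be built. The plugin version is managed by the Beam build itself.
plugins {
    id 'com.github.johnrengelman.shadow'
}

// Then build the fat jar (task path assumed) and point the pipeline's build
// at the artifact produced under build/libs/:
//   ./gradlew :sdks:java:io:google-cloud-platform:shadowJar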

On Thu, Jul 21, 2022 at 3:02 PM Evan Galpin  wrote:

> I believe I have the dependencySubstitution working, but it seems as
> though the substitution is removing transitive deps of
> "beam-sdks-java-io-google-cloud-platform", hmm...
>
> On Thu, Jul 21, 2022 at 1:15 PM Evan Galpin  wrote:
>
>> Hi all,
>>
>> I'm trying to test a change I've made locally by validating it on
>> Dataflow.  It works locally, but I want to validate on Dataflow as well.  I've
>> tried a few different attempts at module substitution in the build.gradle
>> config file for the pipeline I'm trying to deploy, but I haven't had any
>> success yet.
>>
>> How might I be able to replace the
>> beam-sdks-java-io-google-cloud-platform module usually installed from maven
>> with a local jar generated from running:
>>
>> "./gradlew :sdk:java:io:google-cloud-platform:jar"
>>
>> Thanks,
>> Evan
>>
>


Re: [Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

2022-07-21 Thread Evan Galpin
I believe I have the dependencySubstitution working, but it seems as though
the substitution is removing transitive deps of
"beam-sdks-java-io-google-cloud-platform", hmm...

On Thu, Jul 21, 2022 at 1:15 PM Evan Galpin  wrote:

> Hi all,
>
> I'm trying to test a change I've made locally by validating it on
> Dataflow.  It works locally, but I want to validate on Dataflow as well.  I've
> tried a few different attempts at module substitution in the build.gradle
> config file for the pipeline I'm trying to deploy, but I haven't had any
> success yet.
>
> How might I be able to replace the beam-sdks-java-io-google-cloud-platform
> module usually installed from maven with a local jar generated from
> running:
>
> "./gradlew :sdk:java:io:google-cloud-platform:jar"
>
> Thanks,
> Evan
>


Re: [ANNOUNCE] New committer: Steven Niemitz

2022-07-21 Thread Steve Niemitz via dev
Thanks everyone!

On Thu, Jul 21, 2022 at 2:23 AM Moritz Mack  wrote:

> Congrats, Steven!
>
>
>
> On 21.07.22, 05:25, "Evan Galpin"  wrote:
>
>
>
>
> Congrats! Well deserved!
>
>
>
> On Wed, Jul 20, 2022 at 15:17 Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
> Congrats, Steve!
>
>
>
> On Wed, Jul 20, 2022, 9:16 AM Austin Bennett 
> wrote:
>
> Great!
>
>
>
> On Wed, Jul 20, 2022 at 10:11 AM Aizhamal Nurmamat kyzy <
> aizha...@apache.org> wrote:
>
> Congrats, Steve!
>
>
>
> On Wed, Jul 20, 2022 at 3:10 AM Jan Lukavský  wrote:
>
> Congrats Steve!
>
> On 7/20/22 06:20, Reuven Lax via dev wrote:
>
> Welcome Steve!
>
>
>
> On Tue, Jul 19, 2022 at 1:05 PM Connell O'Callaghan via dev <
> dev@beam.apache.org> wrote:
>
>
> +++1 Woohoo! Congratulations Steven (and to the BEAM community) on this
> announcement!!!
>
>
>
> Thank you Luke for this update
>
>
>
>
>
> On Tue, Jul 19, 2022 at 12:34 PM Robert Burke  wrote:
>
> Woohoo! Welcome and congratulations Steven!
>
>
>
> On Tue, Jul 19, 2022, 12:40 PM Luke Cwik via dev 
> wrote:
>
> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new committer:
> Steven Niemitz (sniemitz@)
>
> Steven started contributing to Beam in 2017 fixing bugs and improving
> logging and usability. Steven's most recent focus has been on performance
> optimizations within the Java SDK.
>
>
>
> Considering the time span and number of contributions, the Beam PMC trusts
> Steven with the responsibilities of a Beam committer. [1]
>
>
> Thank you Steven! And we are looking to see more of your contributions!
>
>
>
> Luke, on behalf of the Apache Beam PMC
>
> [1]
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
> 
>
>
>
>


[Dataflow][Guidance] Replacing beam-sdks-java-io-google-cloud-platform with local jar

2022-07-21 Thread Evan Galpin
Hi all,

I'm trying to test a change I've made locally by validating it on
Dataflow.  It works locally, but I want to validate on Dataflow as well.  I've
tried a few different attempts at module substitution in the build.gradle
config file for the pipeline I'm trying to deploy, but I haven't had any
success yet.

How might I be able to replace the beam-sdks-java-io-google-cloud-platform
module usually installed from maven with a local jar generated from
running:

"./gradlew :sdk:java:io:google-cloud-platform:jar"

Thanks,
Evan


Beam Dependency Check Report (2022-07-21)

2022-07-21 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:


Dependency Name | Current Version | Latest Version | Release Date (Current) | Release Date (Latest)
cachetools | 4.2.4 | 5.2.0 | 2021-12-27 | 2022-06-02
chromedriver-binary | 100.0.4896.60.0 | 104.0.5112.29.0 | 2022-05-05 | 2022-07-14
dill | 0.3.1.1 | 0.3.5.1 | 2019-10-07 | 2022-05-26
distlib | 0.3.1 | 0.3.5 | 2021-05-31 | 2022-07-14
google-api-core | 1.32.0 | 2.8.2 | 2022-07-14 | 2022-06-16
google-auth | 1.35.0 | 2.9.1 | 2021-08-23 | 2022-07-14
google-cloud-bigquery | 2.34.4 | 3.2.0 | 2022-06-09 | 2022-06-09
google-cloud-bigtable | 1.7.2 | 2.10.1 | 2022-06-09 | 2022-06-16
google-cloud-dataproc | 3.1.1 | 5.0.0 | 2022-02-21 | 2022-07-21
google-cloud-datastore | 1.15.5 | 2.8.0 | 2022-06-16 | 2022-07-21
google-cloud-language | 1.3.2 | 2.5.1 | 2022-06-16 | 2022-07-14
google-cloud-recommendations-ai | 0.2.0 | 0.7.0 | 2021-07-05 | 2022-07-14
google-cloud-spanner | 1.19.3 | 3.17.0 | 2022-06-16 | 2022-07-21
google-cloud-videointelligence | 1.16.3 | 2.8.0 | 2022-06-16 | 2022-07-21
google-cloud-vision | 1.0.2 | 3.0.0 | 2022-06-16 | 2022-07-21
grpcio-tools | 1.37.0 | 1.47.0 | 2021-04-12 | 2022-06-23
jupyter-client | 6.1.12 | 7.3.4 | 2021-04-12 | 2022-06-09
mistune | 0.8.4 | 2.0.4 | 2021-12-06 | 2022-07-21
mock | 2.0.0 | 4.0.3 | 2022-06-02 | 2020-12-14
mypy-protobuf | 1.18 | 3.2.0 | 2020-03-24 | 2022-01-24
Pillow | 7.2.0 | 9.2.0 | 2020-10-19 | 2022-07-07
pluggy | 0.13.1 | 1.0.0 | 2021-08-30 | 2021-08-30
protobuf | 3.20.1 | 4.21.2 | 2022-06-02 | 2022-06-30
pyarrow | 7.0.0 | 8.0.0 | 2022-02-07 | 2022-05-12
PyHamcrest | 1.10.1 | 2.0.3 | 2020-01-20 | 2021-12-13
pymongo | 3.12.3 | 4.2.0 | 2021-12-13 | 2022-07-21
pytest | 4.6.11 | 7.1.2 | 2020-07-08 | 2022-05-05
pytest-timeout | 1.4.2 | 2.1.0 | 2021-10-11 | 2022-01-24
pytest-xdist | 1.34.0 | 2.5.0 | 2020-08-17 | 2021-12-13
selenium | 3.141.0 | 4.3.0 | 2021-11-18 | 2022-06-30
setuptools | 62.6.0 | 63.2.0 | 2022-06-23 | 2022-07-14
tenacity | 5.1.5 | 8.0.1 | 2019-11-11 | 2021-07-19

High Priority Dependency Updates Of Beam Java SDK:


Dependency Name | Current Version | Latest Version | Release Date (Current) | Release Date (Latest)
biz.aQute:bndlib | 1.50.0 | 2.0.0.20130123-133441 | 2011-11-04 | 2013-02-27
com.alibaba:fastjson | 1.2.69 | 2.0.9 | 2020-05-31 | 2022-07-10
com.amazonaws:amazon-kinesis-client | 1.14.2 | 1.14.8 | 2021-02-24 | 2022-02-24
com.amazonaws:amazon-kinesis-producer | 0.14.1 | 0.14.12 | 2020-07-31 | 2022-03-16
com.azure:azure-core | 1.9.0 | 1.30.0 | 2020-10-02 | 2022-06-30
com.azure:azure-identity | 1.0.8 | 1.5.3 | 2020-07-07 | 2022-06-30
com.azure:azure-storage-blob | 12.10.0 | 12.18.0 | 2021-01-15 | 2022-07-07
com.azure:azure-storage-common | 12.10.0 | 12.17.0 | 2021-01-14 | 2022-07-07
com.carrotsearch.randomizedtesting:randomizedtesting-runner | 2.7.8 | 2.8.0 | 2020-07-07 | 2022-06-28
com.clearspring.analytics:stream | 2.9.5 | 2.9.8 | 2016-08-01 | 2019-08-27
com.datastax.cassandra:cassandra-driver-core | 3.10.2 | 4.0.0 | 2020-08-26 | 2019-03-18
com.datastax.cassandra:cassandra-driver-mapping | 3.10.2 | 3.11.2 | 2020-08-26 | 2022-04-28
com.esotericsoftware:kryo | 4.0.2 | 5.3.0 | 2018-03-20 | 2022-02-11
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04
com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.33.0 | 0.42.0 | 2020-09-14 | 2022-02-07

Beam High Priority Issue Report

2022-07-21 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and 
fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at 
getConnection() in WriteFn
https://github.com/apache/beam/issues/22188 BigQuery Storage API sink sometimes 
gets stuck outputting to an invalid timestamp
https://github.com/apache/beam/issues/21935 [Bug]: Reject illformed GBK Coders
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer 
whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 
failures parent bug
https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in 
beam_PostCommit_Java_DataflowV1 and V2
https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam 
PostCommit Java V1
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 
failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 
--dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions 
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with 
writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing 
new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21468 
beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load 
tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on 
failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable 
ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in 
beam_PostCommit_Java_DataflowV2  
https://github.com/apache/beam/issues/21270 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21268 Race between member variable being 
accessed due to leaking uninitialized state via OutboundObserverFactory
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate 
BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21265 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://github.com/apache/beam/issues/21264 beam_PostCommit_Python36 - 
CrossLanguageSpannerIOTest - flakey failing
https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked 
Dataflow Pipeline 
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21261 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21257 Either Create or DirectRunner fails 
to produce all elements to the following transform
https://github.com/apache/beam/issues/21123 Multiple jobs running on Flink 
session cluster reuse the persistent Python environment.
https://github.com/apache/beam/issues/21121 

Re: [ANNOUNCE] New committer: Steven Niemitz

2022-07-21 Thread Moritz Mack
Congrats, Steven!

On 21.07.22, 05:25, "Evan Galpin"  wrote:

Congrats! Well deserved!

On Wed, Jul 20, 2022 at 15:17 Chamikara Jayalath via dev
<dev@beam.apache.org> wrote:
Congrats, Steve!

On Wed, Jul 20, 2022, 9:16 AM Austin Bennett
<whatwouldausti...@gmail.com> wrote:
Great!

On Wed, Jul 20, 2022 at 10:11 AM Aizhamal Nurmamat kyzy
<aizha...@apache.org> wrote:
Congrats, Steve!

On Wed, Jul 20, 2022 at 3:10 AM Jan Lukavský <je...@seznam.cz> wrote:

Congrats Steve!
On 7/20/22 06:20, Reuven Lax via dev wrote:
Welcome Steve!

On Tue, Jul 19, 2022 at 1:05 PM Connell O'Callaghan via dev
<dev@beam.apache.org> wrote:

+++1 Woohoo! Congratulations Steven (and to the BEAM community) on this
announcement!!!

Thank you Luke for this update


On Tue, Jul 19, 2022 at 12:34 PM Robert Burke <rob...@frantil.com> wrote:
Woohoo! Welcome and congratulations Steven!

On Tue, Jul 19, 2022, 12:40 PM Luke Cwik via dev <dev@beam.apache.org>
wrote:
Hi all,

Please join me and the rest of the Beam PMC in welcoming a new committer: 
Steven Niemitz (sniemitz@)

Steven started contributing to Beam in 2017 fixing bugs and improving logging 
and usability. Steven's most recent focus has been on performance optimizations
within the Java SDK.

Considering the time span and number of contributions, the Beam PMC trusts 
Steven with the responsibilities of a Beam committer. [1]

Thank you Steven! And we are looking to see more of your contributions!

Luke, on behalf of the Apache Beam PMC

[1] 
https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer





Re: Using unbounded source as a side input for a DoFn

2022-07-21 Thread Cristian Constantinescu
Disclaimer: I am not an expert, but I kinda worked on something similar.

A few points I'd like to bring up:
- Side inputs do not trigger the processElement function when new elements
are added to the side input. That means that if the side input doesn't yet
contain the item you need at the time the main element is processed, you're
out of luck, unless you use timers to reprocess the element at a later time
when the side input might have more data.

- If your goal is to somehow combine two PCollections, I would suggest you
look into CoGroupByKey [1] and its schema-aware sibling CoGroup [2]. You
can then use a global window that triggers on every element
(AfterProcessingTime.pastFirstElementInPane); there's a rough sketch of this
after the pseudocode below.

- The input of a PTransform can also be a PCollectionTuple, and in your
inner ParDo you can loop through items of either collection. Something
like this (pseudocode):

class FooTransform extends PTransform<PCollectionTuple, PCollection<OutT>> {
    // the curly brackets on the TupleTags are important as far as I know
    private TupleTag<A> aTag = new TupleTag<A>() {};
    private TupleTag<B> bTag = new TupleTag<B>() {};
    // getters for the above fields

    public PCollection<OutT> expand(PCollectionTuple input) {
        return input.apply(new FooDoFn(aTag, bTag));
    }

    private static class FooDoFn extends DoFn<..., OutT> {
        FooDoFn(TupleTag<A> aTag, TupleTag<B> bTag) {
            // set fields
        }

        public void processElement(Context ctx) {
            var itemFromA = ctx.element().get(this.aTag);
            if (itemFromA != null) { /* logic */ }

            var itemFromB = ctx.element().get(this.bTag);
            if (itemFromB != null) { /* logic */ } // adding these to a state
            // variable would effectively be an unbounded side input
        }
    }
}
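
And here is the CoGroupByKey idea from the second bullet as a rough sketch
(element types, variable names, and the trigger settings are just
illustrative assumptions; the usual Beam and org.joda.time imports are
omitted):

// Rough sketch: join two unbounded PCollections of KVs with CoGroupByKey,
// re-windowed into the global window with a trigger that fires for every
// new element.
PCollection<KV<String, String>> aInput = ...;  // placeholder, e.g. from PubsubIO
PCollection<KV<String, String>> bInput = ...;  // placeholder, the second stream

Window<KV<String, String>> fireOnEveryElement =
    Window.<KV<String, String>>into(new GlobalWindows())
        .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()))
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes();

PCollection<KV<String, String>> aWindowed = aInput.apply("WindowA", fireOnEveryElement);
PCollection<KV<String, String>> bWindowed = bInput.apply("WindowB", fireOnEveryElement);

final TupleTag<String> aTag = new TupleTag<String>() {};
final TupleTag<String> bTag = new TupleTag<String>() {};

PCollection<KV<String, CoGbkResult>> grouped =
    KeyedPCollectionTuple.of(aTag, aWindowed)
        .and(bTag, bWindowed)
        .apply(CoGroupByKey.create());

grouped.apply(ParDo.of(new DoFn<KV<String, CoGbkResult>, Void>() {
  @ProcessElement
  public void processElement(ProcessContext ctx) {
    Iterable<String> aValues = ctx.element().getValue().getAll(aTag);
    Iterable<String> bValues = ctx.element().getValue().getAll(bTag);
    // combine / enrichment logic goes here
  }
}));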

Hope it helps,
Cristian

[1]
https://beam.apache.org/documentation/transforms/java/aggregation/cogroupbykey/
[2]
https://beam.apache.org/releases/javadoc/2.36.0/org/apache/beam/sdk/schemas/transforms/CoGroup.html

On Thu, Jul 21, 2022 at 1:45 AM Sahil Modak 
wrote:

> Hi,
>
> We are looking to use the side input feature for one of our DoFns. The
> side input has to be a PCollection which is being constructed from a
> subscription using PubsubIO.read
>
> We want our primary DoFn, which operates on KV pairs in a global window, to
> access this side input.
> The goal is to have all the messages of this unbounded source (side input)
> be available across all the KV pairs in our input DoFn, which will use
> this side input.
>
> Is it possible to have an unbounded source (like pubsub) as a side input?
>
> Thanks,
> Sahil
>


Re: Using unbounded source as a side input for a DoFn

2022-07-21 Thread Reuven Lax via dev
How do you want to use the side input?

On Wed, Jul 20, 2022 at 10:45 PM Sahil Modak 
wrote:

> Hi,
>
> We are looking to use the side input feature for one of our DoFns. The
> side input has to be a PCollection which is being constructed from a
> subscription using PubsubIO.read
>
> We want our primary DoFn, which operates on KV pairs in a global window, to
> access this side input.
> The goal is to have all the messages of this unbounded source (side input)
> be available across all the KV pairs in our input DoFn, which will use
> this side input.
>
> Is it possible to have an unbounded source (like pubsub) as a side input?
>
> Thanks,
> Sahil
>