Re: unvendoring bytebuddy

2022-03-17 Thread Ismaël Mejía
+1

Probably worth checking whether we have dependencies that rely on Byte
Buddy and could produce conflicts, but I doubt it.
My only worry was ASM leaking into the classpath, but it seems that
Byte Buddy already shades ASM so that should not be an issue.


Ismaël

On Thu, Mar 17, 2022 at 5:09 PM Liam Miller-Cushon wrote:
>
> Hello,
>
> I wanted to raise the possibility of using the upstream version of bytebuddy 
> in beam, instead of vendoring it, from BEAM-14117.
>
> Vendoring the bytebuddy dep was introduced in BEAM-1019:
>
>> We encountered backward incompatible changes in bytebuddy during upgrading 
>> to Mockito 2.0. Shading bytebuddy helps to address them and future issues.
>
>
> Vendoring makes it harder to upgrade the bytebuddy version, and upgrading 
> bytebuddy is one of the first things that needs to happen to support new Java 
> versions (e.g. BEAM-14065, BEAM-12241).
>
> Vendoring or shading bytebuddy is discouraged by the upstream owners of the 
> library, see e.g. https://github.com/assertj/assertj-core/issues/2470 where 
> assertj was migrated off a shaded version:
>
>> As Byte Buddy retains compatibility, not shading the library would allow 
>> running recent JVMs without an update of assertj but only BB. Other 
>> libraries like Mockito or Hibernate do not shade BB and there are no known 
>> issues with this approach.
>
>
> Does anyone have additional context about the issues encountered during the 
> mockito 2.0 upgrade, or concerns with trying to unvendor bytebuddy?
>
> Thanks,
> Liam


Flaky test issue report (51)

2022-03-17 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-13998: Build flakes - Timeout 
waiting to lock daemon addresses registry (created 2022-02-24)
https://issues.apache.org/jira/browse/BEAM-13952: Dataflow streaming tests 
failing new AfterSynchronizedProcessingTime test (created 2022-02-15)
https://issues.apache.org/jira/browse/BEAM-13859: Test flake: 
test_split_half_sdf (created 2022-02-09)
https://issues.apache.org/jira/browse/BEAM-13858: Failure of 
:sdks:go:examples:wordCount in check "Mac run local environment shell script" 
(created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13810: Flaky tests: Gradle build 
daemon disappeared unexpectedly (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13797: Flakes: Failed to load 
cache entry (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13783: 
apache_beam.transforms.combinefn_lifecycle_test.LocalCombineFnLifecycleTest.test_combine
 is flaky (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13741: 
:sdks:java:extensions:sql:hcatalog:compileJava failing in 
beam_Release_NightlySnapshot  (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13708: flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored (created 2022-01-20)
https://issues.apache.org/jira/browse/BEAM-13575: Flink 
testParDoRequiresStableInput flaky (created 2021-12-28)
https://issues.apache.org/jira/browse/BEAM-13519: Java precommit flaky 
(timing out) (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13500: NPE in Flink Portable 
ValidatesRunner streaming suite (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13453: Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use 
(created 2021-12-13)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13367: 
[beam_PostCommit_Python36] [ 
apache_beam.io.gcp.experimental.spannerio_read_it_test] Failure summary 
(created 2021-12-01)
https://issues.apache.org/jira/browse/BEAM-13312: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
 is flaky in Java Spark ValidatesRunner suite  (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13311: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite. (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13234: Flake in 
StreamingWordCountIT.test_streaming_wordcount_it (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13025: pubsublite.ReadWriteIT 
flaky in beam_PostCommit_Java_DataflowV2   (created 2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12858: 
org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler 
is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12809: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12794: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12793: 
beam_PostRelease_NightlySnapshot failed (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12673: 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey (created 2021-07-28)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: 
Failed to read inputs in the data plane (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.tes

P1 issues report (71)

2022-03-17 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake)).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-14118: beam-vendor-grpc-1_43_2 
shades vulnerable Netty version (created 2022-03-16)
https://issues.apache.org/jira/browse/BEAM-14099: Pipeline with large 
number of PTransforms fails with StackOverflowError  (created 2022-03-14)
https://issues.apache.org/jira/browse/BEAM-14064: ElasticSearchIO#Write 
buffering and outputting across windows (created 2022-03-07)
https://issues.apache.org/jira/browse/BEAM-14017: 
beam_PreCommit_CommunityMetrics_Cron is failing. (created 2022-03-01)
https://issues.apache.org/jira/browse/BEAM-13959: Unable to write to 
BigQuery tables with column named 'f' (created 2022-02-16)
https://issues.apache.org/jira/browse/BEAM-13953: Document BigQueryIO 
Storage Write API methods (created 2022-02-15)
https://issues.apache.org/jira/browse/BEAM-13952: Dataflow streaming tests 
failing new AfterSynchronizedProcessingTime test (created 2022-02-15)
https://issues.apache.org/jira/browse/BEAM-13950: PVR_Spark2_Streaming 
perma-red (created 2022-02-15)
https://issues.apache.org/jira/browse/BEAM-13920: Beam x-lang Dataflow 
tests failing due to _InactiveRpcError (created 2022-02-10)
https://issues.apache.org/jira/browse/BEAM-13858: Failure of 
:sdks:go:examples:wordCount in check "Mac run local environment shell script" 
(created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13830: XVR Direct/Spark/Flink 
tests are timing out (created 2022-02-04)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13809: beam_PostCommit_XVR_Flink 
flaky: Connection refused (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13805: Simplify version override 
for Dev versions of the Go SDK. (created 2022-02-02)
https://issues.apache.org/jira/browse/BEAM-13798: Upgrade Kubernetes 
Clusters (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13747: Add integration testing 
for BQ Storage API  write modes (created 2022-01-26)
https://issues.apache.org/jira/browse/BEAM-13741: 
:sdks:java:extensions:sql:hcatalog:compileJava failing in 
beam_Release_NightlySnapshot  (created 2022-01-25)
https://issues.apache.org/jira/browse/BEAM-13715: Kafka commit offset drop 
data on failure for runners that have non-checkpointing shuffle (created 
2022-01-21)
https://issues.apache.org/jira/browse/BEAM-13582: Beam website precommit 
mentions broken links, but passes. (created 2021-12-30)
https://issues.apache.org/jira/browse/BEAM-13487: WriteToBigQuery Dynamic 
table destinations returns wrong tableId (created 2021-12-17)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13237: 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2 (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13164: Race between member 
variable being accessed due to leaking uninitialized state via 
OutboundObserverFactory (created 2021-11-01)
https://issues.apache.org/jira/browse/BEAM-13132: WriteToBigQuery submits a 
duplicate BQ load job if a 503 error code is returned from googleapi (created 
2021-10-27)
https://issues.apache.org/jira/browse/BEAM-13087: 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible 
(created 2021-10-20)
https://issues.apache.org/jira/browse/BEAM-13078: Python DirectRunner does 
not emit data at GC time (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13076: Python AfterAny, AfterAll 
do not follow spec (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13010: Delete orphaned files 
(created 2021-10-06)
https://issues.apache.org/jira/browse/BEAM-12995: Consumer group with 
random prefix (created 2021-10-04)
https://issues.apache.org/jira/browse/BEAM-12959: Dataflow error in 
CombinePerKey operation (created 2021-09-26)
https://issues.apache.org/jira/browse/BEAM-12867: Either Create or 
DirectRunner fails to produce all elements to the following transform (created 
2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12843: (Broken Pipe induced) 
Bricked Dataflow Pi

Access to the apache-beam-testing project

2022-03-17 Thread Elizaveta Lomteva
Hi community,

We are working on the new Apache CDAP IO connector. So could anyone please add 
us to the GCP project apache-beam-testing to be able to test our Beam pipelines 
there with the Dataflow runner?

Email addresses of the development team:
igor.krasa...@akvelon.com
vitaly.terent...@akvelon.com
elizaveta.lomt...@akvelon.com

Many thanks for considering my request,
Elizaveta


gRPC OOM Error

2022-03-17 Thread Bjorn De Bakker
Hey all

I'm trying to run an Apache Beam pipeline on Flink (hosted on GKE).  I'm
using a custom container image that was built to run trained TensorFlow
models on a series of images, but the base Docker image is
apache/beam_python3.7_sdk:2.30.0, with a few libraries installed on top of
that.

I'm using the Apache Flink operator, which is currently being maintained by
Spotify.  Because the pods are processing a lot of data, we had to
significantly increase the memory available to the TaskManagers.  However,
internally the code is throwing an OOM exception at one point, linked to a
gRPC call that is taking too long to complete:

Output channel stalled for 1023s, outbound thread CHAIN MapPartition
(MapPartition at [1]PerformInference) -> FlatMap (FlatMap at
ExtractOutput[0]) -> Map (Key Extractor) -> GroupCombine (GroupCombine at
GroupCombine:
PerformInferenceAndCombineResults_dep_049/GroupPredictionsByImage) -> Map
(Key Extractor) (1/1). See: https://issues.apache.org/jira/browse/BEAM-4280
for the history for this issue.

Feb 18, 2022 11:51:05 AM org.apache.beam.vendor.grpc.v1p36p0.io.grpc.netty.NettyServerTransport notifyTerminated
INFO: Transport failed
org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 2097152 byte(s) of direct memory (used: 1205862679, max: 1207959552)
    at org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:754)
    at org.apache.beam.vendor.grpc.v1p36p0.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:709)
    at org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:645)
    at org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:621)
    at org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:204)
    at org.apache.beam.vendor.grpc.v1p36p0.io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:188)

I've tried to assign more memory to either the TaskManager or JobManager in
Flink, but the error stays the same and the numbers don't change either,
which tells me that this is not an issue with Flink itself, but with an
underlying process.
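
For reference, the figures in the error message can be checked by hand. The Go sketch below is only arithmetic on the three numbers from the log; the constant and function names are mine and nothing in it talks to Netty or Flink. It shows that the limit is exactly 1152 MiB and that the pool was a few hundred bytes short of fitting one more 2 MiB chunk:

```go
package main

import "fmt"

// Figures copied from the OutOfDirectMemoryError message.
// The names are illustrative only.
const (
	usedBytes      int64 = 1205862679
	maxBytes       int64 = 1207959552
	requestedBytes int64 = 2097152
)

// shortfall returns by how many bytes an allocation would exceed the
// limit; zero or a negative value means the allocation would fit.
func shortfall(used, requested, max int64) int64 {
	return used + requested - max
}

func main() {
	const mib = 1 << 20
	fmt.Printf("limit: %d MiB\n", maxBytes/mib)
	fmt.Printf("free before the request: %d bytes\n", maxBytes-usedBytes)
	fmt.Printf("short by: %d bytes\n", shortfall(usedBytes, requestedBytes, maxBytes))
}
```

So the direct-memory pool was essentially full when the failing request arrived, which fits with the cap being a fixed value that the settings I changed do not influence.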

I've done some googling, but I couldn't find more information.  As this
isn't anything we touch, but an underlying library, I'm not sure what we
can change to solve the issue.

Any idea what the problem could be?

Thanks.

Kr,

Bjorn


Re: Why Go Generics Can't Replace the Code Generator (Yet!)

2022-03-17 Thread Robert Burke
Apologies to Danny, I haven't read your considered reply yet, as I need to
address Kerry's question on priorities immediately.



I feel it's entirely critical to have before native streaming, because it
serves users more in the long term.

The cost of waiting is that we can't establish the convention of
registering things with the SDK to get performance boosts and instead ask
people to fiddle with their code again a few months down the road. This
leads to proliferation of code that will be unable to set the precedent of
efficiency for organizations who aren't able to keep up with the latest and
greatest in Beam.

The cost of waiting is having users burn through CPU unnecessarily
contributing to the ongoing boiling of the planet.

The cost of waiting is making users spend more on said CPU and memory, when
they could have faster completing jobs more cheaply now, rather than later.
That does more for all Go SDK users now than streaming would.

Setting conventions, and having a good experience with the SDK is something
we should be doing sooner rather than later, so we aren't constantly
changing things on our users with New Ways To Do Old Things, vs unlocking
abilities to do things in another SDK, that have Options in the other SDKs
already.


On Thu, Mar 17, 2022, 8:01 AM Kerry Donny-Clark wrote:

> My thoughts:
> What are the benefits of doing this now/the costs of waiting?
> Are generics something that gives us more value than streaming features?
> It makes sense to me to punt this work until the SDK is more developed, so
> that we can observe the consequences of streaming instead of imagining
> them.
> I'm the least expert Go developer on this thread, so please let me know if
> I'm missing something obvious. I really appreciate the clear back and forth
> :)
>
> On Thu, Mar 17, 2022 at 8:36 AM Danny McCormick wrote:
>
>> Robert and I have been catching up offline, but I did still want to respond
>> here for the broader audience, both to clarify some of the questions Robert
>> brought up and to share some of the progress we've made towards getting on
>> the same page. Structurally, I'll walk through the questions/assertions
>> brought up above to establish common ground and where we have seen the
>> world differently. Some of that has been covered in our offline
>> conversation, some has not. At the end I'll try to give a brief summary of
>> what we've talked about offline. Robert, please feel free to correct me if
>> I've misrepresented any of that conversation!
>>
>> *Robert's tl;dr*
>>
>> > I'd also like to emphasize that we gain 99.9...% of the code
>> > generator benefit by covering the ProcessElement calls.
>> > No other method is called similarly often by the system (on every
>> > element). Perfect is the enemy of Good Enough.
>>
>> I think this is actually fundamentally where we diverged - I've been
>> treating coverage of all structural DoFn methods as a requirement for a
>> code generator replacement, while Robert has not. More on that below.
>>
>> *1. Problem: Users have to determine which Registration function to use.*
>>
>> *1.a)* I think that perhaps there has been a breakdown of terminology
>> here. When I say on the order of 1000s of DoFn variants, I'm not just
>> talking about DoFn1x0 to DoFn7x4. If that is all we need to worry about I
>> agree that's not a problem. I'm referring to the section of my doc where I
>> discuss the need for a proliferation of interfaces to support each
>> structural DoFn method such that a user can pass in a DoFn of the exact
>> right interface. See
>> https://docs.google.com/document/d/1imYbBeu2FNJkwPNm6E9GEJkjpHnHscvFoKAE6AISvFA/edit#heading=h.e24tsm6ev6f0
>> where I discuss the problem - specifically, it culminates in "To use
>> this function, the user would be required to know which of the many DoFn
>> variants they should using...".
>>
>> It's fine if you disagree with that (and I think you have brought up some
>> valid points about ways to reduce the complexity by composing interfaces),
>> but I'm heavily relying on shared understanding of that doc for
>> contextualizing this conversation.
>>
>> *1.b) *Sorry that was confusing, you are correct that I've been abusing
>> the term cast when really what I've meant in most cases is a user forcing
>> a type assertion (e.g. `foo.(Bar)`).
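
For anyone following along, here is a tiny Go sketch of the distinction, using made-up names (`Bar`, `fooImpl`, `describe`). An expression of the form `foo.(Bar)` is a type assertion, which is a run-time check on an interface value, whereas a conversion such as `float64(n)` changes the static type and is resolved at compile time.

```go
package main

import "fmt"

// Bar is a hypothetical interface a user might assert to.
type Bar interface{ Name() string }

type fooImpl struct{}

func (fooImpl) Name() string { return "foo" }

// describe uses the comma-ok form, so a failed assertion does not panic.
func describe(v any) string {
	if b, ok := v.(Bar); ok {
		return "Bar:" + b.Name()
	}
	return "not a Bar"
}

func main() {
	var foo any = fooImpl{}
	fmt.Println(describe(foo)) // assertion succeeds
	fmt.Println(describe(42))  // assertion fails, handled via ok

	n := 3
	fmt.Println(float64(n) / 2) // conversion, no run-time type check
}
```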
>>
>> *1.c) *Agreed, if there are ~30 reasonably named types of DoFn. I think
>> this becomes exceedingly inconvenient, though, if there is indeed an
>> explosion of types into the hundreds or thousands. I don't think we
>> disagree here, this point mostly reduces to (1.a) - please correct me if
>> I'm wrong.
>>
>> *1.d) *That's true, we could also have it print out arbitrary
>> registration code though - I don't think that's a particularly compelling
>> user experience either way.
>>
>> *2. Feasibility.*
>>
>> *2.a) *To be honest, I just flat out disagree here - like I said above,
>> your example doesn't touch the actual user facing function which is the
>> hard part of the pr