Flaky test issue report (29)

2021-10-22 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-13025: 
beam_PostCommit_Java_DataflowV2 failing pubsublite.ReadWriteIT (created 
2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12809: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12794: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12540: 
beam_PostRelease_NightlySnapshot - Task 
:runners:direct-java:runMobileGamingJavaDirect FAILED (created 2021-06-25)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: 
Failed to read inputs in the data plane (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: 
SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https://issues.apache.org/jira/browse/BEAM-12163: Python GHA PreCommits 
flake with grpc.FutureTimeoutError on SDK harness startup (created 2021-04-13)
https://issues.apache.org/jira/browse/BEAM-12061: beam_PostCommit_SQL 
failing on KafkaTableProviderIT.testFakeNested (created 2021-03-27)
https://issues.apache.org/jira/browse/BEAM-11837: Java build flakes: 
"Memory constraints are impeding performance" (created 2021-02-18)
https://issues.apache.org/jira/browse/BEAM-11661: hdfsIntegrationTest 
flake: network not found (py38 postcommit) (created 2021-01-19)
https://issues.apache.org/jira/browse/BEAM-11641: Bigquery Read tests are 
flaky on Flink runner in Python PostCommit suites (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11541: 
testTeardownCalledAfterExceptionInProcessElement flakes on direct runner. 
(created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-10955: Flink Java Runner test 
flake: Could not find Flink job (FlinkJobNotFoundException) (created 2020-09-23)
https://issues.apache.org/jira/browse/BEAM-10866: 
PortableRunnerTestWithSubprocesses.test_register_finalizations flaky on macOS 
(created 2020-09-09)
https://issues.apache.org/jira/browse/BEAM-10485: Failure / flake: 
ElasticsearchIOTest > testWriteWithIndexFn (created 2020-07-14)
https://issues.apache.org/jira/browse/BEAM-9649: 
beam_python_mongoio_load_test started failing due to mismatched results 
(created 2020-03-31)
https://issues.apache.org/jira/browse/BEAM-8101: Flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful for 
Direct, Spark, Flink (created 2019-08-27)
https://issues.apache.org/jira/browse/BEAM-8035: 
WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp 
order (sickbayed) (created 2019-08-22)
https://issues.apache.org/jira/browse/BEAM-7827: 
MetricsTest$AttemptedMetricTests.testAllAttemptedMetrics is flaky on 
DirectRunner (created 2019-07-26)
https://issues.apache.org/jira/browse/BEAM-7752: Java Validates 
DirectRunner: testTeardownCalledAfterExceptionInFinishBundleStateful flaky 
(created 2019-07-16)
https://issues.apache.org/jira/browse/BEAM-6804: [beam_PostCommit_Java] 
[PubsubReadIT.testReadPublicData] Timeout waiting on Sub (created 2019-03-11)
https://issues.apache.org/jira/browse/BEAM-5286: 
[beam_PostCommit_Java_GradleBuild][org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline][Flake]
 .sh script: text file busy. (created 2018-09-01)
https://issues.apache.org/jira/browse/BEAM-5172: 
org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky (created 
2018-08-20)


Refusing to split Position of last group processed was b'w\xd5\xdd\x82\x00\x01'." python sdk

2021-10-22 Thread Lucas de Castro Magalhães
Hi guys.

Could anyone help me with a problem in my pipeline? The pipeline gets stuck and
doesn't do anything.

The log entry that I received is:

jsonPayload: {
job: "2021-10-20_21_00_01-5873629180911776129"
logger: "root:shuffle.py:try_split"
message: "Refusing to split at b'w\xd5\xdd\x83\x00\x01': proposed split position is out of range [b'qni\xad\x00\x01', b'w\xd5\xdd\x83\x00\x01'). Position of last group processed was b'w\xd5\xdd\x82\x00\x01'."
thread: "63:139803304449792"
worker: "template-calendarapi-bigq-10202100-p8n3-harness-lr2k"


My pipeline is shown in the image below; the part where the job gets stuck is
marked in red. Sometimes the job completes and sometimes it does not.

[image: image.png]

-- 

Lucas de Castro Magalhães

Innovation Manager

+55 (11) 99420-4667


santodigital.com.br 


P1 issues report (50)

2021-10-22 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-13078: Python DirectRunner does 
not emit data at GC time (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13076: Python AfterAny, AfterAll 
do not follow spec (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13074: Metrics are not reported 
by the Flink runner (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13060: Daily Python SDK build is 
not publicly accessible (created 2021-10-15)
https://issues.apache.org/jira/browse/BEAM-13059: Migrate GKE workloads to 
Containerd (created 2021-10-15)
https://issues.apache.org/jira/browse/BEAM-13058: Upgrade Kubernetes APIs 
(created 2021-10-15)
https://issues.apache.org/jira/browse/BEAM-13056: Add method to fetch 
ProcessContext FieldAccess (created 2021-10-14)
https://issues.apache.org/jira/browse/BEAM-13025: 
beam_PostCommit_Java_DataflowV2 failing pubsublite.ReadWriteIT (created 
2021-10-08)
https://issues.apache.org/jira/browse/BEAM-13010: Delete orphaned files 
(created 2021-10-06)
https://issues.apache.org/jira/browse/BEAM-12995: Consumer group with 
random prefix (created 2021-10-04)
https://issues.apache.org/jira/browse/BEAM-12959: Dataflow error in 
CombinePerKey operation (created 2021-09-26)
https://issues.apache.org/jira/browse/BEAM-12867: Either Create or 
DirectRunner fails to produce all elements to the following transform (created 
2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12843: (Broken Pipe induced) 
Bricked Dataflow Pipeline  (created 2021-09-06)
https://issues.apache.org/jira/browse/BEAM-12818: When writing to GCS, 
spread prefix of temporary files and reuse autoscaling of the temporary 
directory (created 2021-08-30)
https://issues.apache.org/jira/browse/BEAM-12807: Java creates an incorrect 
pipeline proto when core-construction-java jar is not in the CLASSPATH (created 
2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12792: Beam worker only installs 
--extra_package once (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12632: ElasticsearchIO: Enabling 
both User/Pass auth and SSL overwrites User/Pass (created 2021-07-16)
https://issues.apache.org/jira/browse/BEAM-12540: 
beam_PostRelease_NightlySnapshot - Task 
:runners:direct-java:runMobileGamingJavaDirect FAILED (created 2021-06-25)
https://issues.apache.org/jira/browse/BEAM-12525: SDF BoundedSource seems 
to execute significantly slower than 'normal' BoundedSource (created 2021-06-22)
https://issues.apache.org/jira/browse/BEAM-12505: codecov/patch has poor 
behavior (created 2021-06-17)
https://issues.apache.org/jira/browse/BEAM-12500: Dataflow SocketException 
(SSLException) error while trying to send message from Cloud Pub/Sub to 
BigQuery (created 2021-06-16)
https://issues.apache.org/jira/browse/BEAM-12484: JdbcIO date conversion is 
sensitive to OS (created 2021-06-14)
https://issues.apache.org/jira/browse/BEAM-12467: 
java.io.InvalidClassException With Flink Kafka (created 2021-06-09)
https://issues.apache.org/jira/browse/BEAM-12279: Implement 
destination-dependent sharding in FileIO.writeDynamic (created 2021-05-04)
https://issues.apache.org/jira/browse/BEAM-12256: 
PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode some 
Avro logical types (created 2021-04-29)
https://issues.apache.org/jira/browse/BEAM-11959: Python Beam SDK Harness 
hangs when installing pip packages (created 2021-03-11)
https://issues.apache.org/jira/browse/BEAM-11906: No trigger early 
repeatedly for session windows (created 2021-03-01)
https://issues.apache.org/jira/browse/BEAM-11875: XmlIO.Read does not 
handle XML encoding per spec (created 2021-02-26)
https://issues.apache.org/jira/browse/BEAM-11828: JmsIO is not 
acknowledging messages correctly (created 2021-02-17)
https://issues.apache.org/jira/browse/BEAM-11755: Cross-language 
consistency (RequiresStableInputs) is quietly broken (at least on portable 
flink runner) (created 2021-02-05)
https://issues.apache.org/jira/browse/BEAM-11578: `dataflow_metrics` 
(python) fails with TypeError (when int overflowing?) (created 2021-01-06)
https://issues.apache.org/jira/browse/BEAM-11148: Kafka 
commitOffsetsInFinalize OOM on Flink (created 2020-10-28)
https://issues.apache.org/jira/browse/BEAM-11017: Timer with dataflow 
runner can be set multiple times (datafl

Re: Intro

2021-10-22 Thread Kenneth Knowles
Welcome! Thank you for saying hello!

Kenn

On Fri, Oct 22, 2021 at 9:10 AM Jean-Baptiste Onofre 
wrote:

> Welcome ;)
>
> Regards
> JB
>
> > On 22 Oct 2021, at 15:52, Moritz Mack wrote:
> >
> > Hi all,
> >
> > I’m very much looking forward to start contributing to Beam and just
> want to briefly introduce myself.
> >
> > My name is Moritz (mosche) and I’m working together with Alexey and
> Etienne. Having worked mostly with Spark in the past, I’m excited to dive
> deeper into Beam 😊
> >
> > Looking forward to working with all of you!
> >
> > Kind regards from Munich,
> > Moritz
> >
> > As a recipient of an email from Talend, your contact personal data will
> be on our systems. Please see our privacy notice (updated August 2020) at
> Talend, Inc.
> >
>
>


Re: Intro

2021-10-22 Thread Jean-Baptiste Onofre
Welcome ;)

Regards
JB

> On 22 Oct 2021, at 15:52, Moritz Mack wrote:
> 
> Hi all,
>  
> I’m very much looking forward to start contributing to Beam and just want to 
> briefly introduce myself.
>  
> My name is Moritz (mosche) and I’m working together with Alexey and Etienne. 
> Having worked mostly with Spark in the past, I’m excited to dive deeper into 
> Beam 😊
>  
> Looking forward to working with all of you!
>  
> Kind regards from Munich,
> Moritz
>  
> 



Re: Intro

2021-10-22 Thread Alexey Romanenko
Welcome to Beam, Moritz!

—
Alexey

> On 22 Oct 2021, at 17:44, Etienne Chauchot  wrote:
> 
> Welcome onboard Moritz !
> 
> Best
> 
> Etienne
> 
> On 22/10/2021 15:52, Moritz Mack wrote:
>> Hi all,
>>  
>> I’m very much looking forward to start contributing to Beam and just want to 
>> briefly introduce myself.
>>  
>> My name is Moritz (mosche) and I’m working together with Alexey and Etienne. 
>> Having worked mostly with Spark in the past, I’m excited to dive deeper into 
>> Beam 😊
>>  
>> Looking forward to working with all of you!
>>  
>> Kind regards from Munich,
>> Moritz
>>  


Re: Intro

2021-10-22 Thread Etienne Chauchot

Welcome onboard Moritz !

Best

Etienne

On 22/10/2021 15:52, Moritz Mack wrote:


Hi all,

I’m very much looking forward to start contributing to Beam and just 
want to briefly introduce myself.


My name is Moritz (mosche) and I’m working together with Alexey and 
Etienne. Having worked mostly with Spark in the past, I’m excited to 
dive deeper into Beam 😊


Looking forward to working with all of you!

Kind regards from Munich,

Moritz







Intro

2021-10-22 Thread Moritz Mack
Hi all,

I’m very much looking forward to start contributing to Beam and just want to 
briefly introduce myself.

My name is Moritz (mosche) and I’m working together with Alexey and Etienne. 
Having worked mostly with Spark in the past, I’m excited to dive deeper into 
Beam 😊

Looking forward to working with all of you!

Kind regards from Munich,
Moritz






Re: Why is Avro Date field using InstantCoder?

2021-10-22 Thread Cristian Constantinescu
Hi everyone,

I'm still trying to figure out the best path forward for my project.
While doing some research, I came across the Confluent interoperability
page [1]. Beam currently uses version 5.3.2 of the Confluent libs, which
reaches end of support on July 19, 2022. It's the last version that
supports Avro 1.8 (actually 1.8.1). The 5.4.x Confluent libs move to Avro
1.9.1, and 6.2.x (the latest) moves to Avro 1.10.1 (the latest Avro being
1.10.2).

What's the plan in cases like this, where dependencies reach their end of
life? Does Beam plan on staying on version 5.3.2 even after it is no longer
supported?

Thanks,
Cristian

[1]
https://docs.confluent.io/platform/current/installation/versions-interoperability.html
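
On the question raised further down this thread (whether the Avro version
in use can be detected at runtime), here is a minimal Java sketch, not Beam
code. It assumes the Avro jar's manifest carries an Implementation-Version
entry, which shaded or repackaged jars may drop. The class AvroVersionProbe
and its two helper methods are hypothetical names made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Beam or Avro.
public class AvroVersionProbe {

    // Leading "major.minor" of a version string such as "1.10.2" or "1.9.1-SNAPSHOT".
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    /**
     * Returns the Implementation-Version recorded in the jar manifest for the
     * package of the named class, or empty if the class or the entry is missing.
     */
    public static Optional<String> implementationVersion(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> cls =
                Class.forName(className, false, AvroVersionProbe.class.getClassLoader());
            Package pkg = cls.getPackage();
            if (pkg == null) {
                return Optional.empty();
            }
            return Optional.ofNullable(pkg.getImplementationVersion());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    /** True if the version string starts with a major.minor that is >= the given one. */
    public static boolean isAtLeast(String version, int major, int minor) {
        Matcher m = MAJOR_MINOR.matcher(version);
        if (!m.find()) {
            return false; // unparseable version: treat as "not known to be new enough"
        }
        int actualMajor = Integer.parseInt(m.group(1));
        int actualMinor = Integer.parseInt(m.group(2));
        return actualMajor > major || (actualMajor == major && actualMinor >= minor);
    }

    public static void main(String[] args) {
        // org.apache.avro.Schema is only found if Avro is on the classpath.
        Optional<String> version = implementationVersion("org.apache.avro.Schema");
        if (version.isPresent()) {
            System.out.println(
                "Avro " + version.get() + ", 1.9 or newer: " + isAtLeast(version.get(), 1, 9));
        } else {
            System.out.println("Avro version could not be determined");
        }
    }
}
```

The components are compared as numbers rather than as text because "1.10"
sorts before "1.9" as a string. A check like this could in principle select
between generated conversions per Avro version, but that part is not shown.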

On Mon, Oct 18, 2021 at 2:42 PM Brian Hulette  wrote:

> Note there was some work done to make Beam work with Avro 1.9 [1]
> (presumably this only works with joda time though? certainly the bytebuddy
> generated code would). Avro 1.9 compatibility is not verified continuously
> though, there was an effort a while ago to test Beam against multiple
> versions of Avro [2] but I don't think it went anywhere. There's also some
> discussion about API compatibility concerns in [2].
>
> Brian
>
> [1] https://issues.apache.org/jira/browse/BEAM-9144
> [2]
> https://lists.apache.org/thread.html/r2739eb540ea2b8d8c50db71850361b8b1347df66e4357001b334987b%40%3Cdev.beam.apache.org%3E
>
> On Mon, Oct 18, 2021 at 11:19 AM Daniel Collins 
> wrote:
>
>> > I see a lot of static calls
>>
>> A lot of these calls bottom out in usage of ServiceLoader,
>> effectively acting as an ergonomic API on top of it, with AutoService
>> used to register new handlers. If they're not currently, and there's some
>> extension point which would be helpful to you, it's quite likely that you
>> could get buy-in to adding another such extension point.
>>
>> > after I find a workaround
>>
>> I think if the issue is that AvroUtils currently uses an old version of
>> avro, forking it into your own package and version bumping it might be a
>> good place to start for a workaround. Any changes committed to the beam
>> repo will take 1-2 months to make it to a non-snapshot build even if you do
>> find a long term solution acceptable to all interested parties.
>>
>> -Daniel
>>
>> On Mon, Oct 18, 2021 at 1:46 PM Cristian Constantinescu 
>> wrote:
>>
>>> I will have a look after I find a workaround as I really need to deliver
>>> some things and using Avro 1.8 isn't really an option.
>>>
>>> But once that's done, I'd love to find ways to make Beam less dependent
>>> on Avro 1.8 considering it was released in 2017.
>>>
>>> On Mon, Oct 18, 2021 at 12:34 PM Reuven Lax  wrote:
>>>
>>>> Do you know if it's easy to detect which version of Avro is being used?
>>>>
>>>> On Sun, Oct 17, 2021 at 10:20 PM Cristian Constantinescu <
>>>> zei...@gmail.com> wrote:

> If I had to change things, I would:
>
> 1. When deriving the SCHEMA add a few new types (JAVA_TIME, JAVA_DATE
> or something along those lines).
> 2. RowCoderGenerator around line 159 calls
> "SchemaCoder.coderForFieldType(schema.getField(rowIndex).getType().withNullable(false));"
> which eventually gets to SchemaCoderHelpers.coderForFieldType. There,
> CODER_MAP has a hard reference on InstantCoder for DATETIME. Maybe that map
> can be augmented (possibly dynamically) with new
> fieldtypes-coder combinations to take care of the new types from #1.
>
> I would also like to ask. Looking through the Beam code, I see a lot
> of static calls. Just wondering why it's done this way. I'm used to
> projects having some form of dependency injection involved and static calls
> being frowned upon (lack of mockability, hidden dependencies, tight
> coupling etc). The only reason I can think of is serializability given
> Beam's multi-node processing?
>
>
> On Sat, Oct 16, 2021 at 3:11 AM Reuven Lax  wrote:
>
>> Is the Schema inference the only reason we can't upgrade Avro, or are
>> there other blockers? Is there any way we can tell at runtime which version
>> of Avro is running? Since we generate the conversion code at runtime with
>> ByteBuddy, we could potentially just generate different conversions
>> depending on the Avro version.
>>
>> On Fri, Oct 15, 2021 at 11:56 PM Cristian Constantinescu <
>> zei...@gmail.com> wrote:
>>
>>> Those are fair points. However please consider that there might be
>>> new users who will decide that Beam isn't suitable because of things like
>>> requiring Avro 1.8, Joda time, old Confluent libraries, and, when I started
>>> using Beam about a year ago, Java 8 (I think we're okay with Java 11 now).
>>>
>>> I guess what I'm saying is that there's definitely a