P1 issues report (46)

2021-09-25 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-12958: All "Wordcount Dataflow" 
GHA checks failing with PERMISSION_DENIED (created 2021-09-24)
https://issues.apache.org/jira/browse/BEAM-12950: Missing events when using 
Python WriteToFiles in streaming pipeline (created 2021-09-24)
https://issues.apache.org/jira/browse/BEAM-12867: Either Create or 
DirectRunner fails to produce all elements to the following transform (created 
2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12843: (Broken Pipe induced) 
Bricked Dataflow Pipeline  (created 2021-09-06)
https://issues.apache.org/jira/browse/BEAM-12807: Java creates an incorrect 
pipeline proto when core-construction-java jar is not in the CLASSPATH (created 
2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12794: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12792: Beam worker only installs 
--extra_package once (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12781: SDFBoundedSourceReader 
behaves much slower compared with the original behavior of BoundedSource 
(created 2021-08-20)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12632: ElasticsearchIO: Enabling 
both User/Pass auth and SSL overwrites User/Pass (created 2021-07-16)
https://issues.apache.org/jira/browse/BEAM-12607: Copy Code Snippet copies 
html tags (created 2021-07-13)
https://issues.apache.org/jira/browse/BEAM-12540: 
beam_PostRelease_NightlySnapshot - Task 
:runners:direct-java:runMobileGamingJavaDirect FAILED (created 2021-06-25)
https://issues.apache.org/jira/browse/BEAM-12525: SDF BoundedSource seems 
to execute significantly slower than 'normal' BoundedSource (created 2021-06-22)
https://issues.apache.org/jira/browse/BEAM-12505: codecov/patch has poor 
behavior (created 2021-06-17)
https://issues.apache.org/jira/browse/BEAM-12500: Dataflow SocketException 
(SSLException) error while trying to send message from Cloud Pub/Sub to 
BigQuery (created 2021-06-16)
https://issues.apache.org/jira/browse/BEAM-12484: JdbcIO date conversion is 
sensitive to OS (created 2021-06-14)
https://issues.apache.org/jira/browse/BEAM-12467: 
java.io.InvalidClassException With Flink Kafka (created 2021-06-09)
https://issues.apache.org/jira/browse/BEAM-12380: Go SDK Kafka IO Transform 
implemented via XLang (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12310: 
beam_PostCommit_Java_DataflowV2 failing (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12279: Implement 
destination-dependent sharding in FileIO.writeDynamic (created 2021-05-04)
https://issues.apache.org/jira/browse/BEAM-12256: 
PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode some 
Avro logical types (created 2021-04-29)
https://issues.apache.org/jira/browse/BEAM-11959: Python Beam SDK Harness 
hangs when installing pip packages (created 2021-03-11)
https://issues.apache.org/jira/browse/BEAM-11906: No trigger early 
repeatedly for session windows (created 2021-03-01)
https://issues.apache.org/jira/browse/BEAM-11875: XmlIO.Read does not 
handle XML encoding per spec (created 2021-02-26)
https://issues.apache.org/jira/browse/BEAM-11828: JmsIO is not 
acknowledging messages correctly (created 2021-02-17)
https://issues.apache.org/jira/browse/BEAM-11755: Cross-language 
consistency (RequiresStableInputs) is quietly broken (at least on portable 
flink runner) (created 2021-02-05)
https://issues.apache.org/jira/browse/BEAM-11578: `dataflow_metrics` 
(python) fails with TypeError (when int overflowing?) (created 2021-01-06)
https://issues.apache.org/jira/browse/BEAM-11148: Kafka 
commitOffsetsInFinalize OOM on Flink (created 2020-10-28)
https://issues.apache.org/jira/browse/BEAM-11017: Timer with dataflow 
runner can be set multiple times (dataflow runner) (created 2020-10-05)
https://issues.apache.org/jira/browse/BEAM-10670: Make non-portable 
Splittable DoFn the only option when executing Java "Read" transforms (created 
2020-08-10)
https://issues.apache.org/jira/browse/BEAM-10617: python 
CombineGlobally().with_fanout() cause duplicate combine results for sliding 
windows (created 2020-07-31)
https://issues.apache.org/jira/browse/BEAM-10569: SpannerIO tests don't 
actually assert anything. (created 2020-07-23)
https://issues.apache.org/jira/browse

Flaky test issue report (30)

2021-09-25 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12912: 
:runners:direct-java:runMobileGamingJavaDirect flakey failure in 
PostRelease_NightlySnapshot (created 2021-09-17)
https://issues.apache.org/jira/browse/BEAM-12861: 
apache_beam.ml.gcp.recommendations_ai_test_it.RecommendationAIIT.test_create_catalog_item
  is flaky (created 2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12859: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12809: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12694: DICOMIoIntegrationTest 
flaky due to store ID (Python PreCommit) (created 2021-07-30)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: 
Failed to read inputs in the data plane (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: 
SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https://issues.apache.org/jira/browse/BEAM-12163: Python GHA PreCommits 
flake with grpc.FutureTimeoutError on SDK harness startup (created 2021-04-13)
https://issues.apache.org/jira/browse/BEAM-12061: beam_PostCommit_SQL 
failing on KafkaTableProviderIT.testFakeNested (created 2021-03-27)
https://issues.apache.org/jira/browse/BEAM-11837: Java build flakes: 
"Memory constraints are impeding performance" (created 2021-02-18)
https://issues.apache.org/jira/browse/BEAM-11661: hdfsIntegrationTest 
flake: network not found (py38 postcommit) (created 2021-01-19)
https://issues.apache.org/jira/browse/BEAM-11645: beam_PostCommit_XVR_Flink 
failing (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11641: Bigquery Read tests are 
flaky on Flink runner in Python PostCommit suites (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11541: 
testTeardownCalledAfterExceptionInProcessElement flakes on direct runner. 
(created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-10955: Flink Java Runner test 
flake: Could not find Flink job (FlinkJobNotFoundException) (created 2020-09-23)
https://issues.apache.org/jira/browse/BEAM-10866: 
PortableRunnerTestWithSubprocesses.test_register_finalizations flaky on macOS 
(created 2020-09-09)
https://issues.apache.org/jira/browse/BEAM-10485: Failure / flake: 
ElasticsearchIOTest > testWriteWithIndexFn (created 2020-07-14)
https://issues.apache.org/jira/browse/BEAM-9649: 
beam_python_mongoio_load_test started failing due to mismatched results 
(created 2020-03-31)
https://issues.apache.org/jira/browse/BEAM-8453: Failure in 
org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety (created 
2019-10-21)
https://issues.apache.org/jira/browse/BEAM-8101: Flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful for 
Direct, Spark, Flink (created 2019-08-27)
https://issues.apache.org/jira/browse/BEAM-8035: 
WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp 
order (sickbayed) (created 2019-08-22)
https://issues.apache.org/jira/browse/BEAM-7827: 
MetricsTest$AttemptedMetricTests.testAllAttemptedMetrics is flaky on 
DirectRunner (created 2019-07-26)
https://issues.apache.org/jira/browse/BEAM-7752: Java Validates 
DirectRunner: testTeardownCalledAfterExceptionInFinishBundleStateful flaky 
(created 2019-07-16)
https://issues.apache.org/jira/browse/BEAM-6804: [beam_PostCommit_Java] 
[PubsubReadIT.testReadPublicData] Timeout waiting on Sub (created 2019-03-11)
https://issues.apache.org/jira/browse/BEAM-5286: 
[beam_PostCommit_Java_GradleBuild][org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline][Flake]
 .sh script: text file busy. (created 2018-09-01)
https://issues.apache.org/jira/browse/BEAM-5172: 
org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky (created 
2018-08-20)


Re: Documenting per-key delivery semantics for various runners

2021-09-25 Thread Jan Lukavský
+1 to adding a Pipeline requirement for this, if business logic relies 
on a specific feature runner might/might not have, then Pipeline should 
be rejected on runners that do not support it. Do we have a list runners 
that have or lack this semantics? Just for clarification - sorry my 
ignorance, if this has been already described - do we have a description 
of the use-cases that drive this effort?


Thanks,

 Jan

On 9/24/21 10:58 PM, Robert Bradshaw wrote:

Thanks for writing this up. Rather than just documenting it, should we
have a way of asserting/requesting it (like time sorted inputs) such
that a pipeline author that needs to rely on this property can be
rejected on runners that don't provide it?

On Fri, Sep 24, 2021 at 12:25 PM Kenneth Knowles  wrote:

Took a look. I definitely agree that something like this is useful, and 
well-motivated by the use cases you raise.

Kenn

On Thu, Sep 23, 2021 at 4:30 PM Pablo Estrada  wrote:

Hi all,
I've been spending some time thinking about CDC use cases on Beam. One valuable 
piece to enable these use cases is to define how Beam deals with ordering of 
elements in streaming pipelines.
With that in mind, I wrote a document[1] that proposes a definition of the 
ordering semantics supported by most Beam runners, and a pull request [2] with 
ValidatesRunner tests and documentation updates.

Would you please review these, add your comments and thoughts, and let me know 
if they make sense?

Thanks!
-P.

[1] 
https://docs.google.com/document/d/1_7WRJznXlOtWuVaHl_dpy8OZcx_M8BUmeWVA4G0-wEc/edit#
[2] https://github.com/apache/beam/pull/15378