Re: Beam + Google Summer of Code 2024
Welcome and welcome (back)!

Svetak Sundhar
Data Engineer
s vetaksund...@google.com

On Thu, May 2, 2024 at 2:06 AM Reeba Qureshi wrote:

> Hello everyone!
>
> I'm really excited to be working with Apache Beam again! Looking forward
> to it!
>
> Thanks,
> Reeba
>
> On Thu, 2 May, 2024, 10:04 Ayush Pandey wrote:
>
>> Hi Danny,
>>
>> Thank you for the kind introduction. I really look forward to
>> collaborating with and learning from this amazing community.
>>
>> Best Regards,
>> Ayush
>>
>> On Wed, 1 May 2024 at 14:40, XQ Hu wrote:
>>
>>> Welcome to Beam!
>>>
>>> On Wed, May 1, 2024 at 4:13 PM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> It's my pleasure to announce 2 contributors have been accepted as
>>>> GSoC students for Beam this year!
>>>>
>>>> Ayush Pandey will be working on a project to implement RAG example
>>>> pipelines using Beam [1]. This will be a really valuable addition to
>>>> Beam's ML offering, showing how users can leverage things like
>>>> MLTransform and Enrichment for interacting with LLMs. @Jack McCluskey
>>>> and I will be mentoring Ayush for this project.
>>>>
>>>> Reeba Qureshi will be working on adding new features to Beam Yaml,
>>>> including onboarding new IOs and ML transforms [2]. This will help
>>>> more fully round out our growing Yaml offering and should make low
>>>> code pipelines even more attainable. Reeba also was a GSoC
>>>> contributor last year [3] and we're really excited to have her back!
>>>> @Jeff Kinard and I will be mentoring Reeba for this project.
>>>>
>>>> Welcome to the community Ayush, and welcome back Reeba!
>>>>
>>>> Thanks,
>>>> Danny
>>>>
>>>> [1] https://docs.google.com/document/d/1M_8fvqKVBi68hQo_x1AMQ8iEkzeXTcSl0CwTH00cr80/edit#heading=h.mp9iumh7r8v
>>>> [2] https://docs.google.com/document/d/1vXj1qhy0Asiosn3gFDgYVKYQs3Lsyj972klSv5_hfG8/edit
>>>> [3] https://lists.apache.org/thread/5yb0jr41xg1xonlxr97p0o06mnk3ktbb

Beam High Priority Issue Report (54)
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/31122 The PostCommit Go VR Flink job is flaky
https://github.com/apache/beam/issues/31074 The StressTests Java KafkaIO job is flaky
https://github.com/apache/beam/issues/31047 [Feature Request]: Support Type Inference on Python 3.12
https://github.com/apache/beam/issues/31040 [Bug]: ReadAllFiles does not fully read gzipped files from GCS
https://github.com/apache/beam/issues/30757 [Bug]: Beam Playground scio examples cannot run
https://github.com/apache/beam/issues/30737 [Failing Test]: Playground PreCommit failing goLint
https://github.com/apache/beam/issues/30644 The Inference Python Benchmarks Dataflow job is flaky
https://github.com/apache/beam/issues/30612 The Playground CI Nightly job is flaky
https://github.com/apache/beam/issues/30606 The PostCommit Java Nexmark Dataflow job is flaky
https://github.com/apache/beam/issues/30529 The PostCommit Java Sickbay job is flaky
https://github.com/apache/beam/issues/30527 The PostCommit Java IO Performance Tests job is flaky
https://github.com/apache/beam/issues/30526 The PerformanceTests xlang KafkaIO Python job is flaky
https://github.com/apache/beam/issues/30525 The PostCommit Python ValidatesContainer Dataflow With RC job is flaky
https://github.com/apache/beam/issues/30521 The LoadTests Go Combine Flink Batch job is flaky
https://github.com/apache/beam/issues/30520 The LoadTests Python Combine Flink Streaming job is flaky
https://github.com/apache/beam/issues/30519 The PostCommit XVR GoUsingJava Dataflow job is flaky
https://github.com/apache/beam/issues/30517 The PostCommit XVR Direct job is flaky
https://github.com/apache/beam/issues/30513 The PostCommit Python job is flaky
https://github.com/apache/beam/issues/30507 The LoadTests Go GBK Flink Batch job is flaky
https://github.com/apache/beam/issues/30506 The TypeScript Tests job is flaky
https://github.com/apache/beam/issues/30503 The PostCommit Java ValidatesRunner Flink Java11 job is flaky
https://github.com/apache/beam/issues/30502 The LoadTests Go CoGBK Flink Batch job is flaky
https://github.com/apache/beam/issues/29971 [Bug]: FixedWindows not working for large Kafka topic
https://github.com/apache/beam/issues/29926 [Bug]: FileIO: lack of timeouts may cause the pipeline to get stuck indefinitely
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28383 [Failing Test]: org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28326 Bug: apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not working when using CreateDisposition.CREATE_IF_NEEDED
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState