Re: Beam + Google Summer of Code 2024

2024-05-02 Thread Svetak Sundhar via dev
Welcome and welcome (back)!


Svetak Sundhar

  Data Engineer
svetaksund...@google.com



On Thu, May 2, 2024 at 2:06 AM Reeba Qureshi wrote:

> Hello everyone!
>
> I'm really excited to be working with Apache Beam again! Looking forward
> to it!
>
> Thanks,
> Reeba
>
> On Thu, 2 May, 2024, 10:04 Ayush Pandey wrote:
>
>> Hi Danny,
>>
>> Thank you for the kind introduction. I really look forward to
>> collaborating with and learning from this amazing community.
>>
>>
>> Best Regards,
>> Ayush
>>
>>
>> On Wed, 1 May 2024 at 14:40, XQ Hu wrote:
>>
>>> Welcome to Beam!
>>>
>>> On Wed, May 1, 2024 at 4:13 PM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> It's my pleasure to announce that two contributors have been accepted as
>>>> GSoC students for Beam this year!
>>>>
>>>> Ayush Pandey will be working on a project to implement RAG example
>>>> pipelines using Beam [1]. This will be a really valuable addition to Beam's
>>>> ML offering, showing how users can leverage things like MLTransform and
>>>> Enrichment for interacting with LLMs. @Jack McCluskey and I will be
>>>> mentoring Ayush for this project.
>>>>
>>>> Reeba Qureshi will be working on adding new features to Beam YAML,
>>>> including onboarding new IOs and ML transforms [2]. This will help more
>>>> fully round out our growing YAML offering and should make low-code
>>>> pipelines even more attainable. Reeba was also a GSoC contributor last
>>>> year [3] and we're really excited to have her back! @Jeff Kinard and I
>>>> will be mentoring Reeba for this project.
>>>>
>>>> Welcome to the community, Ayush, and welcome back, Reeba!
>>>>
>>>> Thanks,
>>>> Danny
>>>>
>>>> [1]
>>>> https://docs.google.com/document/d/1M_8fvqKVBi68hQo_x1AMQ8iEkzeXTcSl0CwTH00cr80/edit#heading=h.mp9iumh7r8v
>>>> [2]
>>>> https://docs.google.com/document/d/1vXj1qhy0Asiosn3gFDgYVKYQs3Lsyj972klSv5_hfG8/edit
>>>> [3] https://lists.apache.org/thread/5yb0jr41xg1xonlxr97p0o06mnk3ktbb
>>>>
>>>


Beam High Priority Issue Report (54)

2024-05-02 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/31122 The PostCommit Go VR Flink job is flaky
https://github.com/apache/beam/issues/31074 The StressTests Java KafkaIO job is flaky
https://github.com/apache/beam/issues/31047 [Feature Request]: Support Type Inference on Python 3.12
https://github.com/apache/beam/issues/31040 [Bug]: ReadAllFiles does not fully read gzipped files from GCS
https://github.com/apache/beam/issues/30757 [Bug]: Beam Playground scio examples cannot run
https://github.com/apache/beam/issues/30737 [Failing Test]: Playground PreCommit failing goLint
https://github.com/apache/beam/issues/30644 The Inference Python Benchmarks Dataflow job is flaky
https://github.com/apache/beam/issues/30612 The Playground CI Nightly job is flaky
https://github.com/apache/beam/issues/30606 The PostCommit Java Nexmark Dataflow job is flaky
https://github.com/apache/beam/issues/30529 The PostCommit Java Sickbay job is flaky
https://github.com/apache/beam/issues/30527 The PostCommit Java IO Performance Tests job is flaky
https://github.com/apache/beam/issues/30526 The PerformanceTests xlang KafkaIO Python job is flaky
https://github.com/apache/beam/issues/30525 The PostCommit Python ValidatesContainer Dataflow With RC job is flaky
https://github.com/apache/beam/issues/30521 The LoadTests Go Combine Flink Batch job is flaky
https://github.com/apache/beam/issues/30520 The LoadTests Python Combine Flink Streaming job is flaky
https://github.com/apache/beam/issues/30519 The PostCommit XVR GoUsingJava Dataflow job is flaky
https://github.com/apache/beam/issues/30517 The PostCommit XVR Direct job is flaky
https://github.com/apache/beam/issues/30513 The PostCommit Python job is flaky
https://github.com/apache/beam/issues/30507 The LoadTests Go GBK Flink Batch job is flaky
https://github.com/apache/beam/issues/30506 The TypeScript Tests job is flaky
https://github.com/apache/beam/issues/30503 The PostCommit Java ValidatesRunner Flink Java11 job is flaky
https://github.com/apache/beam/issues/30502 The LoadTests Go CoGBK Flink Batch job is flaky
https://github.com/apache/beam/issues/29971 [Bug]: FixedWindows not working for large Kafka topic
https://github.com/apache/beam/issues/29926 [Bug]: FileIO: lack of timeouts may cause the pipeline to get stuck indefinitely
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28383 [Failing Test]: org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28326 Bug: apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not working when using CreateDisposition.CREATE_IF_NEEDED
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState