Re: [question] Good Course to learn beam

2022-08-30 Thread Evan Galpin
+dev for additional visibility/input

On Mon, Aug 29, 2022 at 11:10 AM Leandro Nahabedian via user <
u...@beam.apache.org> wrote:

> Hi community!
>
> I'm looking for a good course to learn apache beam and I saw this one
> 
> which I believe is good, since it has very good feedback from students.
> What do you think?
>
> Thanks in advance for your help
>
> Cheers,
> Leandro
>
> --
> *Leandro Nahabedian *
> Data Engineer
> O: 908.883.4369
>
>


Re: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.

2022-08-30 Thread Sachin Agarwal via dev
I would posit that something is better than nothing - did we ever see that
generic implementation?

On Tue, Aug 30, 2022 at 10:22 AM Austin Bennett wrote:

> Is there enough commonality across Delta, Hudi, Iceberg for this generic
> solution?  I imagined we'd potentially have individual IOs for each.  A
> generic one seems possible, but certainly would like to learn more.
>
> Also, are others in the community working on connectors for ANY of those
> Delta Lake, Hudi, or Iceberg IOs?  Would hope for some form of coordination
> and/or at least awareness between people addressing
> complementary/overlapping areas.
>
> On Mon, Aug 29, 2022 at 4:15 PM Neil Kolban via dev wrote:
>
>> Howdy,
>> I have a client who would be interested to use this.  Is there a link to
>> a GitHub repo or other place I can read more?
>>
>> Neil  (kol...@google.com)
>>
>> On 2022/08/05 07:23:31 张涛 wrote:
>> >
>> > Hi, we developed a new IO connector named DataLakeIO, to connect Beam
>> > and data lake, such as Delta Lake, Apache Hudi, Apache iceberg. Beam can
>> > use DataLakeIO to read data from data lake, and write data to data lake. We
>> > did not find data lake IO on
>> > https://beam.apache.org/documentation/io/built-in/, we want to
>> > contribute this new IO connector to Beam, what should we do next? Thank you
>> > very much!
>>
>


Re: [idea] A new IO connector named DataLakeIO, which support to connect Beam and data lake, such as Delta Lake, Apache Hudi, Apache iceberg.

2022-08-30 Thread Austin Bennett
Is there enough commonality across Delta, Hudi, Iceberg for this generic
solution?  I imagined we'd potentially have individual IOs for each.  A
generic one seems possible, but certainly would like to learn more.

Also, are others in the community working on connectors for ANY of those
Delta Lake, Hudi, or Iceberg IOs?  Would hope for some form of coordination
and/or at least awareness between people addressing
complementary/overlapping areas.

On Mon, Aug 29, 2022 at 4:15 PM Neil Kolban via dev wrote:

> Howdy,
> I have a client who would be interested to use this.  Is there a link to a
> GitHub repo or other place I can read more?
>
> Neil  (kol...@google.com)
>
> On 2022/08/05 07:23:31 张涛 wrote:
> >
> > Hi, we developed a new IO connector named DataLakeIO, to connect Beam
> > and data lake, such as Delta Lake, Apache Hudi, Apache iceberg. Beam can
> > use DataLakeIO to read data from data lake, and write data to data lake. We
> > did not find data lake IO on
> > https://beam.apache.org/documentation/io/built-in/, we want to contribute
> > this new IO connector to Beam, what should we do next? Thank you very much!
>


Beam High Priority Issue Report (69)

2022-08-30 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is failing
https://github.com/apache/beam/issues/22779 [Bug]: SpannerIO.readChangeStream() 
stops forwarding change records and starts continuously throwing (large number) 
of Operation ongoing errors 
https://github.com/apache/beam/issues/22749 [Bug]: Bytebuddy version update 
causes Invisible parameter type error
https://github.com/apache/beam/issues/22743 [Bug]: Test flake: 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest.testInsertWithinRowCountLimits
https://github.com/apache/beam/issues/22440 [Bug]: Python Batch Dataflow 
SideInput LoadTests failing
https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and 
fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at 
getConnection() in WriteFn
https://github.com/apache/beam/issues/22283 [Bug]: Python Lots of fn runner 
test items cost exactly 5 seconds to run
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer 
whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 
failures parent bug
https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in 
beam_PostCommit_Java_DataflowV1 and V2
https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam 
PostCommit Java V1
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 
failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 
--dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions 
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with 
writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing 
new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21468 
beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load 
tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on 
failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable 
ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in 
beam_PostCommit_Java_DataflowV2  
https://github.com/apache/beam/issues/21270 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21268 Race between member variable being 
accessed due to leaking uninitialized state via OutboundObserverFactory
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate 
BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21265 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked 
Dataflow Pipeline 
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21261 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky