[Question] RelNode to PTransform possible?
Hello Team,

I have a use case where I need Apache Calcite to parse and validate the query and then apply some planner rules. I was hoping to leverage the capabilities of Apache Beam on top of Calcite so that I can build on what we have already done and apply it at scale. One approach that came to mind was to convert the RelNode to something equivalent in Beam (a PTransform, maybe? https://beam.apache.org/releases/javadoc/2.29.0/org/apache/beam/sdk/extensions/sql/impl/rel/BeamRelNode.html ).

Any opinion or suggestion on this would be really appreciated! Thanks for your time and consideration!

With regards,
Soumyadeep Mukhopadhyay
Beam Go now has a v2.48.2 release.
Hi Beam Dev List!

This is to report on an issue that occurred with the v2.48.0 Go SDK release and its resolution. While it is generally poor form, Ritesh, Jack, and I independently resolved the issue instead of first mailing the dev list about it. We decided that fixing the error for the Go SDK was better for the community than delaying such a fix through a discussion and vote. We do believe the issue is resolved, and there are now sufficient guard rails against a recurrence. However, it's still critical that we email the community about it, so here it is.

tl;dr: Due to an error in tagging, the v2.48.0 release was trying to use the wrong SDK container, and it was still trying to use the ".dev" version. We had to add a new Go SDK specific tag of `sdks/v2.48.2` to resolve the issue and ensure that tag was on the right RC commit. This was tracked in https://github.com/apache/beam/issues/27064.

The longer story:

This morning a user filed an issue [0]. Due to Go's unique package release strategy, it's not possible to simply "move the tag to a new commit", since the module proxy and similar would already have distributed the previous versions of the source. This property enables robust "supply chain" security, and avoids mismatches or maliciousness. The only resolution to a bad tag is to release a patch version, which for Go is as simple as adding an appropriate tag. The Go SDK has its own "tag series" prefixed with "sdks/", since that folder is where the SDK's go.mod file lives.

We judged the cost of the Go SDK version being slightly out of sync with the main line version to be acceptable, given that Beam doesn't presently do patch releases. No other changes were made, to avoid a full container build. Adding a new patch version tag to a working commit would unbreak the Go SDK release.

The error occurred because with 2.48.0, the release manager was using the new GitHub Action to get the RC tags instead of the manual script.
The action worked fine, however, and did that job correctly. Since the RC_TAG variable is left unspecified in the release guide [1], the Release Manager ended up running `git tag -s "sdks/v2.48.0"`, which adds the tag to the HEAD commit of the current branch instead of to the commit associated with the RC tag. The Release Manager then ran the command again, leading to the same result; this burned the sdks/v2.48.1 tag. A bit of investigation showed that it was possible for tags in the local branch to get out of sync with what the GitHub Action did. The sync issue was resolved with `git fetch --all --tags`, and the RC tag commit was confirmed with `git rev-list ${RC_TAG} -n 1`, leading to the second fix attempt with v2.48.2, which has resolved the Go SDK issue.

The Release Guide has been updated [2] to make this check explicit, though hopefully this step will become obsolete once it is moved to GitHub Actions. Until then, we may as well avoid the error. The 2.48.0 release blog and notes have been updated to note the discrepancy as well.

Thank you for your understanding and time,
Robert Burke
Beam Go Busybody

[0] https://github.com/apache/beam/issues/27064
[1] https://beam.apache.org/contribute/release-guide/#git-tag
[2] https://github.com/apache/beam/pull/27070
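
The tagging fix described in the message above can be sketched roughly as follows. This is only an illustration, not part of Beam's release tooling: the helper names `next_patch_tag` and `signed_tag_command` and the placeholder commit hash `0123abc` are hypothetical. The sketch assumes that published tags are never reused, and that the commit to tag has already been resolved (for example with the `git rev-list ${RC_TAG} -n 1` command mentioned in the message). It builds the `git tag -s` command line without running it.

```python
import re

# The Go SDK's go.mod lives under sdks/, so its tag series carries that prefix.
TAG_PREFIX = "sdks/"
_SEMVER = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def next_patch_tag(existing_tags, release):
    """Return the first unused sdks/vX.Y.Z tag at or above `release`.

    Tags that already exist are treated as burned: never moved or reused.
    """
    match = _SEMVER.match(release)
    if match is None:
        raise ValueError(f"not a vX.Y.Z version: {release!r}")
    major, minor, patch = (int(part) for part in match.groups())
    used = set(existing_tags)
    while f"{TAG_PREFIX}v{major}.{minor}.{patch}" in used:
        patch += 1
    return f"{TAG_PREFIX}v{major}.{minor}.{patch}"


def signed_tag_command(tag, rc_commit):
    """Build a `git tag -s` argv that names the target commit explicitly.

    Without the trailing commit argument, git tags the current HEAD,
    which is the failure mode described in the message.
    """
    if not re.fullmatch(r"[0-9a-f]{7,64}", rc_commit):
        raise ValueError(f"expected a commit hash, got {rc_commit!r}")
    return ["git", "tag", "-s", tag, rc_commit]


# Hypothetical usage: both earlier tags are burned, so the next one is .2.
burned = {"sdks/v2.48.0", "sdks/v2.48.1"}
tag = next_patch_tag(burned, "v2.48.0")
print(tag)  # sdks/v2.48.2
print(signed_tag_command(tag, "0123abc"))  # placeholder commit hash
```

Passing the commit explicitly is the point of the sketch: the tag then lands on the RC commit regardless of which commit the local branch happens to have checked out.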
Beam High Priority Issue Report (37)
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/27019 [Failing Test]: Azure Integration test is failing in python 3.7 PostCommit
https://github.com/apache/beam/issues/27012 [Bug]: Beam Website cannot run locally on Mac
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26547 [Failing Test]: beam_PostCommit_Java_DataflowV2
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26272 [Failing Test]: Python 3.7 postcommit is red
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://github.com/apache/beam/issues/21104 Flaky: apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower in-use IP address quota footprint.

P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/26723 [Failing Test]: Tour of Beam Frontend Test suite is perma-red on master
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.