[Question] RelNode to PTransform possible?

2023-06-08 Thread Soumyadeep Mukhopadhyay
Hello Team,

I have a use case where I need Apache Calcite to parse and validate the query
and then apply some planner rules.

I was hoping to leverage the capabilities of Apache Beam on top of Calcite so
that I can build on what we have already done and apply it at scale. One
approach that came to mind was to convert the RelNode to something
equivalent in Beam (like a PTransform, maybe? See
https://beam.apache.org/releases/javadoc/2.29.0/org/apache/beam/sdk/extensions/sql/impl/rel/BeamRelNode.html
).

Any opinion/suggestion on this will be really appreciated! Thanks for your
time and consideration!

With regards,
Soumyadeep Mukhopadhyay.


Beam Go now has a v2.48.2 release.

2023-06-08 Thread Robert Burke
Hi Beam Dev List!

This is to report on an issue that occurred with the v2.48.0 Go SDK release
and its resolution. While generally poor form, Ritesh, Jack, and I
independently resolved the issue instead of first mailing the dev list
about it. We decided that fixing the error for the Go SDK was better for
the community than delaying the fix through a discussion and vote. We
believe the issue is resolved, and there are now sufficient guard rails
against a recurrence.

However, it's still critical we email the community about it, so here it is.

tl;dr:
Due to an error in tagging, the v2.48.0 release tried to use the wrong SDK
container: it was still pointing at the ".dev" version. We had to add a new
Go SDK specific tag, `sdks/v2.48.2`, to resolve the issue, and ensure that
tag was on the right RC commit.

This was tracked in https://github.com/apache/beam/issues/27064.

The longer story:

This morning a user filed an issue [0]. Due to Go's unique package release
strategy, it's not possible to simply "move the tag to a new commit", since
the module proxy and similar services would already have distributed the
previous version of the source. This property enables robust "supply chain"
security, and avoids mismatches or maliciousness.

The only resolution to a bad release is to release a patch version, which
for Go is as simple as adding an appropriate tag. The Go SDK has its own
"tag series" prefixed with "sdks/", since that folder is where the SDK's
go.mod file lives. We judged the cost of the Go SDK version being slightly
out of sync with the main line version to be acceptable, given that Beam
doesn't presently do patch releases. No other changes were made, to avoid a
full container build. Adding a new patch-version tag to a working commit
would unbreak the Go SDK release.
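
Since a published version can never be reused, each fix has to take the next
free patch number in the "sdks/" tag series. A minimal shell sketch of that
lookup, assuming a local clone with all tags fetched. `next_patch_tag` is a
hypothetical helper written for illustration, not part of Beam's release
tooling:

```shell
# Hypothetical helper: print the next unused patch tag in a tag series.
# Usage: next_patch_tag sdks/v2.48
next_patch_tag() {
  series=$1

  # Newest existing tag in the series, using git's version-aware sort.
  # The grep keeps only tags that end in a plain patch number, so
  # suffixed tags (for example release candidates) are ignored.
  latest=$(git tag -l --sort=-v:refname "${series}.*" \
    | grep -E '\.[0-9]+$' | head -n 1 || true)

  if [ -z "$latest" ]; then
    # No tag in this series yet, so start at patch 0.
    echo "${series}.0"
    return 0
  fi

  # Strip everything up to the last dot to get the patch number, then add 1.
  patch=${latest##*.}
  echo "${series}.$((patch + 1))"
}
```

The helper only suggests a name; it does not create or push a tag.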

The error occurred because, with 2.48.0, the release manager was using the
new GitHub Action to get the RC tags instead of the manual script. The
action itself worked fine and did that job correctly.

Since the RC_TAG variable is left unspecified in the release guide [1], the
Release Manager ended up running `git tag -s "sdks/v2.48.0"`, which adds the
tag to the HEAD commit of the current branch instead of to the commit
associated with the RC tag.

For the first fix attempt, the Release Manager ran the same command again
for sdks/v2.48.1, leading to the same result and burning the sdks/v2.48.1
tag. A bit of investigation showed that it was possible for tags in the
local branch to get out of sync with what the GitHub Action did.

The sync issue was resolved with `git fetch --all --tags`, and the RC tag
commit was confirmed with `git rev-list ${RC_TAG} -n 1`, leading to the 2nd
fix attempt with v2.48.2, which has resolved the Go SDK issue.
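
The check described above can be sketched as a small shell function that
resolves the RC tag to a commit and tags that commit explicitly, rather than
relying on HEAD. This is an illustration under stated assumptions, not the
command from the release guide: `tag_rc_commit` is a hypothetical name, and
any signing or message options are passed through unchanged to `git tag`.

```shell
# Hypothetical helper: create NEW_TAG on the commit that RC_TAG points to,
# instead of on whatever HEAD happens to be.
# Usage: tag_rc_commit RC_TAG NEW_TAG [extra "git tag" options]
tag_rc_commit() {
  rc_tag=$1
  new_tag=$2
  shift 2

  # Resolve the RC tag to a single commit; stop if it is unknown locally.
  rc_commit=$(git rev-list -n 1 "$rc_tag") || return 1
  echo "Tagging $new_tag at $rc_commit (from $rc_tag)"

  # Passing the commit explicitly is the key step: without it, git tags HEAD.
  git tag "$@" "$new_tag" "$rc_commit"
}

# Example (not run here): sync tags first, then create a signed tag.
#   git fetch --all --tags
#   tag_rc_commit "${RC_TAG}" "sdks/v2.48.2" -s -m "Go SDK v2.48.2"
```

Printing the resolved commit before tagging gives the release manager a
chance to compare it against the commit shown for the RC on GitHub.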

The Release Guide has been updated [2] to make this check explicit, though
hopefully this step will become obsolete once it's moved to GitHub Actions.
Until then, we may as well avoid the error.

The 2.48.0 release blog and notes have been updated to note the discrepancy
as well.

Thank you for your understanding and time,
Robert Burke
Beam Go Busybody

[0] https://github.com/apache/beam/issues/27064
[1] https://beam.apache.org/contribute/release-guide/#git-tag
[2] https://github.com/apache/beam/pull/27070


Beam High Priority Issue Report (37)

2023-06-08 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/27019 [Failing Test]: Azure Integration 
test is failing in python 3.7 PostCommit
https://github.com/apache/beam/issues/27012 [Bug]: Beam Website cannot run 
locally on Mac
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to 
SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit 
is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26547 [Failing Test]: 
beam_PostCommit_Java_DataflowV2
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not 
reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26272 [Failing Test]: Python 3.7 
postcommit is red
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in 
FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/26723 [Failing Test]: Tour of Beam 
Frontend Test suite is perma-red on master
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.