[GitHub] [beam] youngoli commented on a change in pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.
youngoli commented on a change in pull request #11791: URL: https://github.com/apache/beam/pull/11791#discussion_r431615906 ## File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go ## @@ -266,33 +267,85 @@ func (n *DataSource) Progress() ProgressReportSnapshot { return ProgressReportSnapshot{PID: n.outputPID, ID: n.SID.PtransformID, Name: n.Name, Count: c} } -// Split takes a sorted set of potential split indices, selects and actuates -// split on an appropriate split index, and returns the selected split index -// if successful. Returns an error when unable to split. +// Split takes a sorted set of potential split indices and a fraction of the +// remainder to split at, selects and actuates a split on an appropriate split +// index, and returns the selected split index if successful. Returns an error +// when unable to split. func (n *DataSource) Split(splits []int64, frac float64) (int64, error) { - if splits == nil { - return 0, fmt.Errorf("failed to split: requested splits were empty") - } if n == nil { return 0, fmt.Errorf("failed to split at requested splits: {%v}, DataSource not initialized", splits) } + if frac > 1.0 { + frac = 1.0 + } else if frac < 0.0 { + frac = 0.0 + } + n.mu.Lock() - c := n.index - // Find the smallest split index that we haven't yet processed, and set - // the promised split index to this value. - for _, s := range splits { - // // Never split on the first element, or the current element. - if s > 0 && s > c && s <= n.splitIdx { - n.splitIdx = s - fs := n.splitIdx - n.mu.Unlock() - return fs, nil - } + s, err := splitHelper(n.index, n.splitIdx, splits, frac) + if err != nil { + n.mu.Unlock() + return 0, err } + n.splitIdx = s + fs := n.splitIdx n.mu.Unlock() - // If we can't find a suitable split index from the requested choices, - // return an error. - return 0, fmt.Errorf("failed to split at requested splits: {%v}, DataSource at index: %v", splits, c) + return fs, nil +} + +// splitHelper is a helper function that finds a split point in a range. +// currIdx and splitIdx should match the DataSource's index and splitIdx fields, +// and represent the start and end of the splittable range respectively. splits +// is an optional slice of valid split indices, and if nil then all indices are +// considered valid split points. frac must be between [0, 1], and represents +// a fraction of the remaining work that the split point aims to be as close +// as possible to. +func splitHelper(currIdx, splitIdx int64, splits []int64, frac float64) (int64, error) { + // Get split index from fraction. Find the closest index to the fraction of + // the remainder. + var start int64 = 0 + if currIdx > start { + start = currIdx + } + // This is the first valid split index, since we should never split at 0 or + // at the current element. + safeStart := start + 1 + // The remainder starts at our actual progress (i.e. start), but our final + // split index has to be >= our safeStart. + fracIdx := start + int64(math.Round(frac*float64(splitIdx-start))) + if fracIdx < safeStart { + fracIdx = safeStart + } + if splits == nil { Review comment: Done. I missed that in the original. Added that behavior and a test for it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] youngoli commented on a change in pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.
youngoli commented on a change in pull request #11791: URL: https://github.com/apache/beam/pull/11791#discussion_r431615426 ## File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go ## @@ -266,33 +267,85 @@ func (n *DataSource) Progress() ProgressReportSnapshot { return ProgressReportSnapshot{PID: n.outputPID, ID: n.SID.PtransformID, Name: n.Name, Count: c} } -// Split takes a sorted set of potential split indices, selects and actuates -// split on an appropriate split index, and returns the selected split index -// if successful. Returns an error when unable to split. +// Split takes a sorted set of potential split indices and a fraction of the +// remainder to split at, selects and actuates a split on an appropriate split +// index, and returns the selected split index if successful. Returns an error +// when unable to split. func (n *DataSource) Split(splits []int64, frac float64) (int64, error) { - if splits == nil { - return 0, fmt.Errorf("failed to split: requested splits were empty") - } if n == nil { return 0, fmt.Errorf("failed to split at requested splits: {%v}, DataSource not initialized", splits) } + if frac > 1.0 { + frac = 1.0 + } else if frac < 0.0 { + frac = 0.0 + } + n.mu.Lock() - c := n.index - // Find the smallest split index that we haven't yet processed, and set - // the promised split index to this value. - for _, s := range splits { - // // Never split on the first element, or the current element. - if s > 0 && s > c && s <= n.splitIdx { - n.splitIdx = s - fs := n.splitIdx - n.mu.Unlock() - return fs, nil - } + s, err := splitHelper(n.index, n.splitIdx, splits, frac) + if err != nil { + n.mu.Unlock() + return 0, err } + n.splitIdx = s + fs := n.splitIdx n.mu.Unlock() - // If we can't find a suitable split index from the requested choices, - // return an error. - return 0, fmt.Errorf("failed to split at requested splits: {%v}, DataSource at index: %v", splits, c) + return fs, nil +} + +// splitHelper is a helper function that finds a split point in a range. +// currIdx and splitIdx should match the DataSource's index and splitIdx fields, +// and represent the start and end of the splittable range respectively. splits +// is an optional slice of valid split indices, and if nil then all indices are +// considered valid split points. frac must be between [0, 1], and represents +// a fraction of the remaining work that the split point aims to be as close +// as possible to. +func splitHelper(currIdx, splitIdx int64, splits []int64, frac float64) (int64, error) { + // Get split index from fraction. Find the closest index to the fraction of + // the remainder. + var start int64 = 0 + if currIdx > start { + start = currIdx + } + // This is the first valid split index, since we should never split at 0 or + // at the current element. + safeStart := start + 1 + // The remainder starts at our actual progress (i.e. start), but our final + // split index has to be >= our safeStart. + fracIdx := start + int64(math.Round(frac*float64(splitIdx-start))) + if fracIdx < safeStart { + fracIdx = safeStart + } + if splits == nil { + // All split points are valid so just split at fraction. + return fracIdx, nil + } else { + // Find the closest unprocessed split point to our fraction. + sort.Slice(splits, func(i, j int) bool { return splits[i] < splits[j] }) + var prevDiff int64 = math.MaxInt64 + var bestS int64 = -1 + for _, s := range splits { + if s >= safeStart && s <= splitIdx { + diff := intAbs(fracIdx - s) + if diff <= prevDiff { + prevDiff = diff + bestS = s + } else { + break // Stop early if the difference starts increasing. + } + } + } + if bestS != -1 { + return bestS, nil + } + } + return 0, fmt.Errorf("failed to split at requested splits: {%v}, DataSource at index: %v", splits, currIdx) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c
[GitHub] [beam] youngoli commented on a change in pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.
youngoli commented on a change in pull request #11791: URL: https://github.com/apache/beam/pull/11791#discussion_r431615308 ## File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go ## @@ -266,33 +267,85 @@ func (n *DataSource) Progress() ProgressReportSnapshot { return ProgressReportSnapshot{PID: n.outputPID, ID: n.SID.PtransformID, Name: n.Name, Count: c} } -// Split takes a sorted set of potential split indices, selects and actuates -// split on an appropriate split index, and returns the selected split index -// if successful. Returns an error when unable to split. +// Split takes a sorted set of potential split indices and a fraction of the +// remainder to split at, selects and actuates a split on an appropriate split +// index, and returns the selected split index if successful. Returns an error +// when unable to split. func (n *DataSource) Split(splits []int64, frac float64) (int64, error) { - if splits == nil { - return 0, fmt.Errorf("failed to split: requested splits were empty") - } if n == nil { return 0, fmt.Errorf("failed to split at requested splits: {%v}, DataSource not initialized", splits) } + if frac > 1.0 { + frac = 1.0 + } else if frac < 0.0 { + frac = 0.0 + } + n.mu.Lock() - c := n.index - // Find the smallest split index that we haven't yet processed, and set - // the promised split index to this value. - for _, s := range splits { - // // Never split on the first element, or the current element. - if s > 0 && s > c && s <= n.splitIdx { - n.splitIdx = s - fs := n.splitIdx - n.mu.Unlock() - return fs, nil - } + s, err := splitHelper(n.index, n.splitIdx, splits, frac) + if err != nil { + n.mu.Unlock() + return 0, err } + n.splitIdx = s + fs := n.splitIdx n.mu.Unlock() - // If we can't find a suitable split index from the requested choices, - // return an error. - return 0, fmt.Errorf("failed to split at requested splits: {%v}, DataSource at index: %v", splits, c) + return fs, nil +} + +// splitHelper is a helper function that finds a split point in a range. +// currIdx and splitIdx should match the DataSource's index and splitIdx fields, +// and represent the start and end of the splittable range respectively. splits +// is an optional slice of valid split indices, and if nil then all indices are +// considered valid split points. frac must be between [0, 1], and represents +// a fraction of the remaining work that the split point aims to be as close +// as possible to. +func splitHelper(currIdx, splitIdx int64, splits []int64, frac float64) (int64, error) { + // Get split index from fraction. Find the closest index to the fraction of + // the remainder. + var start int64 = 0 + if currIdx > start { + start = currIdx + } + // This is the first valid split index, since we should never split at 0 or + // at the current element. + safeStart := start + 1 + // The remainder starts at our actual progress (i.e. start), but our final + // split index has to be >= our safeStart. + fracIdx := start + int64(math.Round(frac*float64(splitIdx-start))) + if fracIdx < safeStart { + fracIdx = safeStart + } + if splits == nil { + // All split points are valid so just split at fraction. + return fracIdx, nil + } else { + // Find the closest unprocessed split point to our fraction. + sort.Slice(splits, func(i, j int) bool { return splits[i] < splits[j] }) + var prevDiff int64 = math.MaxInt64 + var bestS int64 = -1 + for _, s := range splits { Review comment: Ack, although I think Search can probably be done even with int64. Might be worth benchmarking still. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia commented on a change in pull request #11845: [BEAM-9198] BeamSQL aggregation analytics functionality
amaliujia commented on a change in pull request #11845: URL: https://github.com/apache/beam/pull/11845#discussion_r431601151 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamWindowRule.java ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl.rule; + +import org.apache.beam.sdk.extensions.sql.impl.rel.*; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.*; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.*; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.*; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.*; + +public class BeamWindowRule extends ConverterRule { + public static final BeamWindowRule INSTANCE = new BeamWindowRule(); + + private BeamWindowRule() { +super(LogicalWindow.class, Convention.NONE, BeamLogicalConvention.INSTANCE, "BeamWindowRule"); + } + + @Override + public RelNode convert(RelNode relNode) { +// transforms relNode (LogicalWindow) to BeamWindowRel + assert false; Review comment: Now your code will hit this line, which will crash. You can try to create BeamWindowRel now in convert function. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.
chamikaramj commented on pull request #11844: URL: https://github.com/apache/beam/pull/11844#issuecomment-635120602 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11843: [BEAM-10052][BEAM-10077] Cherry-pick PR 11771 to 2.22.0 branch
chamikaramj commented on pull request #11843: URL: https://github.com/apache/beam/pull/11843#issuecomment-635120392 Failure is unrelated (seems to be due to Dataflow workers not being available). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rezarokni commented on pull request #11796: [BEAM-10003] Use local code for building code samples on website
rezarokni commented on pull request #11796: URL: https://github.com/apache/beam/pull/11796#issuecomment-635105846 Great stuff! There will be a few more folks adding patterns and making this easier will help us grow the community contributions :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] henryken commented on pull request #11806: [BEAM-9679] Flatten Kata for Go
henryken commented on pull request #11806: URL: https://github.com/apache/beam/pull/11806#issuecomment-635102477 This looks good now. Thanks @brucearctor! @lostluck, please help to merge this PR. @damondouglas, once this is merged, you can merge this to your PR and then we can upload to Stepik. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11796: [BEAM-10003] Use local code for building code samples on website
pabloem commented on pull request #11796: URL: https://github.com/apache/beam/pull/11796#issuecomment-635091326 fyi @rezarokni with this fix, you should be able to create snippets and add them to the website in the same PR This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia merged pull request #11807: [BEAM-9363] Support TUMBLE aggregation
amaliujia merged pull request #11807: URL: https://github.com/apache/beam/pull/11807 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jhnmora000 commented on pull request #11845: [BEAM-9198] BeamSQL aggregation analytics functionality
jhnmora000 commented on pull request #11845: URL: https://github.com/apache/beam/pull/11845#issuecomment-635075698 R: @amaliujia This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jhnmora000 opened a new pull request #11845: [BEAM-9198] BeamSQL aggregation analytics functionality
jhnmora000 opened a new pull request #11845: URL: https://github.com/apache/beam/pull/11845 A simple Analytic Functions experiment for BeamSQL created in order to understand the query processing workflow of BeamSQL and Calcite. The experiment is implemented in the test BeamAnalyticFunctionsExperimentTest.testSimpleOverFunction(), when executing it a "BEAM_LOGICAL but does not implement the required interface" exception is thrown. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://buil
[GitHub] [beam] ananvay commented on pull request #11207: [BEAM-9220] Go Dataflow jobs to use runner v2
ananvay commented on pull request #11207: URL: https://github.com/apache/beam/pull/11207#issuecomment-635064457 /cc: @robertwb /cc: @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ananvay commented on pull request #11207: [BEAM-9220] Go Dataflow jobs to use runner v2
ananvay commented on pull request #11207: URL: https://github.com/apache/beam/pull/11207#issuecomment-635064351 Awesome! LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on pull request #11207: [BEAM-9220] Go Dataflow jobs to use runner v2
lostluck commented on pull request #11207: URL: https://github.com/apache/beam/pull/11207#issuecomment-635063900 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on pull request #11832: [BEAM-10110] Propagate ids for custom coders.
lostluck commented on pull request #11832: URL: https://github.com/apache/beam/pull/11832#issuecomment-635063412 I suspect that the Dataflow JRH doesn't like this as much as runner v2 does. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #11821: [WIP] [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a na
lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635062102 Run Java Spark PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #11821: [WIP] [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a na
lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635061728 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #11821: [WIP] [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a na
lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635061994 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on pull request #11832: [BEAM-10110] Propagate ids for custom coders.
lostluck commented on pull request #11832: URL: https://github.com/apache/beam/pull/11832#issuecomment-635055261 Run Go PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.
chamikaramj commented on pull request #11844: URL: https://github.com/apache/beam/pull/11844#issuecomment-635047434 R: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.
chamikaramj commented on pull request #11844: URL: https://github.com/apache/beam/pull/11844#issuecomment-635042169 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.
chamikaramj commented on pull request #11844: URL: https://github.com/apache/beam/pull/11844#issuecomment-635038300 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.
chamikaramj commented on pull request #11844: URL: https://github.com/apache/beam/pull/11844#issuecomment-635038250 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj opened a new pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.
chamikaramj opened a new pull request #11844: URL: https://github.com/apache/beam/pull/11844 Without this X-lang can be broken for Dataflow where this property automatically gets enabled for some execution paths. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache
[GitHub] [beam] pulasthi commented on pull request #10888: [BEAM-7304] Twister2 Beam runner
pulasthi commented on pull request #10888: URL: https://github.com/apache/beam/pull/10888#issuecomment-635033768 @iemejia Looking forward for your feedback. And about the maintainability question, the Twister2 project has 10-15 active contributors at the moment who can take over the responsibility to maintain the contribution if I am not able to work on it somehow. Several of them know the codebase (of the Twister2 Runner) in detail, so it should not be an issue. We are also planning to join the Apache incubator in the near future after a couple of more features that we think are important are completed. The Twister2 Github page seems a little inactive these days because we are working on the Twisterx ( https://github.com/DSC-SPIDAL/twisterx ) project which will be merged into the Twister2 codebase in the coming weeks :). The beam runner is a major aspect that we plan to update and develop in the future. I hope that answers your concerns to some level. I will also work with the University and get the required CCLA as requested. Best Regards, Pulasthi This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn merged pull request #11707: [BEAM-9810] Add a Tox (precommit) suite for Python 3.8
tvalentyn merged pull request #11707: URL: https://github.com/apache/beam/pull/11707 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-635031678 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-635031588 py2 failure is hdfs integration test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn commented on a change in pull request #11828: [BEAM-10106] Script the deployment of artifacts to pypi
tvalentyn commented on a change in pull request #11828: URL: https://github.com/apache/beam/pull/11828#discussion_r431527674 ## File path: release/src/main/scripts/deploy_pypi.sh ## @@ -41,9 +41,12 @@ fi mkdir ${LOCAL_CLONE_DIR} cd ${LOCAL_CLONE_DIR} +virtualenv deploy_pypi_env +source ./deploy_pypi_env/bin/activate +pip install twine Review comment: Consider creating virtualenv in a /tmp/... directory or in the directory that you create and later clean up (assuming it won't interfere with pypi uploads). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn commented on a change in pull request #11828: [BEAM-10106] Script the deployment of artifacts to pypi
tvalentyn commented on a change in pull request #11828: URL: https://github.com/apache/beam/pull/11828#discussion_r431527674 ## File path: release/src/main/scripts/deploy_pypi.sh ## @@ -41,9 +41,12 @@ fi mkdir ${LOCAL_CLONE_DIR} cd ${LOCAL_CLONE_DIR} +virtualenv deploy_pypi_env +source ./deploy_pypi_env/bin/activate +pip install twine Review comment: Consider making this in a /tmp directory or the directory that you create and later clean up (assuming it won't interfere with pypi uploads). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia commented on pull request #11807: [BEAM-9363] Support TUMBLE aggregation
amaliujia commented on pull request #11807: URL: https://github.com/apache/beam/pull/11807#issuecomment-635021964 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] nfisher commented on a change in pull request #11732: [BEAM-10017] Expose Cassandra Connect and Read timeouts
nfisher commented on a change in pull request #11732: URL: https://github.com/apache/beam/pull/11732#discussion_r431519624 ## File path: sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java ## @@ -1023,6 +1065,8 @@ private String getMutationTypeName() { abstract Builder setMutationType(MutationType mutationType); + abstract Builder setConnectTimeout(ValueProvider timeout); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] nfisher commented on a change in pull request #11732: [BEAM-10017] Expose Cassandra Connect and Read timeouts
nfisher commented on a change in pull request #11732: URL: https://github.com/apache/beam/pull/11732#discussion_r431519562 ## File path: sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java ## @@ -948,6 +980,15 @@ public T getCurrent() throws NoSuchElementException { return builder().setConsistencyLevel(consistencyLevel).build(); } +/** Cassandra client socket option for connect timeout. */ +public Write withConnectTimeout(Integer timeout) { + return withConnectTimeout(ValueProvider.StaticValueProvider.of(timeout)); +} + +public Write withConnectTimeout(ValueProvider timeout) { Review comment: Done line 1010 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] nfisher commented on a change in pull request #11732: [BEAM-10017] Expose Cassandra Connect and Read timeouts
nfisher commented on a change in pull request #11732: URL: https://github.com/apache/beam/pull/11732#discussion_r431519019 ## File path: sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java ## @@ -1142,7 +1201,9 @@ private static Cluster getCluster( spec.username(), spec.password(), spec.localDc(), - spec.consistencyLevel()); + spec.consistencyLevel(), + spec.connectTimeout(), + null); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia commented on pull request #11807: [BEAM-9363] Support TUMBLE aggregation
amaliujia commented on pull request #11807: URL: https://github.com/apache/beam/pull/11807#issuecomment-635018712 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] steveniemitz edited a comment on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging
steveniemitz edited a comment on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635015560 I just tested this change out, it seems like things work as they did before in 2.20. The only difference I noticed is that my `filesToStage` are slightly different. Previously they looked like: ``` dataflow-worker.jar=, ``` however now they're just: ``` , ``` It seems like the job launches correctly and uses the jar I set, so it seems like it's still working correctly, but I'm not sure if that missing might cause other issues down the road. Also, the names are deterministic again which is great! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] steveniemitz commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging
steveniemitz commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635015560 I just tested this change out, it seems like things work as they did before. The only difference I noticed is that my `filesToStage` are slightly different. Previously they looked like: ``` dataflow-worker.jar=, ``` however now they're just: ``` , ``` It seems like the job launches correctly and uses the jar I set, so it seems like it's still working correctly, but I'm not sure if that missing might cause other issues down the road. Also, the names are deterministic again which is great! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11843: [BEAM-10052] Cherry pick pr 11771 to 2.22.0 branch
chamikaramj commented on pull request #11843: URL: https://github.com/apache/beam/pull/11843#issuecomment-635013993 R: @TheNeuralBit CC: @robertwb @ihji This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj opened a new pull request #11843: [BEAM-10052] Cherry pick pr 11771 to 2.22.0 branch
chamikaramj opened a new pull request #11843: URL: https://github.com/apache/beam/pull/11843 Without this Dataflow tries to upload the same artifact multiple times which breaks the Dataflow artifact container for some cases. This change already was merged to HEAD abut a week back but seems like we forgot to add it to the release branch. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Pyth
[GitHub] [beam] aaltay commented on pull request #11826: Add a powered by page
aaltay commented on pull request #11826: URL: https://github.com/apache/beam/pull/11826#issuecomment-635002337 Thank you all. I can merge this PR as is, but it would be great if others in the community could add other projects/products to the list. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] brucearctor commented on pull request #11826: Add a powered by page
brucearctor commented on pull request #11826: URL: https://github.com/apache/beam/pull/11826#issuecomment-635000108 Nice! Some of what I am hearing is that people want to use what others are. So finding ways to highlight may be valuable for encouraging adoption. I had missed website was ready to be updated (had been on hold). Yaml: no issues, so if interested in doing the work... would want to ensure there was a good way to preview that wasn't burdensome to those updating the page. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damondouglas commented on pull request #11806: [BEAM-9679] Flatten Kata for Go
damondouglas commented on pull request #11806: URL: https://github.com/apache/beam/pull/11806#issuecomment-634999073 @henryken Related to [Comment from PR 11803](https://github.com/apache/beam/pull/11803#issuecomment-634696399), would you mind to let me know when this is ready to update Stepik along with PR #11803? Thank you @brucearctor for doing this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #11842: [BEAM-9971][release-2.22.0] Do not use context classloader.
TheNeuralBit commented on pull request #11842: URL: https://github.com/apache/beam/pull/11842#issuecomment-634995178 Run Java Spark PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit opened a new pull request #11842: [BEAM-9971][release-2.22.0] Do not use context classloader.
TheNeuralBit opened a new pull request #11842: URL: https://github.com/apache/beam/pull/11842 Cherry pick #11784 into 2.22.0 R: @ibzib Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)[![
[GitHub] [beam] aijamalnk commented on a change in pull request #11780: [BEAM-9948] Uploading mascot to the website
aijamalnk commented on a change in pull request #11780: URL: https://github.com/apache/beam/pull/11780#discussion_r431494965 ## File path: website/www/site/content/en/community/mascot.md ## @@ -0,0 +1,52 @@ +--- +title: "Beam Mascot" +--- + + +# Beam Mascot Design + +This page contains Apache Beam's mascot designs. + +Beam firefly is cute, friendly, agile, easy to use, and its main objective is to fetch streams and batches of data and process it. The mascot’s model sheet is useful to understand its features, capabilities, as well as its morphology. The original design of the mascot was created by [Julian G. Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam community under the [Apache license 2.0](https://www.apache.org/licenses/LICENSE-2.0). + +You can browse the original mascot and its adaptations in different sizes and image formats in [this directory](https://github.com/apache/beam/tree/mascot-upload/website/www/site/static/images/mascot). Review comment: The directory is aprt of the github repository. it is not yet merged, but once it is, it will work. I've fixed the apache licenses. Can you try taking a lo0ok? Than ks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib merged pull request #11727: Update Beam website to release 2.21.0.
ibzib merged pull request #11727: URL: https://github.com/apache/beam/pull/11727 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aijamalnk commented on pull request #11826: Add a powered by page
aijamalnk commented on pull request #11826: URL: https://github.com/apache/beam/pull/11826#issuecomment-634987468 This looks great. I think it's good to merge. I like the idea of YAML files, perhaps with support of thumbnails and external links. @epicfaace is it something you'd like to help us with after this PR is merged? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib merged pull request #11729: Add blog post announcing the Beam 2.21.0 release.
ibzib merged pull request #11729: URL: https://github.com/apache/beam/pull/11729 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #10165: [BEAM-7390] Add code snippet for GroupIntoBatches
aaltay commented on pull request #10165: URL: https://github.com/apache/beam/pull/10165#issuecomment-634985346 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia commented on a change in pull request #11807: [BEAM-9363] Support TUMBLE aggregation
amaliujia commented on a change in pull request #11807: URL: https://github.com/apache/beam/pull/11807#discussion_r431486841 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java ## @@ -99,14 +102,32 @@ public TableFunctionScan copy( RexInputRef wmCol = (RexInputRef) call.getOperands().get(1); PCollection upstream = input.get(0); Schema outputSchema = CalciteUtils.toSchema(getRowType()); - return upstream - .apply( - ParDo.of( - new FixedWindowDoFn( - FixedWindows.of(durationParameter(call.getOperands().get(2))), - wmCol.getIndex(), - outputSchema))) - .setRowSchema(outputSchema); + FixedWindows windowFn = FixedWindows.of(durationParameter(call.getOperands().get(2))); + PCollection streamWithWindowMetadata = + upstream + .apply(ParDo.of(new FixedWindowDoFn(windowFn, wmCol.getIndex(), outputSchema))) + .setRowSchema(outputSchema); + + PCollection windowedStream = + assignTimestampsAndWindow( + streamWithWindowMetadata, wmCol.getIndex(), (WindowFn) windowFn); + + return windowedStream; +} + +/** Extract timestamps from the windowFieldIndex, then window into windowFns. */ +private PCollection assignTimestampsAndWindow( +PCollection upstream, int windowFieldIndex, WindowFn windowFn) { + PCollection windowedStream; + windowedStream = + upstream Review comment: Not a big deal. Just want to use the name `windowedStream` to improve readability. E.g. readers know it's returning a windowed PCollection. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robinyqiu commented on a change in pull request #11807: [BEAM-9363] Support TUMBLE aggregation
robinyqiu commented on a change in pull request #11807: URL: https://github.com/apache/beam/pull/11807#discussion_r431481621 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -4780,6 +4780,31 @@ public void testTumbleAsTVF() { pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } + @Test + public void testTVFTumbleAggregation() { +String sql = +"SELECT COUNT(*) as field_count, " ++ "window_start " ++ "FROM TUMBLE((select * from KeyValue), descriptor(ts), 'INTERVAL 1 SECOND') " ++ "GROUP BY window_start"; +ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); +BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); + +PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); + +final Schema schema = + Schema.builder().addInt64Field("count_start").addDateTimeField("window_start").build(); Review comment: Nit: `count_start` should be `field_count`. ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java ## @@ -99,14 +102,32 @@ public TableFunctionScan copy( RexInputRef wmCol = (RexInputRef) call.getOperands().get(1); PCollection upstream = input.get(0); Schema outputSchema = CalciteUtils.toSchema(getRowType()); - return upstream - .apply( - ParDo.of( - new FixedWindowDoFn( - FixedWindows.of(durationParameter(call.getOperands().get(2))), - wmCol.getIndex(), - outputSchema))) - .setRowSchema(outputSchema); + FixedWindows windowFn = FixedWindows.of(durationParameter(call.getOperands().get(2))); + PCollection streamWithWindowMetadata = + upstream + .apply(ParDo.of(new FixedWindowDoFn(windowFn, wmCol.getIndex(), outputSchema))) + .setRowSchema(outputSchema); + + PCollection windowedStream = + assignTimestampsAndWindow( + streamWithWindowMetadata, wmCol.getIndex(), (WindowFn) windowFn); + + return windowedStream; +} + +/** Extract timestamps from the windowFieldIndex, then window into windowFns. */ +private PCollection assignTimestampsAndWindow( +PCollection upstream, int windowFieldIndex, WindowFn windowFn) { + PCollection windowedStream; + windowedStream = + upstream Review comment: Why not just `return upstream.apply(...)`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik merged pull request #11784: [BEAM-9971] Do not use context classloader.
lukecwik merged pull request #11784: URL: https://github.com/apache/beam/pull/11784 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tysonjh commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms
tysonjh commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r431463509 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java ## @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.auto.value.AutoValue; +import com.google.cloud.dlp.v2.DlpServiceClient; +import com.google.privacy.dlp.v2.ContentItem; +import com.google.privacy.dlp.v2.DeidentifyConfig; +import com.google.privacy.dlp.v2.DeidentifyContentRequest; +import com.google.privacy.dlp.v2.DeidentifyContentResponse; +import com.google.privacy.dlp.v2.FieldId; +import com.google.privacy.dlp.v2.InspectConfig; +import com.google.privacy.dlp.v2.ProjectName; +import com.google.privacy.dlp.v2.Table; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionView; + +/** + * A {@link PTransform} connecting to Cloud DLP (https://cloud.google.com/dlp/docs/libraries) and + * deidentifying text according to provided settings. The transform supports both CSV formatted + * input data and unstructured input. + * + * If the csvHeader property is set and a sideinput with CSV headers is added to the PTransform, + * csvDelimiter also should be set, else the results will be incorrect. If csvHeader is neither set + * nor passed as sideinput, input is assumed to be unstructured. + * + * Either deidentifyTemplateName (String) or deidentifyConfig {@link DeidentifyConfig} need to be + * set. inspectTemplateName and inspectConfig ({@link InspectConfig} are optional. + * + * Batch size defines how big are batches sent to DLP at once in bytes. + * + * The transform consumes {@link KV} of {@link String}s (assumed to be filename as key and + * contents as value) and outputs {@link KV} of {@link String} (eg. filename) and {@link + * DeidentifyContentResponse}, which will contain {@link Table} of results for the user to consume. + */ +@Experimental +@AutoValue +public abstract class DLPDeidentifyText +extends PTransform< +PCollection>, PCollection>> { + + public static final Integer DLP_PAYLOAD_LIMIT_BYTES = 524000; + + /** @return Template name for data inspection. */ + @Nullable + public abstract String inspectTemplateName(); + + /** @return Template name for data deidentification. */ + @Nullable + public abstract String deidentifyTemplateName(); + + /** + * @return Configuration object for data inspection. If present, supersedes the template settings. + */ + @Nullable + public abstract InspectConfig inspectConfig(); + + /** @return Configuration object for deidentification. If present, supersedes the template. */ + @Nullable + public abstract DeidentifyConfig deidentifyConfig(); + + /** @return List of column names if the input KV value is a CSV formatted row. */ + @Nullable + public abstract PCollectionView> csvHeader(); + + /** @return Delimiter to be used when splitting values from input strings into columns. */ + @Nullable + public abstract String csvColumnDelimiter(); Review comment: Since the builder has setter methods prefixed with `set` having parity with getters prefixed with `get` would be nice. Also dropping the `csv` makes sense. ```suggestion public abstract String getColumnDelimiter(); ``` ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPReidentifyText.java ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License,
[GitHub] [beam] KevinGG commented on a change in pull request #11838: [BEAM-9322] Modify the TestStream to output a dict when no output_tags are specified
KevinGG commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r431480466 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -315,6 +327,7 @@ def read_multiple(self, labels): StreamingCacheSource(self._cache_dir, l, self._is_cache_complete).read(tail=True) for l in labels +if not [sub_l for sub_l in l if self.sentinel_label() in sub_l] Review comment: Or if a label `l` is a `list` of `str` and `*labels` is a `list` of `list` of `str`, then this makes sense. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #10165: [BEAM-7390] Add code snippet for GroupIntoBatches
aaltay commented on pull request #10165: URL: https://github.com/apache/beam/pull/10165#issuecomment-634978238 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #11839: [BEAM-10122] Python RowCoder throws NotImplementedError in DataflowRunner
TheNeuralBit commented on pull request #11839: URL: https://github.com/apache/beam/pull/11839#issuecomment-634977924 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] KevinGG commented on a change in pull request #11838: [BEAM-9322] Modify the TestStream to output a dict when no output_tags are specified
KevinGG commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r431469495 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -315,6 +327,7 @@ def read_multiple(self, labels): StreamingCacheSource(self._cache_dir, l, self._is_cache_complete).read(tail=True) for l in labels +if not [sub_l for sub_l in l if self.sentinel_label() in sub_l] Review comment: This is a little hard to read. Isn't a label `l` a `str`, so a `sub_l` is a character of that `str`? I suppose `if not [sub_l for ...]` evaluates to `True` when the `[sub_l for ...]` is empty. And the emptiness of `[sub_l for ...]` is based on whether the `sentinel_label` exists in the `sub_l`? This is where I get confused. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit opened a new pull request #11841: [BEAM-10121] Python RowCoder doesn't support nested structs
TheNeuralBit opened a new pull request #11841: URL: https://github.com/apache/beam/pull/11841 Add support for nested structs to RowCoder. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)[![Build
[GitHub] [beam] pabloem merged pull request #11745: [BEAM-9692] Add to/from_runner_api_parameters to WriteToBigQuery
pabloem merged pull request #11745: URL: https://github.com/apache/beam/pull/11745 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11745: [BEAM-9692] Add to/from_runner_api_parameters to WriteToBigQuery
pabloem commented on pull request #11745: URL: https://github.com/apache/beam/pull/11745#issuecomment-634975097 Thanks Sam! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rohdesamuel commented on pull request #11745: [BEAM-9692] Add to/from_runner_api_parameters to WriteToBigQuery
rohdesamuel commented on pull request #11745: URL: https://github.com/apache/beam/pull/11745#issuecomment-634973345 > you can use the same jira for the ptransformoverride done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] epicfaace opened a new pull request #11840: [draft] [BEAM-10111] Allow reading from archive files with MatchFiles
epicfaace opened a new pull request #11840: URL: https://github.com/apache/beam/pull/11840 [still a draft] The general idea here is that we use mixins (subclasses of `ArchiveFileSystemMixin`) for each archive type (tar, zip, etc.). Each mixin "wraps" the I/O operations of its corresponding FileSystem (GCSFileSystem, LocalFileSystem, etc.) so that it behaves like an archive that is mounted on its corresponding FileSystem. ```python ( p | MatchFiles("*.log", archive_path="s3://ashwin-bucket123/logs.zip") | Map(print) ) ``` Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastComplet
[GitHub] [beam] pabloem commented on pull request #11745: Add to/from_runner_api_parameters to WriteToBigQuery
pabloem commented on pull request #11745: URL: https://github.com/apache/beam/pull/11745#issuecomment-634972814 you can use the same jira for the ptransformoverride This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem merged pull request #11796: [BEAM-10003] Use local code for building code samples on website
pabloem merged pull request #11796: URL: https://github.com/apache/beam/pull/11796 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rohdesamuel commented on a change in pull request #11838: [BEAM-9322] Modify the streaming cache to always have multiple outputs
rohdesamuel commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r431471609 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -304,6 +304,18 @@ def read(self, *labels): return iter([]), -1 return StreamingCache.Reader([header], [reader]).read(), 1 + @staticmethod + def sentinel_label(): Review comment: Yeah that can work, I like that because it keeps the same semantics. I'll go with the {None} alternative because the output_tags are always manually specified in the from_runner_api_parameter method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11745: Add to/from_runner_api_parameters to WriteToBigQuery
pabloem commented on pull request #11745: URL: https://github.com/apache/beam/pull/11745#issuecomment-634969845 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11745: Add to/from_runner_api_parameters to WriteToBigQuery
pabloem commented on pull request #11745: URL: https://github.com/apache/beam/pull/11745#issuecomment-634969665 thanks Sam! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit opened a new pull request #11839: [BEAM-10122] Python RowCoder throws NotImplementedError in DataflowRunner
TheNeuralBit opened a new pull request #11839: URL: https://github.com/apache/beam/pull/11839 Remove `as_cloud_object` override so that DataflowRunner can just use the default. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_
[GitHub] [beam] aaltay commented on pull request #10165: [BEAM-7390] Add code snippet for GroupIntoBatches
aaltay commented on pull request #10165: URL: https://github.com/apache/beam/pull/10165#issuecomment-634968738 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] davidcavazos commented on pull request #10165: [BEAM-7390] Add code snippet for GroupIntoBatches
davidcavazos commented on pull request #10165: URL: https://github.com/apache/beam/pull/10165#issuecomment-634968220 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb merged pull request #11793: [BEAM-10064] Fix google3 import error for BEAM-9383
robertwb merged pull request #11793: URL: https://github.com/apache/beam/pull/11793 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] epicfaace commented on a change in pull request #11826: Add a powered by page
epicfaace commented on a change in pull request #11826: URL: https://github.com/apache/beam/pull/11826#discussion_r431466644 ## File path: website/www/site/content/en/community/powered-by.md ## @@ -0,0 +1,25 @@ +--- +title: 'Powered by Apache Beam' +--- + +# Projects and Products Powered by Apache Beam + +To add yourself to the list, please open a [pull request](https://github.com/apache/beam/edit/master/website/www/site/content/en/community/powered_by.md) adding your organization name, URL, a list of which Beam components you are using, and a short description of your use case. + +* **[Cloud Dataflow](https://cloud.google.com/dataflow):** Google Cloud Dataflow is a fully managed service for executing Review comment: It wouldn't be too bad -- I'd be glad to help do it -- it would just make sure that we maintain a consistent format and layout that's easy to change (for example, https://arrow.apache.org/powered_by/ isn't in alphabetical order, and the bolding of titles etc. isn't too consistent -- and it would take more than a simple change to change this format quickly, because it may not be data-driven like with YAML). Making new data templates in YAML will also help us be better prepared for localization down the road. If you're okay with it, we could probably do it after this PR is merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on pull request #11838: [BEAM-9322] Modify the streaming cache to always have multiple outputs
robertwb commented on pull request #11838: URL: https://github.com/apache/beam/pull/11838#issuecomment-634964145 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on a change in pull request #11838: [BEAM-9322] Modify the streaming cache to always have multiple outputs
robertwb commented on a change in pull request #11838: URL: https://github.com/apache/beam/pull/11838#discussion_r431465968 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -304,6 +304,18 @@ def read(self, *labels): return iter([]), -1 return StreamingCache.Reader([header], [reader]).read(), 1 + @staticmethod + def sentinel_label(): Review comment: Rather than introduce a sentinel label, how about returning a dict from expand iff output_tags was manually specified (or, alternatively, something other than `{None}`)? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rohdesamuel commented on pull request #11838: [BEAM-9322] Modify the streaming cache to always have multiple outputs
rohdesamuel commented on pull request #11838: URL: https://github.com/apache/beam/pull/11838#issuecomment-634961730 R: @KevinGG This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11826: Add a powered by page
aaltay commented on pull request #11826: URL: https://github.com/apache/beam/pull/11826#issuecomment-634958799 R: @aijamalnk @brucearctor -- what do you think about this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rohdesamuel opened a new pull request #11838: [BEAM-9322] Modify the streaming cache to always have multiple outputs
rohdesamuel opened a new pull request #11838: URL: https://github.com/apache/beam/pull/11838 Change-Id: I6a8eba4e323bf0fff318a56e44e512916c06266f https://github.com/apache/beam/pull/11765 removes the ability to set the output id on TestStreams with single outputs. This PR circumvents this by always adding a dummy output to the TestStream so that it will always output a dict, so that we can control the output ids. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_Po
[GitHub] [beam] iemejia commented on pull request #10888: [BEAM-7304] Twister2 Beam runner
iemejia commented on pull request #10888: URL: https://github.com/apache/beam/pull/10888#issuecomment-634953208 @pulasthi technically there are not many major things. (I may comment on those but don't worry for those much). Currently the only concern I have is about the [bus factor](https://en.wikipedia.org/wiki/Bus_factor) of this contribution. Is there anyone apart of you able to contribute fixes/support for this runner in the future? Also given the size of the contribution and the fact that it comes from a University (aka corporation) you need to send a signed CCLA https://www.apache.org/licenses/cla-corporate.pdf More info on why here https://www.apache.org/licenses/contributor-agreements.html Once you have the signed document from your University please send a copy to secret...@apache.org and priv...@beam.apache.org If you have still questions about this legal part please ask those to priv...@beam.apache.org Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on a change in pull request #11828: [BEAM-10106] Script the deployment of artifacts to pypi
ibzib commented on a change in pull request #11828: URL: https://github.com/apache/beam/pull/11828#discussion_r431449865 ## File path: release/src/main/scripts/deploy_pypi.sh ## @@ -0,0 +1,57 @@ +#!/bin/bash +# +#Licensed to the Apache Software Foundation (ASF) under one or more +#contributor license agreements. See the NOTICE file distributed with +#this work for additional information regarding copyright ownership. +#The ASF licenses this file to You under the Apache License, Version 2.0 +#(the "License"); you may not use this file except in compliance with +#the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. +# + +# This script uploads Python artifacts staged at dist.apache.org to PyPI. + +set -e + +function clean_up(){ + echo "Do you want to clean local clone repo? [y|N]" + read confirmation + if [[ $confirmation = "y" ]]; then +cd ~ +rm -rf ${LOCAL_CLONE_DIR} +echo "Clean up local repo." Review comment: Done ## File path: release/src/main/scripts/deploy_pypi.sh ## @@ -0,0 +1,57 @@ +#!/bin/bash +# +#Licensed to the Apache Software Foundation (ASF) under one or more +#contributor license agreements. See the NOTICE file distributed with +#this work for additional information regarding copyright ownership. +#The ASF licenses this file to You under the Apache License, Version 2.0 +#(the "License"); you may not use this file except in compliance with +#the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. +# + +# This script uploads Python artifacts staged at dist.apache.org to PyPI. + +set -e + +function clean_up(){ + echo "Do you want to clean local clone repo? [y|N]" + read confirmation + if [[ $confirmation = "y" ]]; then +cd ~ +rm -rf ${LOCAL_CLONE_DIR} +echo "Clean up local repo." + fi +} + +echo "Enter the release version, e.g. 2.21.0:" +read RELEASE +LOCAL_CLONE_DIR="beam_release_${RELEASE}" +cd ~ +if [[ -d ${LOCAL_CLONE_DIR} ]]; then + rm -rf ${LOCAL_CLONE_DIR} Review comment: Done ## File path: release/src/main/scripts/deploy_pypi.sh ## @@ -0,0 +1,57 @@ +#!/bin/bash +# +#Licensed to the Apache Software Foundation (ASF) under one or more +#contributor license agreements. See the NOTICE file distributed with +#this work for additional information regarding copyright ownership. +#The ASF licenses this file to You under the Apache License, Version 2.0 +#(the "License"); you may not use this file except in compliance with +#the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. +# + +# This script uploads Python artifacts staged at dist.apache.org to PyPI. + +set -e + +function clean_up(){ + echo "Do you want to clean local clone repo? [y|N]" Review comment: Done ## File path: release/src/main/scripts/deploy_pypi.sh ## @@ -0,0 +1,57 @@ +#!/bin/bash +# +#Licensed to the Apache Software Foundation (ASF) under one or more +#contributor license agreements. See the NOTICE file distributed with +#this work for additional information regarding copyright ownership. +#The ASF licenses this file to You under the Apache License, Version 2.0 +#(the "License"); you may not use this file except in compliance with +#the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. +# + +# This script uploads Python artifacts staged at dist.apache.org to PyP
[GitHub] [beam] boyuanzz commented on pull request #11831: [BEAM-9603] Enable UsesTimerMap tests for flink portable runner
boyuanzz commented on pull request #11831: URL: https://github.com/apache/beam/pull/11831#issuecomment-634950453 There is a bug tracking spark implementation: https://issues.apache.org/jira/browse/BEAM-9912. Opened https://issues.apache.org/jira/browse/BEAM-10120 for Flink. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-634947671 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik removed a comment on pull request #11821: [WIP] [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type
lukecwik removed a comment on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-634312177 R: @mxm @tweise This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] y1chi closed pull request #11831: [BEAM-9603] Enable UsesTimerMap tests for flink portable runner
y1chi closed pull request #11831: URL: https://github.com/apache/beam/pull/11831 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] iemejia commented on pull request #11186: [BEAM-9564] Remove insecure ssl options from MongoDBIO
iemejia commented on pull request #11186: URL: https://github.com/apache/beam/pull/11186#issuecomment-634946173 Lies this is not stale!!! well maybe a bit, but I plan to come back to it soonish. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia merged pull request #11817: [BEAM-10074] | implement hashing functions
amaliujia merged pull request #11817: URL: https://github.com/apache/beam/pull/11817 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on a change in pull request #11802: [BEAM-9916] Update I/O documentation links and create more complete I/O matrix
pabloem commented on a change in pull request #11802: URL: https://github.com/apache/beam/pull/11802#discussion_r431442795 ## File path: website/www/site/data/io_matrix.yaml ## @@ -0,0 +1,377 @@ +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +categories: + - name: File-based +description: These I/O connectors involve working with files. +rows: + - transform: FileIO +description: "General-purpose transforms for working with files: listing files (matching), reading and writing." +implementations: + - language: java +name: org.apache.beam.sdk.io.FileIO +url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html + - language: py +name: apache_beam.io.FileIO +url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.fileio.html + - transform: AvroIO +description: PTransforms for reading from and writing to [Avro](https://avro.apache.org/) files. +implementations: + - language: java +name: org.apache.beam.sdk.io.AvroIO +url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/AvroIO.html + - language: py +name: apache_beam.io.avroio +url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.avroio.html + - language: go +name: github.com/apache/beam/sdks/go/pkg/beam/io/avroio +url: https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/avroio + - transform: TextIO +description: PTransforms for reading and writing text files. +implementations: + - language: java +name: org.apache.beam.sdk.io.TextIO +url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html + - language: py +name: apache_beam.io.textio +url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html + - language: go +name: github.com/apache/beam/sdks/go/pkg/beam/io/textio +url: https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/textio + - transform: TFRecordIO +description: PTransforms for reading and writing [TensorFlow TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) files. +implementations: + - language: java +name: org.apache.beam.sdk.io.TFRecordIO +url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TFRecordIO.html + - language: py +name: apache_beam.io.tfrecordio +url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.tfrecordio.html + - transform: XmlIO +description: Transforms for reading and writing XML files using [JAXB](https://www.oracle.com/technical-resources/articles/javase/jaxb.html) mappers. +implementations: + - language: java +name: org.apache.beam.sdk.io.xml.XmlIO +url: https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/xml/XmlIO.html + - transform: TikaIO +description: Transforms for parsing arbitrary files using [Apache Tika](https://tika.apache.org/). +implementations: + - language: java +name: org.apache.beam.sdk.io.tika.TikaIO +url: https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/tika/TikaIO.html + - transform: ParquetIO +description: IO for reading from and writing to [Parquet](https://parquet.apache.org/) files. +docs: /documentation/io/built-in/parquet/ +implementations: + - language: java +name: org.apache.beam.sdk.io.parquet.ParquetIO +url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/parquet/ParquetIO.html + - language: py +name: apache_beam.io.parquetio +url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.parquetio.html + - transform: ThriftIO +description: PTransforms for reading and writing files containing [Thrift](https://thrift.apache.org/)-encoded data. +implementations: + - language: java +name: org.apache.beam.sdk.io.thrift.ThriftIO +url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/thrift/ThriftIO.html + - t
[GitHub] [beam] pabloem opened a new pull request #11837: [BEAM-10098] Enabling javadoc export for RabbitMqIO and KuduIO
pabloem opened a new pull request #11837: URL: https://github.com/apache/beam/pull/11837 Starting export of javadocs for these IOs Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://bui
[GitHub] [beam] boyuanzz commented on pull request #11831: [BEAM-9603] Enable UsesTimerMap tests for flink portable runner
boyuanzz commented on pull request #11831: URL: https://github.com/apache/beam/pull/11831#issuecomment-634933808 It seems like Flink doesn't support timer family either. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia commented on pull request #11807: [BEAM-9363] Support TUMBLE aggregation
amaliujia commented on pull request #11807: URL: https://github.com/apache/beam/pull/11807#issuecomment-634932759 A friendly ping~ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] apilloud commented on pull request #11834: [BEAM-10117] Correct erroneous Job Failed message
apilloud commented on pull request #11834: URL: https://github.com/apache/beam/pull/11834#issuecomment-634931406 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] apilloud opened a new pull request #11836: [BEAM-8693] Remove workaround for bigquery client
apilloud opened a new pull request #11836: URL: https://github.com/apache/beam/pull/11836 The issue this workaround was for should have been fixed by upgrading the BigQuery IO. Removing to verify. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_Po
[GitHub] [beam] ibzib merged pull request #11829: [BEAM-10108] Update Flink versions in publish_docker_images.sh.
ibzib merged pull request #11829: URL: https://github.com/apache/beam/pull/11829 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on a change in pull request #11729: Add blog post announcing the Beam 2.21.0 release.
ibzib commented on a change in pull request #11729: URL: https://github.com/apache/beam/pull/11729#discussion_r431426084 ## File path: website/www/site/content/en/blog/beam-2.21.0.md ## @@ -0,0 +1,97 @@ +--- +title: "Apache Beam 2.21.0" +date: 2020-05-27 00:00:01 -0800 +categories: + - blog +authors: + - ibzib +--- + + +We are happy to present the new 2.21.0 release of Beam. This release includes both improvements and new functionality. +See the [download page](/get-started/downloads/#-) for this release. +For more information on changes in 2.21.0, check out the +[detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347143). + +## I/Os +* Python: Deprecated module `apache_beam.io.gcp.datastore.v1` has been removed +as the client it uses is out of date and does not support Python 3 +([BEAM-9529](https://issues.apache.org/jira/browse/BEAM-9529)). +Please migrate your code to use +[apache_beam.io.gcp.datastore.**v1new**](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.datastore.v1new.datastoreio.html). +See the updated +[datastore_wordcount](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py) +for example usage. +* Python SDK: Added integration tests and updated batch write functionality for Google Cloud Spanner transform ([BEAM-8949](https://issues.apache.org/jira/browse/BEAM-8949)). + +## New Features / Improvements +* Python SDK will now use Python 3 type annotations as pipeline type hints. +([#10717](https://github.com/apache/beam/pull/10717)) + +If you suspect that this feature is causing your pipeline to fail, calling +`apache_beam.typehints.disable_type_annotations()` before pipeline creation +will disable is completely, and decorating specific functions (such as +`process()`) with `@apache_beam.typehints.no_annotations` will disable it +for that function. + +More details will be in +[Ensuring Python Type Safety](https://beam.apache.org/documentation/sdks/python-type-safety/) +and an upcoming +[blog post](https://beam.apache.org/blog/python/typing/2020/03/06/python-typing.html). + +* Java SDK: Introducing the concept of options in Beam Schema’s. These options add extra +context to fields and schemas. This replaces the current Beam metadata that is present +in a FieldType only, options are available in fields and row schemas. Schema options are +fully typed and can contain complex rows. *Remark: Schema aware is still experimental.* +([BEAM-9035](https://issues.apache.org/jira/browse/BEAM-9035)) +* Java SDK: The protobuf extension is fully schema aware and also includes protobuf option +conversion to beam schema options. *Remark: Schema aware is still experimental.* +([BEAM-9044](https://issues.apache.org/jira/browse/BEAM-9044)) +* Added ability to write to BigQuery via Avro file loads (Python) ([BEAM-8841](https://issues.apache.org/jira/browse/BEAM-8841)) + +By default, file loads will be done using JSON, but it is possible to Review comment: I'll leave it here since this was already in CHANGES.md. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on a change in pull request #11727: Update Beam website to release 2.21.0.
ibzib commented on a change in pull request #11727: URL: https://github.com/apache/beam/pull/11727#discussion_r431424992 ## File path: website/www/site/content/en/get-started/downloads.md ## @@ -87,6 +87,13 @@ versions denoted `0.x.y`. ## Releases +### 2.21.0 (2020-05-27) +Official [source code download](http://www.apache.org/dyn/closer.cgi/beam/2.21.0/apache-beam-2.21.0-source-release.zip). +[SHA-512](https://downloads.apache.org/beam/2.21.0/apache-beam-2.21.0-source-release.zip.sha512). +[signature](https://downloads.apache.org/beam/2.21.0/apache-beam-2.21.0-source-release.zip.asc). + +[Release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347143). + ### 2.20.0 (2020-04-15) Official [source code download](http://www.apache.org/dyn/closer.cgi/beam/2.20.0/apache-beam-2.20.0-source-release.zip). Review comment: Done. (I'll wait to merge this since files haven't been moved yet.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)
pabloem commented on pull request #11824: URL: https://github.com/apache/beam/pull/11824#issuecomment-634923054 I'll be happy to take a look at this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on a change in pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform
pabloem commented on a change in pull request #11582: URL: https://github.com/apache/beam/pull/11582#discussion_r431421613 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -1641,3 +1644,316 @@ def process(self, unused_element, signal): *self._args, **self._kwargs)) | _PassThroughThenCleanup(RemoveJsonFiles(gcs_location))) + + +class _ExtractBQData(DoFn): + ''' + PTransform:ReadAllFromBigQueryRequest->FileMetadata that fetches BQ data into + a temporary storage and returns metadata for created files. + ''' + def __init__( + self, + gcs_location_pattern=None, + project=None, + coder=None, + schema=None, + kms_key=None, + bq_client=None): + +self.gcs_location_pattern = gcs_location_pattern +self.project = project +self.coder = coder or _JsonToDictCoder +self.kms_key = kms_key +self.split_result = None +self.schema = schema +self.target_schema = None +self.bq_client = bq_client + + def process(self, element): +''' +:param element(ReadAllFromBigQueryRequest): +:return: +''' +element.validate() +if element.table is not None: + table_reference = bigquery_tools.parse_table_reference(element.table) + query = None + use_legacy_sql = True +else: + query = element.query + use_legacy_sql = element.use_legacy_sql + +flatten_results = element.flatten_results + +bq = bigquery_tools.BigQueryWrapper(self.bq_client) + +try: + if element.query is not None: +self._setup_temporary_dataset(bq, query, use_legacy_sql) +table_reference = self._execute_query( +bq, query, use_legacy_sql, flatten_results) + + gcs_location = self.gcs_location_pattern.format(uuid.uuid4().hex) + + table_schema = bq.get_table( + table_reference.projectId, + table_reference.datasetId, + table_reference.tableId).schema + + if self.target_schema is None: +self.target_schema = bigquery_tools.parse_table_schema_from_json( +json.dumps(self.schema)) + + if not self.target_schema == table_schema: +raise ValueError(( +"Schema generated by reading from BQ doesn't match expected" +"schema.\nExpected: {}\nActual: {}").format( +self.target_schema, table_schema)) + + metadata_list = self._export_files(bq, table_reference, gcs_location) + + yield pvalue.TaggedOutput('location_to_cleanup', gcs_location) + for metadata in metadata_list: +yield metadata.path + +finally: + if query is not None: +bq.clean_up_temporary_dataset(self.project) + + def _setup_temporary_dataset(self, bq, query, use_legacy_sql): +location = bq.get_query_location(self.project, query, use_legacy_sql) +bq.create_temporary_dataset(self.project, location) + + def _execute_query(self, bq, query, use_legacy_sql, flatten_results): +job = bq._start_query_job( +self.project, +query, +use_legacy_sql, +flatten_results, +job_id=uuid.uuid4().hex, +kms_key=self.kms_key) +job_ref = job.jobReference +bq.wait_for_bq_job(job_ref) +return bq._get_temp_table(self.project) + + def _export_files(self, bq, table_reference, gcs_location): +"""Runs a BigQuery export job. + +Returns: + a list of FileMetadata instances +""" +job_id = uuid.uuid4().hex +job_ref = bq.perform_extract_job([gcs_location], + job_id, + table_reference, + bigquery_tools.FileFormat.JSON, + include_header=False) +bq.wait_for_bq_job(job_ref) +metadata_list = FileSystems.match([gcs_location])[0].metadata_list + +return metadata_list + + +class _PassThroughThenCleanupWithSI(PTransform): + """A PTransform that invokes a DoFn after the input PCollection has been +processed. + +DoFn should have arguments (element, side_input, cleanup_signal). + +Utilizes readiness of PCollection to trigger DoFn. + """ + def __init__(self, cleanup_dofn, side_input): +self.cleanup_dofn = cleanup_dofn +self.side_input = side_input + + def expand(self, input): +class PassThrough(beam.DoFn): + def process(self, element): +yield element + +main_output, cleanup_signal = input | beam.ParDo( + PassThrough()).with_outputs( + 'cleanup_signal', main='main') + +_ = ( +input.pipeline +| beam.Create([None]) +| beam.ParDo( +self.cleanup_dofn, +self.side_input, +beam.pvalue.AsSingleton(cleanup_signal))) + +return main_output + + +class ReadAllFromBigQueryRequest: + ''' + Class that defines data to read from BQ. + ''' + def __init__( + self, + query=None, + use_legacy_sql=False, + table=None, + fl
[GitHub] [beam] iemejia commented on pull request #11831: [BEAM-9603] Enable UsesTimerMap tests for flink portable runner
iemejia commented on pull request #11831: URL: https://github.com/apache/beam/pull/11831#issuecomment-634914484 For Spark runner it does not work with the current code maybe some information is not being passed during the translation. Maybe @ibzib can give us some hints. You can reproduce this easily locally by running ``` ./gradlew :runners:spark:job-server:validatesPortableRunnerBatch --tests "*ParDoTest\$TimerFamily*" --scan ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] iemejia commented on pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)
iemejia commented on pull request #11824: URL: https://github.com/apache/beam/pull/11824#issuecomment-634907464 @aaltay can you please review this one or pass to someone who can. thx! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] iemejia removed a comment on pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)
iemejia removed a comment on pull request #11824: URL: https://github.com/apache/beam/pull/11824#issuecomment-634898332 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org