[GitHub] [beam] youngoli commented on a change in pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.

2020-05-27 Thread GitBox


youngoli commented on a change in pull request #11791:
URL: https://github.com/apache/beam/pull/11791#discussion_r431615906



##
File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go
##
@@ -266,33 +267,85 @@ func (n *DataSource) Progress() ProgressReportSnapshot {
return ProgressReportSnapshot{PID: n.outputPID, ID: n.SID.PtransformID, 
Name: n.Name, Count: c}
 }
 
-// Split takes a sorted set of potential split indices, selects and actuates
-// split on an appropriate split index, and returns the selected split index
-// if successful. Returns an error when unable to split.
+// Split takes a sorted set of potential split indices and a fraction of the
+// remainder to split at, selects and actuates a split on an appropriate split
+// index, and returns the selected split index if successful. Returns an error
+// when unable to split.
 func (n *DataSource) Split(splits []int64, frac float64) (int64, error) {
-   if splits == nil {
-   return 0, fmt.Errorf("failed to split: requested splits were 
empty")
-   }
if n == nil {
return 0, fmt.Errorf("failed to split at requested splits: 
{%v}, DataSource not initialized", splits)
}
+   if frac > 1.0 {
+   frac = 1.0
+   } else if frac < 0.0 {
+   frac = 0.0
+   }
+
n.mu.Lock()
-   c := n.index
-   // Find the smallest split index that we haven't yet processed, and set
-   // the promised split index to this value.
-   for _, s := range splits {
-   // // Never split on the first element, or the current element.
-   if s > 0 && s > c && s <= n.splitIdx {
-   n.splitIdx = s
-   fs := n.splitIdx
-   n.mu.Unlock()
-   return fs, nil
-   }
+   s, err := splitHelper(n.index, n.splitIdx, splits, frac)
+   if err != nil {
+   n.mu.Unlock()
+   return 0, err
}
+   n.splitIdx = s
+   fs := n.splitIdx
n.mu.Unlock()
-   // If we can't find a suitable split index from the requested choices,
-   // return an error.
-   return 0, fmt.Errorf("failed to split at requested splits: {%v}, 
DataSource at index: %v", splits, c)
+   return fs, nil
+}
+
+// splitHelper is a helper function that finds a split point in a range.
+// currIdx and splitIdx should match the DataSource's index and splitIdx 
fields,
+// and represent the start and end of the splittable range respectively. splits
+// is an optional slice of valid split indices, and if nil then all indices are
+// considered valid split points. frac must be between [0, 1], and represents
+// a fraction of the remaining work that the split point aims to be as close
+// as possible to.
+func splitHelper(currIdx, splitIdx int64, splits []int64, frac float64) 
(int64, error) {
+   // Get split index from fraction. Find the closest index to the 
fraction of
+   // the remainder.
+   var start int64 = 0
+   if currIdx > start {
+   start = currIdx
+   }
+   // This is the first valid split index, since we should never split at 
0 or
+   // at the current element.
+   safeStart := start + 1
+   // The remainder starts at our actual progress (i.e. start), but our 
final
+   // split index has to be >= our safeStart.
+   fracIdx := start + int64(math.Round(frac*float64(splitIdx-start)))
+   if fracIdx < safeStart {
+   fracIdx = safeStart
+   }
+   if splits == nil {

Review comment:
   Done. I missed that in the original. Added that behavior and a test for 
it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] youngoli commented on a change in pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.

2020-05-27 Thread GitBox


youngoli commented on a change in pull request #11791:
URL: https://github.com/apache/beam/pull/11791#discussion_r431615426



##
File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go
##
@@ -266,33 +267,85 @@ func (n *DataSource) Progress() ProgressReportSnapshot {
return ProgressReportSnapshot{PID: n.outputPID, ID: n.SID.PtransformID, 
Name: n.Name, Count: c}
 }
 
-// Split takes a sorted set of potential split indices, selects and actuates
-// split on an appropriate split index, and returns the selected split index
-// if successful. Returns an error when unable to split.
+// Split takes a sorted set of potential split indices and a fraction of the
+// remainder to split at, selects and actuates a split on an appropriate split
+// index, and returns the selected split index if successful. Returns an error
+// when unable to split.
 func (n *DataSource) Split(splits []int64, frac float64) (int64, error) {
-   if splits == nil {
-   return 0, fmt.Errorf("failed to split: requested splits were 
empty")
-   }
if n == nil {
return 0, fmt.Errorf("failed to split at requested splits: 
{%v}, DataSource not initialized", splits)
}
+   if frac > 1.0 {
+   frac = 1.0
+   } else if frac < 0.0 {
+   frac = 0.0
+   }
+
n.mu.Lock()
-   c := n.index
-   // Find the smallest split index that we haven't yet processed, and set
-   // the promised split index to this value.
-   for _, s := range splits {
-   // // Never split on the first element, or the current element.
-   if s > 0 && s > c && s <= n.splitIdx {
-   n.splitIdx = s
-   fs := n.splitIdx
-   n.mu.Unlock()
-   return fs, nil
-   }
+   s, err := splitHelper(n.index, n.splitIdx, splits, frac)
+   if err != nil {
+   n.mu.Unlock()
+   return 0, err
}
+   n.splitIdx = s
+   fs := n.splitIdx
n.mu.Unlock()
-   // If we can't find a suitable split index from the requested choices,
-   // return an error.
-   return 0, fmt.Errorf("failed to split at requested splits: {%v}, 
DataSource at index: %v", splits, c)
+   return fs, nil
+}
+
+// splitHelper is a helper function that finds a split point in a range.
+// currIdx and splitIdx should match the DataSource's index and splitIdx 
fields,
+// and represent the start and end of the splittable range respectively. splits
+// is an optional slice of valid split indices, and if nil then all indices are
+// considered valid split points. frac must be between [0, 1], and represents
+// a fraction of the remaining work that the split point aims to be as close
+// as possible to.
+func splitHelper(currIdx, splitIdx int64, splits []int64, frac float64) 
(int64, error) {
+   // Get split index from fraction. Find the closest index to the 
fraction of
+   // the remainder.
+   var start int64 = 0
+   if currIdx > start {
+   start = currIdx
+   }
+   // This is the first valid split index, since we should never split at 
0 or
+   // at the current element.
+   safeStart := start + 1
+   // The remainder starts at our actual progress (i.e. start), but our 
final
+   // split index has to be >= our safeStart.
+   fracIdx := start + int64(math.Round(frac*float64(splitIdx-start)))
+   if fracIdx < safeStart {
+   fracIdx = safeStart
+   }
+   if splits == nil {
+   // All split points are valid so just split at fraction.
+   return fracIdx, nil
+   } else {
+   // Find the closest unprocessed split point to our fraction.
+   sort.Slice(splits, func(i, j int) bool { return splits[i] < 
splits[j] })
+   var prevDiff int64 = math.MaxInt64
+   var bestS int64 = -1
+   for _, s := range splits {
+   if s >= safeStart && s <= splitIdx {
+   diff := intAbs(fracIdx - s)
+   if diff <= prevDiff {
+   prevDiff = diff
+   bestS = s
+   } else {
+   break // Stop early if the difference 
starts increasing.
+   }
+   }
+   }
+   if bestS != -1 {
+   return bestS, nil
+   }
+   }
+   return 0, fmt.Errorf("failed to split at requested splits: {%v}, 
DataSource at index: %v", splits, currIdx)

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific c

[GitHub] [beam] youngoli commented on a change in pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.

2020-05-27 Thread GitBox


youngoli commented on a change in pull request #11791:
URL: https://github.com/apache/beam/pull/11791#discussion_r431615308



##
File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go
##
@@ -266,33 +267,85 @@ func (n *DataSource) Progress() ProgressReportSnapshot {
return ProgressReportSnapshot{PID: n.outputPID, ID: n.SID.PtransformID, 
Name: n.Name, Count: c}
 }
 
-// Split takes a sorted set of potential split indices, selects and actuates
-// split on an appropriate split index, and returns the selected split index
-// if successful. Returns an error when unable to split.
+// Split takes a sorted set of potential split indices and a fraction of the
+// remainder to split at, selects and actuates a split on an appropriate split
+// index, and returns the selected split index if successful. Returns an error
+// when unable to split.
 func (n *DataSource) Split(splits []int64, frac float64) (int64, error) {
-   if splits == nil {
-   return 0, fmt.Errorf("failed to split: requested splits were 
empty")
-   }
if n == nil {
return 0, fmt.Errorf("failed to split at requested splits: 
{%v}, DataSource not initialized", splits)
}
+   if frac > 1.0 {
+   frac = 1.0
+   } else if frac < 0.0 {
+   frac = 0.0
+   }
+
n.mu.Lock()
-   c := n.index
-   // Find the smallest split index that we haven't yet processed, and set
-   // the promised split index to this value.
-   for _, s := range splits {
-   // // Never split on the first element, or the current element.
-   if s > 0 && s > c && s <= n.splitIdx {
-   n.splitIdx = s
-   fs := n.splitIdx
-   n.mu.Unlock()
-   return fs, nil
-   }
+   s, err := splitHelper(n.index, n.splitIdx, splits, frac)
+   if err != nil {
+   n.mu.Unlock()
+   return 0, err
}
+   n.splitIdx = s
+   fs := n.splitIdx
n.mu.Unlock()
-   // If we can't find a suitable split index from the requested choices,
-   // return an error.
-   return 0, fmt.Errorf("failed to split at requested splits: {%v}, 
DataSource at index: %v", splits, c)
+   return fs, nil
+}
+
+// splitHelper is a helper function that finds a split point in a range.
+// currIdx and splitIdx should match the DataSource's index and splitIdx 
fields,
+// and represent the start and end of the splittable range respectively. splits
+// is an optional slice of valid split indices, and if nil then all indices are
+// considered valid split points. frac must be between [0, 1], and represents
+// a fraction of the remaining work that the split point aims to be as close
+// as possible to.
+func splitHelper(currIdx, splitIdx int64, splits []int64, frac float64) 
(int64, error) {
+   // Get split index from fraction. Find the closest index to the 
fraction of
+   // the remainder.
+   var start int64 = 0
+   if currIdx > start {
+   start = currIdx
+   }
+   // This is the first valid split index, since we should never split at 
0 or
+   // at the current element.
+   safeStart := start + 1
+   // The remainder starts at our actual progress (i.e. start), but our 
final
+   // split index has to be >= our safeStart.
+   fracIdx := start + int64(math.Round(frac*float64(splitIdx-start)))
+   if fracIdx < safeStart {
+   fracIdx = safeStart
+   }
+   if splits == nil {
+   // All split points are valid so just split at fraction.
+   return fracIdx, nil
+   } else {
+   // Find the closest unprocessed split point to our fraction.
+   sort.Slice(splits, func(i, j int) bool { return splits[i] < 
splits[j] })
+   var prevDiff int64 = math.MaxInt64
+   var bestS int64 = -1
+   for _, s := range splits {

Review comment:
   Ack, although I think Search can probably be done even with int64. Might 
be worth benchmarking still.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia commented on a change in pull request #11845: [BEAM-9198] BeamSQL aggregation analytics functionality

2020-05-27 Thread GitBox


amaliujia commented on a change in pull request #11845:
URL: https://github.com/apache/beam/pull/11845#discussion_r431601151



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamWindowRule.java
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import org.apache.beam.sdk.extensions.sql.impl.rel.*;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.*;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.*;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.convert.*;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.*;
+
+public class BeamWindowRule extends ConverterRule {
+  public static final BeamWindowRule INSTANCE = new BeamWindowRule();
+
+  private BeamWindowRule() {
+super(LogicalWindow.class, Convention.NONE, 
BeamLogicalConvention.INSTANCE, "BeamWindowRule");
+  }
+
+  @Override
+  public RelNode convert(RelNode relNode) {
+// transforms relNode (LogicalWindow) to BeamWindowRel
+   assert false;

Review comment:
   Now your code will hit this line, which will crash.
   
   You can try to create BeamWindowRel now in convert function.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.

2020-05-27 Thread GitBox


chamikaramj commented on pull request #11844:
URL: https://github.com/apache/beam/pull/11844#issuecomment-635120602


   Run Python2_PVR_Flink PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #11843: [BEAM-10052][BEAM-10077] Cherry-pick PR 11771 to 2.22.0 branch

2020-05-27 Thread GitBox


chamikaramj commented on pull request #11843:
URL: https://github.com/apache/beam/pull/11843#issuecomment-635120392


   Failure is unrelated (seems to be due to Dataflow workers not being 
available).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rezarokni commented on pull request #11796: [BEAM-10003] Use local code for building code samples on website

2020-05-27 Thread GitBox


rezarokni commented on pull request #11796:
URL: https://github.com/apache/beam/pull/11796#issuecomment-635105846


   Great stuff! There will be a few more folks adding patterns and making this 
easier will help us grow the community contributions :-) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] henryken commented on pull request #11806: [BEAM-9679] Flatten Kata for Go

2020-05-27 Thread GitBox


henryken commented on pull request #11806:
URL: https://github.com/apache/beam/pull/11806#issuecomment-635102477


   This looks good now. Thanks @brucearctor!
   
   @lostluck, please help to merge this PR.
   @damondouglas, once this is merged, you can merge this to your PR and then 
we can upload to Stepik.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11796: [BEAM-10003] Use local code for building code samples on website

2020-05-27 Thread GitBox


pabloem commented on pull request #11796:
URL: https://github.com/apache/beam/pull/11796#issuecomment-635091326


   fyi @rezarokni with this fix, you should be able to create snippets and add 
them to the website in the same PR



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia merged pull request #11807: [BEAM-9363] Support TUMBLE aggregation

2020-05-27 Thread GitBox


amaliujia merged pull request #11807:
URL: https://github.com/apache/beam/pull/11807


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jhnmora000 commented on pull request #11845: [BEAM-9198] BeamSQL aggregation analytics functionality

2020-05-27 Thread GitBox


jhnmora000 commented on pull request #11845:
URL: https://github.com/apache/beam/pull/11845#issuecomment-635075698


   R: @amaliujia 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jhnmora000 opened a new pull request #11845: [BEAM-9198] BeamSQL aggregation analytics functionality

2020-05-27 Thread GitBox


jhnmora000 opened a new pull request #11845:
URL: https://github.com/apache/beam/pull/11845


   A simple Analytic Functions experiment for BeamSQL created
   in order to understand the query processing workflow of 
   BeamSQL and Calcite.
   
   The experiment is implemented in the test 
BeamAnalyticFunctionsExperimentTest.testSimpleOverFunction(), 
   when executing it a "BEAM_LOGICAL but does not implement
   the required interface" exception is thrown.
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://buil

[GitHub] [beam] ananvay commented on pull request #11207: [BEAM-9220] Go Dataflow jobs to use runner v2

2020-05-27 Thread GitBox


ananvay commented on pull request #11207:
URL: https://github.com/apache/beam/pull/11207#issuecomment-635064457


   /cc: @robertwb 
   /cc: @lukecwik 
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ananvay commented on pull request #11207: [BEAM-9220] Go Dataflow jobs to use runner v2

2020-05-27 Thread GitBox


ananvay commented on pull request #11207:
URL: https://github.com/apache/beam/pull/11207#issuecomment-635064351


   Awesome! LGTM.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #11207: [BEAM-9220] Go Dataflow jobs to use runner v2

2020-05-27 Thread GitBox


lostluck commented on pull request #11207:
URL: https://github.com/apache/beam/pull/11207#issuecomment-635063900







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #11832: [BEAM-10110] Propagate ids for custom coders.

2020-05-27 Thread GitBox


lostluck commented on pull request #11832:
URL: https://github.com/apache/beam/pull/11832#issuecomment-635063412


   I suspect that the Dataflow JRH doesn't like this as much as runner v2 does.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik commented on pull request #11821: [WIP] [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a na

2020-05-27 Thread GitBox


lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635062102


   Run Java Spark PortableValidatesRunner Batch



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik commented on pull request #11821: [WIP] [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a na

2020-05-27 Thread GitBox


lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635061728







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik commented on pull request #11821: [WIP] [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a na

2020-05-27 Thread GitBox


lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635061994







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #11832: [BEAM-10110] Propagate ids for custom coders.

2020-05-27 Thread GitBox


lostluck commented on pull request #11832:
URL: https://github.com/apache/beam/pull/11832#issuecomment-635055261


   Run Go PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.

2020-05-27 Thread GitBox


chamikaramj commented on pull request #11844:
URL: https://github.com/apache/beam/pull/11844#issuecomment-635047434


   R: @robertwb 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.

2020-05-27 Thread GitBox


chamikaramj commented on pull request #11844:
URL: https://github.com/apache/beam/pull/11844#issuecomment-635042169


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.

2020-05-27 Thread GitBox


chamikaramj commented on pull request #11844:
URL: https://github.com/apache/beam/pull/11844#issuecomment-635038300


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.

2020-05-27 Thread GitBox


chamikaramj commented on pull request #11844:
URL: https://github.com/apache/beam/pull/11844#issuecomment-635038250


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj opened a new pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.

2020-05-27 Thread GitBox


chamikaramj opened a new pull request #11844:
URL: https://github.com/apache/beam/pull/11844


   Without this X-lang can be broken for Dataflow where this property 
automatically gets enabled for some execution paths.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache

[GitHub] [beam] pulasthi commented on pull request #10888: [BEAM-7304] Twister2 Beam runner

2020-05-27 Thread GitBox


pulasthi commented on pull request #10888:
URL: https://github.com/apache/beam/pull/10888#issuecomment-635033768


   @iemejia Looking forward for your feedback. And about the maintainability 
question, the Twister2 project has 10-15 active contributors at the moment who 
can take over the responsibility to maintain the contribution if I am not able 
to work on it somehow. Several of them know the codebase (of the Twister2 
Runner) in detail, so it should not be an issue.
   
   We are also planning to join the Apache incubator in the near future after a 
couple of more features that we think are important are completed. The Twister2 
Github page seems a little inactive these days because we are working on the 
Twisterx ( https://github.com/DSC-SPIDAL/twisterx ) project which will be 
merged into the Twister2 codebase in the coming weeks :). The beam runner is a 
major aspect that we plan to update and develop in the future.
   
   I hope that answers your concerns to some level. I will also work with the 
University and get the required CCLA as requested.
   
   Best Regards,
   Pulasthi



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn merged pull request #11707: [BEAM-9810] Add a Tox (precommit) suite for Python 3.8

2020-05-27 Thread GitBox


tvalentyn merged pull request #11707:
URL: https://github.com/apache/beam/pull/11707


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-27 Thread GitBox


pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-635031678


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-27 Thread GitBox


pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-635031588


   py2 failure is hdfs integration test



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on a change in pull request #11828: [BEAM-10106] Script the deployment of artifacts to pypi

2020-05-27 Thread GitBox


tvalentyn commented on a change in pull request #11828:
URL: https://github.com/apache/beam/pull/11828#discussion_r431527674



##
File path: release/src/main/scripts/deploy_pypi.sh
##
@@ -41,9 +41,12 @@ fi
 mkdir ${LOCAL_CLONE_DIR}
 cd ${LOCAL_CLONE_DIR}
 
+virtualenv deploy_pypi_env
+source ./deploy_pypi_env/bin/activate
+pip install twine

Review comment:
   Consider  creating virtualenv in a /tmp/... directory or in the 
directory that you create and later clean up (assuming it won't interfere with 
pypi uploads).
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on a change in pull request #11828: [BEAM-10106] Script the deployment of artifacts to pypi

2020-05-27 Thread GitBox


tvalentyn commented on a change in pull request #11828:
URL: https://github.com/apache/beam/pull/11828#discussion_r431527674



##
File path: release/src/main/scripts/deploy_pypi.sh
##
@@ -41,9 +41,12 @@ fi
 mkdir ${LOCAL_CLONE_DIR}
 cd ${LOCAL_CLONE_DIR}
 
+virtualenv deploy_pypi_env
+source ./deploy_pypi_env/bin/activate
+pip install twine

Review comment:
   Consider making this in a /tmp directory or the directory that you 
create and later clean up (assuming it won't interfere with pypi uploads).
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia commented on pull request #11807: [BEAM-9363] Support TUMBLE aggregation

2020-05-27 Thread GitBox


amaliujia commented on pull request #11807:
URL: https://github.com/apache/beam/pull/11807#issuecomment-635021964


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] nfisher commented on a change in pull request #11732: [BEAM-10017] Expose Cassandra Connect and Read timeouts

2020-05-27 Thread GitBox


nfisher commented on a change in pull request #11732:
URL: https://github.com/apache/beam/pull/11732#discussion_r431519624



##
File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java
##
@@ -1023,6 +1065,8 @@ private String getMutationTypeName() {
 
   abstract Builder setMutationType(MutationType mutationType);
 
+  abstract Builder setConnectTimeout(ValueProvider timeout);

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] nfisher commented on a change in pull request #11732: [BEAM-10017] Expose Cassandra Connect and Read timeouts

2020-05-27 Thread GitBox


nfisher commented on a change in pull request #11732:
URL: https://github.com/apache/beam/pull/11732#discussion_r431519562



##
File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java
##
@@ -948,6 +980,15 @@ public T getCurrent() throws NoSuchElementException {
   return builder().setConsistencyLevel(consistencyLevel).build();
 }
 
+/** Cassandra client socket option for connect timeout. */
+public Write withConnectTimeout(Integer timeout) {
+  return withConnectTimeout(ValueProvider.StaticValueProvider.of(timeout));
+}
+
+public Write withConnectTimeout(ValueProvider timeout) {

Review comment:
   Done line 1010





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] nfisher commented on a change in pull request #11732: [BEAM-10017] Expose Cassandra Connect and Read timeouts

2020-05-27 Thread GitBox


nfisher commented on a change in pull request #11732:
URL: https://github.com/apache/beam/pull/11732#discussion_r431519019



##
File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java
##
@@ -1142,7 +1201,9 @@ private static Cluster getCluster(
   spec.username(),
   spec.password(),
   spec.localDc(),
-  spec.consistencyLevel());
+  spec.consistencyLevel(),
+  spec.connectTimeout(),
+  null);

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia commented on pull request #11807: [BEAM-9363] Support TUMBLE aggregation

2020-05-27 Thread GitBox


amaliujia commented on pull request #11807:
URL: https://github.com/apache/beam/pull/11807#issuecomment-635018712


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] steveniemitz edited a comment on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging

2020-05-27 Thread GitBox


steveniemitz edited a comment on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635015560


   I just tested this change out, it seems like things work as they did before 
in 2.20.  The only difference I noticed is that my `filesToStage` are slightly 
different.
   
   Previously they looked like:
   ```
   dataflow-worker.jar=,
   
   ```
   
   however now they're just:
   ```
   ,
   
   ```
   
   It seems like the job launches correctly and uses the jar I set, so it seems 
like it's still working correctly, but I'm not sure if that missing might cause 
other issues down the road.  Also, the names are deterministic again which is 
great!
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] steveniemitz commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging

2020-05-27 Thread GitBox


steveniemitz commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635015560


   I just tested this change out, it seems like things work as they did before. 
 The only difference I noticed is that my `filesToStage` are slightly different.
   
   Previously they looked like:
   ```
   dataflow-worker.jar=,
   
   ```
   
   however now they're just:
   ```
   ,
   
   ```
   
   It seems like the job launches correctly and uses the jar I set, so it seems 
like it's still working correctly, but I'm not sure if that missing might cause 
other issues down the road.  Also, the names are deterministic again which is 
great!
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #11843: [BEAM-10052] Cherry pick pr 11771 to 2.22.0 branch

2020-05-27 Thread GitBox


chamikaramj commented on pull request #11843:
URL: https://github.com/apache/beam/pull/11843#issuecomment-635013993


   R: @TheNeuralBit 
   
   CC: @robertwb @ihji 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj opened a new pull request #11843: [BEAM-10052] Cherry pick pr 11771 to 2.22.0 branch

2020-05-27 Thread GitBox


chamikaramj opened a new pull request #11843:
URL: https://github.com/apache/beam/pull/11843


   Without this Dataflow tries to upload the same artifact multiple times which 
breaks the Dataflow artifact container for some cases.
   
   This change already was merged to HEAD abut a week back but seems like we 
forgot to add it to the release branch.
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Pyth

[GitHub] [beam] aaltay commented on pull request #11826: Add a powered by page

2020-05-27 Thread GitBox


aaltay commented on pull request #11826:
URL: https://github.com/apache/beam/pull/11826#issuecomment-635002337


   Thank you all. I can merge this PR as is, but it would be great if others in 
the community could add other projects/products to the list.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] brucearctor commented on pull request #11826: Add a powered by page

2020-05-27 Thread GitBox


brucearctor commented on pull request #11826:
URL: https://github.com/apache/beam/pull/11826#issuecomment-635000108


   Nice!  Some of what I am hearing is that people want to use what others are. 
 So finding ways to highlight may be valuable for encouraging adoption.   
   
   I had missed website was ready to be updated (had been on hold).
   
   Yaml: no issues, so if interested in doing the work...   would want to 
ensure there was a good way to preview that wasn't burdensome to those updating 
the page.  



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] damondouglas commented on pull request #11806: [BEAM-9679] Flatten Kata for Go

2020-05-27 Thread GitBox


damondouglas commented on pull request #11806:
URL: https://github.com/apache/beam/pull/11806#issuecomment-634999073


   @henryken Related to [Comment from PR 
11803](https://github.com/apache/beam/pull/11803#issuecomment-634696399), would 
you mind to let me know when this is ready to update Stepik along with PR 
#11803?
   
   Thank you @brucearctor for doing this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #11842: [BEAM-9971][release-2.22.0] Do not use context classloader.

2020-05-27 Thread GitBox


TheNeuralBit commented on pull request #11842:
URL: https://github.com/apache/beam/pull/11842#issuecomment-634995178


   Run Java Spark PortableValidatesRunner Batch



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit opened a new pull request #11842: [BEAM-9971][release-2.22.0] Do not use context classloader.

2020-05-27 Thread GitBox


TheNeuralBit opened a new pull request #11842:
URL: https://github.com/apache/beam/pull/11842


   Cherry pick #11784 into 2.22.0
   
   R: @ibzib 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)[![

[GitHub] [beam] aijamalnk commented on a change in pull request #11780: [BEAM-9948] Uploading mascot to the website

2020-05-27 Thread GitBox


aijamalnk commented on a change in pull request #11780:
URL: https://github.com/apache/beam/pull/11780#discussion_r431494965



##
File path: website/www/site/content/en/community/mascot.md
##
@@ -0,0 +1,52 @@
+---
+title: "Beam Mascot"
+---
+
+
+# Beam Mascot Design
+
+This page contains Apache Beam's mascot designs. 
+
+Beam firefly is cute, friendly, agile, easy to use, and its main objective is 
to fetch streams and batches of data and process it. The mascot’s model sheet 
is useful to understand its features, capabilities, as well as its morphology.  
The original design of the mascot was created by [Julian G. 
Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam 
community under the [Apache license 
2.0](https://www.apache.org/licenses/LICENSE-2.0). 
+
+You can browse the original mascot and its adaptations in different sizes and 
image formats in [this 
directory](https://github.com/apache/beam/tree/mascot-upload/website/www/site/static/images/mascot).
 

Review comment:
   The directory is aprt of the github repository. it is not yet merged, 
but once it is, it will work. I've fixed the apache licenses. Can you try 
taking a lo0ok? Than ks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib merged pull request #11727: Update Beam website to release 2.21.0.

2020-05-27 Thread GitBox


ibzib merged pull request #11727:
URL: https://github.com/apache/beam/pull/11727


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aijamalnk commented on pull request #11826: Add a powered by page

2020-05-27 Thread GitBox


aijamalnk commented on pull request #11826:
URL: https://github.com/apache/beam/pull/11826#issuecomment-634987468


   This looks great. I think it's good to merge.
   
   I like the idea of YAML files, perhaps with support of thumbnails and 
external links. @epicfaace is it something you'd like to help us with after 
this PR is merged?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib merged pull request #11729: Add blog post announcing the Beam 2.21.0 release.

2020-05-27 Thread GitBox


ibzib merged pull request #11729:
URL: https://github.com/apache/beam/pull/11729


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #10165: [BEAM-7390] Add code snippet for GroupIntoBatches

2020-05-27 Thread GitBox


aaltay commented on pull request #10165:
URL: https://github.com/apache/beam/pull/10165#issuecomment-634985346


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia commented on a change in pull request #11807: [BEAM-9363] Support TUMBLE aggregation

2020-05-27 Thread GitBox


amaliujia commented on a change in pull request #11807:
URL: https://github.com/apache/beam/pull/11807#discussion_r431486841



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##
@@ -99,14 +102,32 @@ public TableFunctionScan copy(
   RexInputRef wmCol = (RexInputRef) call.getOperands().get(1);
   PCollection upstream = input.get(0);
   Schema outputSchema = CalciteUtils.toSchema(getRowType());
-  return upstream
-  .apply(
-  ParDo.of(
-  new FixedWindowDoFn(
-  
FixedWindows.of(durationParameter(call.getOperands().get(2))),
-  wmCol.getIndex(),
-  outputSchema)))
-  .setRowSchema(outputSchema);
+  FixedWindows windowFn = 
FixedWindows.of(durationParameter(call.getOperands().get(2)));
+  PCollection streamWithWindowMetadata =
+  upstream
+  .apply(ParDo.of(new FixedWindowDoFn(windowFn, wmCol.getIndex(), 
outputSchema)))
+  .setRowSchema(outputSchema);
+
+  PCollection windowedStream =
+  assignTimestampsAndWindow(
+  streamWithWindowMetadata, wmCol.getIndex(), (WindowFn) windowFn);
+
+  return windowedStream;
+}
+
+/** Extract timestamps from the windowFieldIndex, then window into 
windowFns. */
+private PCollection assignTimestampsAndWindow(
+PCollection upstream, int windowFieldIndex, WindowFn windowFn) {
+  PCollection windowedStream;
+  windowedStream =
+  upstream

Review comment:
   Not a big deal. Just want to use the name `windowedStream` to improve 
readability. E.g. readers know it's returning a windowed PCollection.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robinyqiu commented on a change in pull request #11807: [BEAM-9363] Support TUMBLE aggregation

2020-05-27 Thread GitBox


robinyqiu commented on a change in pull request #11807:
URL: https://github.com/apache/beam/pull/11807#discussion_r431481621



##
File path: 
sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java
##
@@ -4780,6 +4780,31 @@ public void testTumbleAsTVF() {
 
pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
   }
 
+  @Test
+  public void testTVFTumbleAggregation() {
+String sql =
+"SELECT COUNT(*) as field_count, "
++ "window_start "
++ "FROM TUMBLE((select * from KeyValue), descriptor(ts), 'INTERVAL 
1 SECOND') "
++ "GROUP BY window_start";
+ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
+BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql);
+
+PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, 
beamRelNode);
+
+final Schema schema =
+
Schema.builder().addInt64Field("count_start").addDateTimeField("window_start").build();

Review comment:
   Nit: `count_start` should be `field_count`.

##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##
@@ -99,14 +102,32 @@ public TableFunctionScan copy(
   RexInputRef wmCol = (RexInputRef) call.getOperands().get(1);
   PCollection upstream = input.get(0);
   Schema outputSchema = CalciteUtils.toSchema(getRowType());
-  return upstream
-  .apply(
-  ParDo.of(
-  new FixedWindowDoFn(
-  
FixedWindows.of(durationParameter(call.getOperands().get(2))),
-  wmCol.getIndex(),
-  outputSchema)))
-  .setRowSchema(outputSchema);
+  FixedWindows windowFn = 
FixedWindows.of(durationParameter(call.getOperands().get(2)));
+  PCollection streamWithWindowMetadata =
+  upstream
+  .apply(ParDo.of(new FixedWindowDoFn(windowFn, wmCol.getIndex(), 
outputSchema)))
+  .setRowSchema(outputSchema);
+
+  PCollection windowedStream =
+  assignTimestampsAndWindow(
+  streamWithWindowMetadata, wmCol.getIndex(), (WindowFn) windowFn);
+
+  return windowedStream;
+}
+
+/** Extract timestamps from the windowFieldIndex, then window into 
windowFns. */
+private PCollection assignTimestampsAndWindow(
+PCollection upstream, int windowFieldIndex, WindowFn windowFn) {
+  PCollection windowedStream;
+  windowedStream =
+  upstream

Review comment:
   Why not just `return upstream.apply(...)`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik merged pull request #11784: [BEAM-9971] Do not use context classloader.

2020-05-27 Thread GitBox


lukecwik merged pull request #11784:
URL: https://github.com/apache/beam/pull/11784


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tysonjh commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms

2020-05-27 Thread GitBox


tysonjh commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r431463509



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentRequest;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.FieldId;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.ProjectName;
+import com.google.privacy.dlp.v2.Table;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * A {@link PTransform} connecting to Cloud DLP 
(https://cloud.google.com/dlp/docs/libraries) and
+ * deidentifying text according to provided settings. The transform supports 
both CSV formatted
+ * input data and unstructured input.
+ *
+ * If the csvHeader property is set and a sideinput with CSV headers is 
added to the PTransform,
+ * csvDelimiter also should be set, else the results will be incorrect. If 
csvHeader is neither set
+ * nor passed as sideinput, input is assumed to be unstructured.
+ *
+ * Either deidentifyTemplateName (String) or deidentifyConfig {@link 
DeidentifyConfig} need to be
+ * set. inspectTemplateName and inspectConfig ({@link InspectConfig} are 
optional.
+ *
+ * Batch size defines how big are batches sent to DLP at once in bytes.
+ *
+ * The transform consumes {@link KV} of {@link String}s (assumed to be 
filename as key and
+ * contents as value) and outputs {@link KV} of {@link String} (eg. filename) 
and {@link
+ * DeidentifyContentResponse}, which will contain {@link Table} of results for 
the user to consume.
+ */
+@Experimental
+@AutoValue
+public abstract class DLPDeidentifyText
+extends PTransform<
+PCollection>, PCollection>> {
+
+  public static final Integer DLP_PAYLOAD_LIMIT_BYTES = 524000;
+
+  /** @return Template name for data inspection. */
+  @Nullable
+  public abstract String inspectTemplateName();
+
+  /** @return Template name for data deidentification. */
+  @Nullable
+  public abstract String deidentifyTemplateName();
+
+  /**
+   * @return Configuration object for data inspection. If present, supersedes 
the template settings.
+   */
+  @Nullable
+  public abstract InspectConfig inspectConfig();
+
+  /** @return Configuration object for deidentification. If present, 
supersedes the template. */
+  @Nullable
+  public abstract DeidentifyConfig deidentifyConfig();
+
+  /** @return List of column names if the input KV value is a CSV formatted 
row. */
+  @Nullable
+  public abstract PCollectionView> csvHeader();
+
+  /** @return Delimiter to be used when splitting values from input strings 
into columns. */
+  @Nullable
+  public abstract String csvColumnDelimiter();

Review comment:
   Since the builder has setter methods prefixed with `set` having parity 
with getters prefixed with `get` would be nice. Also dropping the `csv` makes 
sense.
   
   ```suggestion
 public abstract String getColumnDelimiter();
   ```

##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPReidentifyText.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License,

[GitHub] [beam] KevinGG commented on a change in pull request #11838: [BEAM-9322] Modify the TestStream to output a dict when no output_tags are specified

2020-05-27 Thread GitBox


KevinGG commented on a change in pull request #11838:
URL: https://github.com/apache/beam/pull/11838#discussion_r431480466



##
File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
##
@@ -315,6 +327,7 @@ def read_multiple(self, labels):
 StreamingCacheSource(self._cache_dir, l,
  self._is_cache_complete).read(tail=True)
 for l in labels
+if not [sub_l for sub_l in l if self.sentinel_label() in sub_l]

Review comment:
   Or if a label `l` is a `list` of `str` and `*labels` is a `list` of 
`list` of `str`, then this makes sense.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #10165: [BEAM-7390] Add code snippet for GroupIntoBatches

2020-05-27 Thread GitBox


aaltay commented on pull request #10165:
URL: https://github.com/apache/beam/pull/10165#issuecomment-634978238


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #11839: [BEAM-10122] Python RowCoder throws NotImplementedError in DataflowRunner

2020-05-27 Thread GitBox


TheNeuralBit commented on pull request #11839:
URL: https://github.com/apache/beam/pull/11839#issuecomment-634977924


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] KevinGG commented on a change in pull request #11838: [BEAM-9322] Modify the TestStream to output a dict when no output_tags are specified

2020-05-27 Thread GitBox


KevinGG commented on a change in pull request #11838:
URL: https://github.com/apache/beam/pull/11838#discussion_r431469495



##
File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
##
@@ -315,6 +327,7 @@ def read_multiple(self, labels):
 StreamingCacheSource(self._cache_dir, l,
  self._is_cache_complete).read(tail=True)
 for l in labels
+if not [sub_l for sub_l in l if self.sentinel_label() in sub_l]

Review comment:
   This is a little hard to read. 
   Isn't a label `l` a `str`, so a `sub_l` is a character of that `str`?
   I suppose `if not [sub_l for ...]` evaluates to `True` when the `[sub_l for 
...]` is empty.
   And the emptiness of `[sub_l for ...]` is based on whether the 
`sentinel_label` exists in the `sub_l`? This is where I get confused.
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit opened a new pull request #11841: [BEAM-10121] Python RowCoder doesn't support nested structs

2020-05-27 Thread GitBox


TheNeuralBit opened a new pull request #11841:
URL: https://github.com/apache/beam/pull/11841


   Add support for nested structs to RowCoder.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)[![Build

[GitHub] [beam] pabloem merged pull request #11745: [BEAM-9692] Add to/from_runner_api_parameters to WriteToBigQuery

2020-05-27 Thread GitBox


pabloem merged pull request #11745:
URL: https://github.com/apache/beam/pull/11745


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11745: [BEAM-9692] Add to/from_runner_api_parameters to WriteToBigQuery

2020-05-27 Thread GitBox


pabloem commented on pull request #11745:
URL: https://github.com/apache/beam/pull/11745#issuecomment-634975097


   Thanks Sam!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rohdesamuel commented on pull request #11745: [BEAM-9692] Add to/from_runner_api_parameters to WriteToBigQuery

2020-05-27 Thread GitBox


rohdesamuel commented on pull request #11745:
URL: https://github.com/apache/beam/pull/11745#issuecomment-634973345


   > you can use the same jira for the ptransformoverride
   
   done



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] epicfaace opened a new pull request #11840: [draft] [BEAM-10111] Allow reading from archive files with MatchFiles

2020-05-27 Thread GitBox


epicfaace opened a new pull request #11840:
URL: https://github.com/apache/beam/pull/11840


   [still a draft]
   
   The general idea here is that we use mixins (subclasses of 
`ArchiveFileSystemMixin`) for each archive type (tar, zip, etc.). Each mixin 
"wraps" the I/O operations of its corresponding FileSystem (GCSFileSystem, 
LocalFileSystem, etc.) so that it behaves like an archive that is mounted on 
its corresponding FileSystem.
   
   ```python
   (
   p
   | MatchFiles("*.log", 
archive_path="s3://ashwin-bucket123/logs.zip")
   | Map(print)
   )
   ```
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastComplet

[GitHub] [beam] pabloem commented on pull request #11745: Add to/from_runner_api_parameters to WriteToBigQuery

2020-05-27 Thread GitBox


pabloem commented on pull request #11745:
URL: https://github.com/apache/beam/pull/11745#issuecomment-634972814


   you can use the same jira for the ptransformoverride



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem merged pull request #11796: [BEAM-10003] Use local code for building code samples on website

2020-05-27 Thread GitBox


pabloem merged pull request #11796:
URL: https://github.com/apache/beam/pull/11796


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rohdesamuel commented on a change in pull request #11838: [BEAM-9322] Modify the streaming cache to always have multiple outputs

2020-05-27 Thread GitBox


rohdesamuel commented on a change in pull request #11838:
URL: https://github.com/apache/beam/pull/11838#discussion_r431471609



##
File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
##
@@ -304,6 +304,18 @@ def read(self, *labels):
   return iter([]), -1
 return StreamingCache.Reader([header], [reader]).read(), 1
 
+  @staticmethod
+  def sentinel_label():

Review comment:
   Yeah that can work, I like that because it keeps the same semantics. 
I'll go with the {None} alternative because the output_tags are always manually 
specified in the from_runner_api_parameter method.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11745: Add to/from_runner_api_parameters to WriteToBigQuery

2020-05-27 Thread GitBox


pabloem commented on pull request #11745:
URL: https://github.com/apache/beam/pull/11745#issuecomment-634969845







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11745: Add to/from_runner_api_parameters to WriteToBigQuery

2020-05-27 Thread GitBox


pabloem commented on pull request #11745:
URL: https://github.com/apache/beam/pull/11745#issuecomment-634969665


   thanks Sam!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit opened a new pull request #11839: [BEAM-10122] Python RowCoder throws NotImplementedError in DataflowRunner

2020-05-27 Thread GitBox


TheNeuralBit opened a new pull request #11839:
URL: https://github.com/apache/beam/pull/11839


   Remove `as_cloud_object` override so that DataflowRunner can just use the 
default.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_

[GitHub] [beam] aaltay commented on pull request #10165: [BEAM-7390] Add code snippet for GroupIntoBatches

2020-05-27 Thread GitBox


aaltay commented on pull request #10165:
URL: https://github.com/apache/beam/pull/10165#issuecomment-634968738


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] davidcavazos commented on pull request #10165: [BEAM-7390] Add code snippet for GroupIntoBatches

2020-05-27 Thread GitBox


davidcavazos commented on pull request #10165:
URL: https://github.com/apache/beam/pull/10165#issuecomment-634968220







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb merged pull request #11793: [BEAM-10064] Fix google3 import error for BEAM-9383

2020-05-27 Thread GitBox


robertwb merged pull request #11793:
URL: https://github.com/apache/beam/pull/11793


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] epicfaace commented on a change in pull request #11826: Add a powered by page

2020-05-27 Thread GitBox


epicfaace commented on a change in pull request #11826:
URL: https://github.com/apache/beam/pull/11826#discussion_r431466644



##
File path: website/www/site/content/en/community/powered-by.md
##
@@ -0,0 +1,25 @@
+---
+title: 'Powered by Apache Beam'
+---
+
+# Projects and Products Powered by Apache Beam
+
+To add yourself to the list, please open a [pull 
request](https://github.com/apache/beam/edit/master/website/www/site/content/en/community/powered_by.md)
 adding your organization name, URL, a list of which Beam components you are 
using, and a short description of your use case.
+
+* **[Cloud Dataflow](https://cloud.google.com/dataflow):** Google Cloud 
Dataflow is a fully managed service for executing

Review comment:
   It wouldn't be too bad -- I'd be glad to help do it -- it would just 
make sure that we maintain a consistent format and layout that's easy to change 
(for example, https://arrow.apache.org/powered_by/ isn't in alphabetical order, 
and the bolding of titles etc. isn't too consistent -- and it would take more 
than a simple change to change this format quickly, because it may not be 
data-driven like with YAML). Making new data templates in YAML will also help 
us be better prepared for localization down the road.
   
   If you're okay with it, we could probably do it after this PR is merged.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on pull request #11838: [BEAM-9322] Modify the streaming cache to always have multiple outputs

2020-05-27 Thread GitBox


robertwb commented on pull request #11838:
URL: https://github.com/apache/beam/pull/11838#issuecomment-634964145


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on a change in pull request #11838: [BEAM-9322] Modify the streaming cache to always have multiple outputs

2020-05-27 Thread GitBox


robertwb commented on a change in pull request #11838:
URL: https://github.com/apache/beam/pull/11838#discussion_r431465968



##
File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
##
@@ -304,6 +304,18 @@ def read(self, *labels):
   return iter([]), -1
 return StreamingCache.Reader([header], [reader]).read(), 1
 
+  @staticmethod
+  def sentinel_label():

Review comment:
   Rather than introduce a sentinel label, how about returning a dict from 
expand iff output_tags was manually specified (or, alternatively, something 
other than `{None}`)?  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rohdesamuel commented on pull request #11838: [BEAM-9322] Modify the streaming cache to always have multiple outputs

2020-05-27 Thread GitBox


rohdesamuel commented on pull request #11838:
URL: https://github.com/apache/beam/pull/11838#issuecomment-634961730


   R: @KevinGG 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #11826: Add a powered by page

2020-05-27 Thread GitBox


aaltay commented on pull request #11826:
URL: https://github.com/apache/beam/pull/11826#issuecomment-634958799


   R: @aijamalnk @brucearctor -- what do you think about this?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rohdesamuel opened a new pull request #11838: [BEAM-9322] Modify the streaming cache to always have multiple outputs

2020-05-27 Thread GitBox


rohdesamuel opened a new pull request #11838:
URL: https://github.com/apache/beam/pull/11838


   Change-Id: I6a8eba4e323bf0fff318a56e44e512916c06266f
   
   https://github.com/apache/beam/pull/11765 removes the ability to set the 
output id on TestStreams with single outputs. This PR circumvents this by 
always adding a dummy output to the TestStream so that it will always output a 
dict, so that we can control the output ids.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_Po

[GitHub] [beam] iemejia commented on pull request #10888: [BEAM-7304] Twister2 Beam runner

2020-05-27 Thread GitBox


iemejia commented on pull request #10888:
URL: https://github.com/apache/beam/pull/10888#issuecomment-634953208


   @pulasthi technically there are not many major things. (I may comment on 
those but don't worry for those much). Currently the only concern I have is 
about the [bus factor](https://en.wikipedia.org/wiki/Bus_factor) of this 
contribution. Is there anyone apart of you able to contribute fixes/support for 
this runner in the future?
   
   Also given the size of the contribution and the fact that it comes from a 
University (aka corporation) you need to send a signed CCLA 
https://www.apache.org/licenses/cla-corporate.pdf 
   More info on why here 
https://www.apache.org/licenses/contributor-agreements.html
   
   Once you have the signed document from your University please send a copy to 
secret...@apache.org and priv...@beam.apache.org
   
   If you have still questions about this legal part please ask those to 
priv...@beam.apache.org
   
   Thanks



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on a change in pull request #11828: [BEAM-10106] Script the deployment of artifacts to pypi

2020-05-27 Thread GitBox


ibzib commented on a change in pull request #11828:
URL: https://github.com/apache/beam/pull/11828#discussion_r431449865



##
File path: release/src/main/scripts/deploy_pypi.sh
##
@@ -0,0 +1,57 @@
+#!/bin/bash
+#
+#Licensed to the Apache Software Foundation (ASF) under one or more
+#contributor license agreements.  See the NOTICE file distributed with
+#this work for additional information regarding copyright ownership.
+#The ASF licenses this file to You under the Apache License, Version 2.0
+#(the "License"); you may not use this file except in compliance with
+#the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+#
+
+# This script uploads Python artifacts staged at dist.apache.org to PyPI.
+
+set -e
+
+function clean_up(){
+  echo "Do you want to clean local clone repo? [y|N]"
+  read confirmation
+  if [[ $confirmation = "y" ]]; then
+cd ~
+rm -rf ${LOCAL_CLONE_DIR}
+echo "Clean up local repo."

Review comment:
   Done

##
File path: release/src/main/scripts/deploy_pypi.sh
##
@@ -0,0 +1,57 @@
+#!/bin/bash
+#
+#Licensed to the Apache Software Foundation (ASF) under one or more
+#contributor license agreements.  See the NOTICE file distributed with
+#this work for additional information regarding copyright ownership.
+#The ASF licenses this file to You under the Apache License, Version 2.0
+#(the "License"); you may not use this file except in compliance with
+#the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+#
+
+# This script uploads Python artifacts staged at dist.apache.org to PyPI.
+
+set -e
+
+function clean_up(){
+  echo "Do you want to clean local clone repo? [y|N]"
+  read confirmation
+  if [[ $confirmation = "y" ]]; then
+cd ~
+rm -rf ${LOCAL_CLONE_DIR}
+echo "Clean up local repo."
+  fi
+}
+
+echo "Enter the release version, e.g. 2.21.0:"
+read RELEASE
+LOCAL_CLONE_DIR="beam_release_${RELEASE}"
+cd ~
+if [[ -d ${LOCAL_CLONE_DIR} ]]; then
+  rm -rf ${LOCAL_CLONE_DIR}

Review comment:
   Done

##
File path: release/src/main/scripts/deploy_pypi.sh
##
@@ -0,0 +1,57 @@
+#!/bin/bash
+#
+#Licensed to the Apache Software Foundation (ASF) under one or more
+#contributor license agreements.  See the NOTICE file distributed with
+#this work for additional information regarding copyright ownership.
+#The ASF licenses this file to You under the Apache License, Version 2.0
+#(the "License"); you may not use this file except in compliance with
+#the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+#
+
+# This script uploads Python artifacts staged at dist.apache.org to PyPI.
+
+set -e
+
+function clean_up(){
+  echo "Do you want to clean local clone repo? [y|N]"

Review comment:
   Done

##
File path: release/src/main/scripts/deploy_pypi.sh
##
@@ -0,0 +1,57 @@
+#!/bin/bash
+#
+#Licensed to the Apache Software Foundation (ASF) under one or more
+#contributor license agreements.  See the NOTICE file distributed with
+#this work for additional information regarding copyright ownership.
+#The ASF licenses this file to You under the Apache License, Version 2.0
+#(the "License"); you may not use this file except in compliance with
+#the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+#
+
+# This script uploads Python artifacts staged at dist.apache.org to PyP

[GitHub] [beam] boyuanzz commented on pull request #11831: [BEAM-9603] Enable UsesTimerMap tests for flink portable runner

2020-05-27 Thread GitBox


boyuanzz commented on pull request #11831:
URL: https://github.com/apache/beam/pull/11831#issuecomment-634950453


   There is a bug tracking spark implementation: 
https://issues.apache.org/jira/browse/BEAM-9912. Opened 
https://issues.apache.org/jira/browse/BEAM-10120 for Flink.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-27 Thread GitBox


pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-634947671







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik removed a comment on pull request #11821: [WIP] [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type

2020-05-27 Thread GitBox


lukecwik removed a comment on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-634312177


   R: @mxm @tweise 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] y1chi closed pull request #11831: [BEAM-9603] Enable UsesTimerMap tests for flink portable runner

2020-05-27 Thread GitBox


y1chi closed pull request #11831:
URL: https://github.com/apache/beam/pull/11831


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia commented on pull request #11186: [BEAM-9564] Remove insecure ssl options from MongoDBIO

2020-05-27 Thread GitBox


iemejia commented on pull request #11186:
URL: https://github.com/apache/beam/pull/11186#issuecomment-634946173


   Lies this is not stale!!! well maybe a bit, but I plan to come back to it 
soonish.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia merged pull request #11817: [BEAM-10074] | implement hashing functions

2020-05-27 Thread GitBox


amaliujia merged pull request #11817:
URL: https://github.com/apache/beam/pull/11817


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on a change in pull request #11802: [BEAM-9916] Update I/O documentation links and create more complete I/O matrix

2020-05-27 Thread GitBox


pabloem commented on a change in pull request #11802:
URL: https://github.com/apache/beam/pull/11802#discussion_r431442795



##
File path: website/www/site/data/io_matrix.yaml
##
@@ -0,0 +1,377 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+categories:
+  - name: File-based
+description: These I/O connectors involve working with files.
+rows:
+  - transform: FileIO
+description: "General-purpose transforms for working with files: 
listing files (matching), reading and writing."
+implementations:
+  - language: java
+name: org.apache.beam.sdk.io.FileIO
+url: 
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html
+  - language: py
+name: apache_beam.io.FileIO
+url: 
https://beam.apache.org/releases/pydoc/current/apache_beam.io.fileio.html
+  - transform: AvroIO
+description: PTransforms for reading from and writing to 
[Avro](https://avro.apache.org/) files.
+implementations:
+  - language: java
+name: org.apache.beam.sdk.io.AvroIO
+url: 
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/AvroIO.html
+  - language: py
+name: apache_beam.io.avroio
+url: 
https://beam.apache.org/releases/pydoc/current/apache_beam.io.avroio.html
+  - language: go
+name: github.com/apache/beam/sdks/go/pkg/beam/io/avroio
+url: 
https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/avroio
+  - transform: TextIO
+description: PTransforms for reading and writing text files.
+implementations:
+  - language: java
+name: org.apache.beam.sdk.io.TextIO
+url: 
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html
+  - language: py
+name: apache_beam.io.textio
+url: 
https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html
+  - language: go
+name: github.com/apache/beam/sdks/go/pkg/beam/io/textio
+url: 
https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/textio
+  - transform: TFRecordIO
+description: PTransforms for reading and writing [TensorFlow 
TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) files.
+implementations:
+  - language: java
+name: org.apache.beam.sdk.io.TFRecordIO
+url: 
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TFRecordIO.html
+  - language: py
+name: apache_beam.io.tfrecordio
+url: 
https://beam.apache.org/releases/pydoc/current/apache_beam.io.tfrecordio.html
+  - transform: XmlIO
+description: Transforms for reading and writing XML files using 
[JAXB](https://www.oracle.com/technical-resources/articles/javase/jaxb.html) 
mappers.
+implementations:
+  - language: java
+name: org.apache.beam.sdk.io.xml.XmlIO
+url: 
https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/xml/XmlIO.html
+  - transform: TikaIO
+description: Transforms for parsing arbitrary files using [Apache 
Tika](https://tika.apache.org/).
+implementations:
+  - language: java
+name: org.apache.beam.sdk.io.tika.TikaIO
+url: 
https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/tika/TikaIO.html
+  - transform: ParquetIO
+description: IO for reading from and writing to 
[Parquet](https://parquet.apache.org/) files.
+docs: /documentation/io/built-in/parquet/
+implementations:
+  - language: java
+name: org.apache.beam.sdk.io.parquet.ParquetIO
+url: 
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/parquet/ParquetIO.html
+  - language: py
+name: apache_beam.io.parquetio
+url: 
https://beam.apache.org/releases/pydoc/current/apache_beam.io.parquetio.html
+  - transform: ThriftIO
+description: PTransforms for reading and writing files containing 
[Thrift](https://thrift.apache.org/)-encoded data.
+implementations:
+  - language: java
+name: org.apache.beam.sdk.io.thrift.ThriftIO
+url: 
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/thrift/ThriftIO.html
+  - t

[GitHub] [beam] pabloem opened a new pull request #11837: [BEAM-10098] Enabling javadoc export for RabbitMqIO and KuduIO

2020-05-27 Thread GitBox


pabloem opened a new pull request #11837:
URL: https://github.com/apache/beam/pull/11837


   Starting export of javadocs for these IOs
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://bui

[GitHub] [beam] boyuanzz commented on pull request #11831: [BEAM-9603] Enable UsesTimerMap tests for flink portable runner

2020-05-27 Thread GitBox


boyuanzz commented on pull request #11831:
URL: https://github.com/apache/beam/pull/11831#issuecomment-634933808


   It seems like Flink doesn't support timer family either.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia commented on pull request #11807: [BEAM-9363] Support TUMBLE aggregation

2020-05-27 Thread GitBox


amaliujia commented on pull request #11807:
URL: https://github.com/apache/beam/pull/11807#issuecomment-634932759


   A friendly ping~



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] apilloud commented on pull request #11834: [BEAM-10117] Correct erroneous Job Failed message

2020-05-27 Thread GitBox


apilloud commented on pull request #11834:
URL: https://github.com/apache/beam/pull/11834#issuecomment-634931406


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] apilloud opened a new pull request #11836: [BEAM-8693] Remove workaround for bigquery client

2020-05-27 Thread GitBox


apilloud opened a new pull request #11836:
URL: https://github.com/apache/beam/pull/11836


   The issue this workaround was for should have been fixed by upgrading the 
BigQuery IO. Removing to verify.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_Po

[GitHub] [beam] ibzib merged pull request #11829: [BEAM-10108] Update Flink versions in publish_docker_images.sh.

2020-05-27 Thread GitBox


ibzib merged pull request #11829:
URL: https://github.com/apache/beam/pull/11829


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on a change in pull request #11729: Add blog post announcing the Beam 2.21.0 release.

2020-05-27 Thread GitBox


ibzib commented on a change in pull request #11729:
URL: https://github.com/apache/beam/pull/11729#discussion_r431426084



##
File path: website/www/site/content/en/blog/beam-2.21.0.md
##
@@ -0,0 +1,97 @@
+---
+title:  "Apache Beam 2.21.0"
+date:   2020-05-27 00:00:01 -0800
+categories: 
+  - blog
+authors:
+  - ibzib
+---
+
+
+We are happy to present the new 2.21.0 release of Beam. This release includes 
both improvements and new functionality.
+See the [download page](/get-started/downloads/#-) for this release.
+For more information on changes in 2.21.0, check out the
+[detailed release 
notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347143).
+
+## I/Os
+* Python: Deprecated module `apache_beam.io.gcp.datastore.v1` has been removed
+as the client it uses is out of date and does not support Python 3
+([BEAM-9529](https://issues.apache.org/jira/browse/BEAM-9529)).
+Please migrate your code to use
+[apache_beam.io.gcp.datastore.**v1new**](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.datastore.v1new.datastoreio.html).
+See the updated
+[datastore_wordcount](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py)
+for example usage.
+* Python SDK: Added integration tests and updated batch write functionality 
for Google Cloud Spanner transform 
([BEAM-8949](https://issues.apache.org/jira/browse/BEAM-8949)).
+
+## New Features / Improvements
+* Python SDK will now use Python 3 type annotations as pipeline type hints.
+([#10717](https://github.com/apache/beam/pull/10717))
+
+If you suspect that this feature is causing your pipeline to fail, calling
+`apache_beam.typehints.disable_type_annotations()` before pipeline creation
+will disable is completely, and decorating specific functions (such as 
+`process()`) with `@apache_beam.typehints.no_annotations` will disable it
+for that function.
+
+More details will be in 
+[Ensuring Python Type 
Safety](https://beam.apache.org/documentation/sdks/python-type-safety/)
+and an upcoming
+[blog 
post](https://beam.apache.org/blog/python/typing/2020/03/06/python-typing.html).
+
+* Java SDK: Introducing the concept of options in Beam Schema’s. These options 
add extra 
+context to fields and schemas. This replaces the current Beam metadata that is 
present 
+in a FieldType only, options are available in fields and row schemas. Schema 
options are
+fully typed and can contain complex rows. *Remark: Schema aware is still 
experimental.*
+([BEAM-9035](https://issues.apache.org/jira/browse/BEAM-9035))
+* Java SDK: The protobuf extension is fully schema aware and also includes 
protobuf option
+conversion to beam schema options. *Remark: Schema aware is still 
experimental.*
+([BEAM-9044](https://issues.apache.org/jira/browse/BEAM-9044))
+* Added ability to write to BigQuery via Avro file loads (Python) 
([BEAM-8841](https://issues.apache.org/jira/browse/BEAM-8841))
+
+By default, file loads will be done using JSON, but it is possible to

Review comment:
   I'll leave it here since this was already in CHANGES.md.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on a change in pull request #11727: Update Beam website to release 2.21.0.

2020-05-27 Thread GitBox


ibzib commented on a change in pull request #11727:
URL: https://github.com/apache/beam/pull/11727#discussion_r431424992



##
File path: website/www/site/content/en/get-started/downloads.md
##
@@ -87,6 +87,13 @@ versions denoted `0.x.y`.
 
 ## Releases
 
+### 2.21.0 (2020-05-27)
+Official [source code 
download](http://www.apache.org/dyn/closer.cgi/beam/2.21.0/apache-beam-2.21.0-source-release.zip).
+[SHA-512](https://downloads.apache.org/beam/2.21.0/apache-beam-2.21.0-source-release.zip.sha512).
+[signature](https://downloads.apache.org/beam/2.21.0/apache-beam-2.21.0-source-release.zip.asc).
+
+[Release 
notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347143).
+
 ### 2.20.0 (2020-04-15)
 Official [source code 
download](http://www.apache.org/dyn/closer.cgi/beam/2.20.0/apache-beam-2.20.0-source-release.zip).

Review comment:
   Done. (I'll wait to merge this since files haven't been moved yet.)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)

2020-05-27 Thread GitBox


pabloem commented on pull request #11824:
URL: https://github.com/apache/beam/pull/11824#issuecomment-634923054


   I'll be happy to take a look at this



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on a change in pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform

2020-05-27 Thread GitBox


pabloem commented on a change in pull request #11582:
URL: https://github.com/apache/beam/pull/11582#discussion_r431421613



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -1641,3 +1644,316 @@ def process(self, unused_element, signal):
 *self._args,
 **self._kwargs))
 | _PassThroughThenCleanup(RemoveJsonFiles(gcs_location)))
+
+
+class _ExtractBQData(DoFn):
+  '''
+  PTransform:ReadAllFromBigQueryRequest->FileMetadata that fetches BQ data into
+  a temporary storage and returns metadata for created files.
+  '''
+  def __init__(
+  self,
+  gcs_location_pattern=None,
+  project=None,
+  coder=None,
+  schema=None,
+  kms_key=None,
+  bq_client=None):
+
+self.gcs_location_pattern = gcs_location_pattern
+self.project = project
+self.coder = coder or _JsonToDictCoder
+self.kms_key = kms_key
+self.split_result = None
+self.schema = schema
+self.target_schema = None
+self.bq_client = bq_client
+
+  def process(self, element):
+'''
+:param element(ReadAllFromBigQueryRequest):
+:return:
+'''
+element.validate()
+if element.table is not None:
+  table_reference = bigquery_tools.parse_table_reference(element.table)
+  query = None
+  use_legacy_sql = True
+else:
+  query = element.query
+  use_legacy_sql = element.use_legacy_sql
+
+flatten_results = element.flatten_results
+
+bq = bigquery_tools.BigQueryWrapper(self.bq_client)
+
+try:
+  if element.query is not None:
+self._setup_temporary_dataset(bq, query, use_legacy_sql)
+table_reference = self._execute_query(
+bq, query, use_legacy_sql, flatten_results)
+
+  gcs_location = self.gcs_location_pattern.format(uuid.uuid4().hex)
+
+  table_schema = bq.get_table(
+  table_reference.projectId,
+  table_reference.datasetId,
+  table_reference.tableId).schema
+
+  if self.target_schema is None:
+self.target_schema = bigquery_tools.parse_table_schema_from_json(
+json.dumps(self.schema))
+
+  if not self.target_schema == table_schema:
+raise ValueError((
+"Schema generated by reading from BQ doesn't match expected"
+"schema.\nExpected: {}\nActual: {}").format(
+self.target_schema, table_schema))
+
+  metadata_list = self._export_files(bq, table_reference, gcs_location)
+
+  yield pvalue.TaggedOutput('location_to_cleanup', gcs_location)
+  for metadata in metadata_list:
+yield metadata.path
+
+finally:
+  if query is not None:
+bq.clean_up_temporary_dataset(self.project)
+
+  def _setup_temporary_dataset(self, bq, query, use_legacy_sql):
+location = bq.get_query_location(self.project, query, use_legacy_sql)
+bq.create_temporary_dataset(self.project, location)
+
+  def _execute_query(self, bq, query, use_legacy_sql, flatten_results):
+job = bq._start_query_job(
+self.project,
+query,
+use_legacy_sql,
+flatten_results,
+job_id=uuid.uuid4().hex,
+kms_key=self.kms_key)
+job_ref = job.jobReference
+bq.wait_for_bq_job(job_ref)
+return bq._get_temp_table(self.project)
+
+  def _export_files(self, bq, table_reference, gcs_location):
+"""Runs a BigQuery export job.
+
+Returns:
+  a list of FileMetadata instances
+"""
+job_id = uuid.uuid4().hex
+job_ref = bq.perform_extract_job([gcs_location],
+ job_id,
+ table_reference,
+ bigquery_tools.FileFormat.JSON,
+ include_header=False)
+bq.wait_for_bq_job(job_ref)
+metadata_list = FileSystems.match([gcs_location])[0].metadata_list
+
+return metadata_list
+
+
+class _PassThroughThenCleanupWithSI(PTransform):
+  """A PTransform that invokes a DoFn after the input PCollection has been
+processed.
+
+DoFn should have arguments (element, side_input, cleanup_signal).
+
+Utilizes readiness of PCollection to trigger DoFn.
+  """
+  def __init__(self, cleanup_dofn, side_input):
+self.cleanup_dofn = cleanup_dofn
+self.side_input = side_input
+
+  def expand(self, input):
+class PassThrough(beam.DoFn):
+  def process(self, element):
+yield element
+
+main_output, cleanup_signal = input | beam.ParDo(
+  PassThrough()).with_outputs(
+  'cleanup_signal', main='main')
+
+_ = (
+input.pipeline
+| beam.Create([None])
+| beam.ParDo(
+self.cleanup_dofn,
+self.side_input,
+beam.pvalue.AsSingleton(cleanup_signal)))
+
+return main_output
+
+
+class ReadAllFromBigQueryRequest:
+  '''
+  Class that defines data to read from BQ.
+  '''
+  def __init__(
+  self,
+  query=None,
+  use_legacy_sql=False,
+  table=None,
+  fl

[GitHub] [beam] iemejia commented on pull request #11831: [BEAM-9603] Enable UsesTimerMap tests for flink portable runner

2020-05-27 Thread GitBox


iemejia commented on pull request #11831:
URL: https://github.com/apache/beam/pull/11831#issuecomment-634914484


   For Spark runner it does not work with the current code maybe some 
information is not being passed during the translation. Maybe @ibzib can give 
us some hints.
   You can reproduce this easily locally by running
   ```
   ./gradlew :runners:spark:job-server:validatesPortableRunnerBatch --tests 
"*ParDoTest\$TimerFamily*" --scan
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia commented on pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)

2020-05-27 Thread GitBox


iemejia commented on pull request #11824:
URL: https://github.com/apache/beam/pull/11824#issuecomment-634907464


   @aaltay can you please review this one or pass to someone who can. thx!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia removed a comment on pull request #11824: [BEAM-10101] Add HttpIO / HttpFileSystem (Python)

2020-05-27 Thread GitBox


iemejia removed a comment on pull request #11824:
URL: https://github.com/apache/beam/pull/11824#issuecomment-634898332


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




  1   2   3   >