[GitHub] [beam] epicfaace opened a new pull request #11802: [BEAM-9916] Update I/O documentation links and create more complete I/O matrix
epicfaace opened a new pull request #11802: URL: https://github.com/apache/beam/pull/11802 Update I/O documentation links to point to the appropriate pydoc / javadoc / godoc pages. Also, create a more complete I/O matrix that has all the existing I/O's used in all three languages, with a tab to switch between languages, too. R: @pabloem The page now looks like this: ![image](https://user-images.githubusercontent.com/1689183/82719976-298a5c80-9c7d-11ea-8614-a6f40e5e3921.png) Additionally, I've also added links to the built-in I/O connector guides on the side bar, because previously they didn't exist: ![image](https://user-images.githubusercontent.com/1689183/82719978-314a0100-9c7d-11ea-9071-c47f137c7c2b.png) Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
[GitHub] [beam] TheNeuralBit merged pull request #11778: [BEAM-8889][release-2.22.0] Upgrades gcsio to 2.1.3
TheNeuralBit merged pull request #11778: URL: https://github.com/apache/beam/pull/11778 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #11778: [BEAM-8889][release-2.22.0] Upgrades gcsio to 2.1.3
TheNeuralBit commented on pull request #11778: URL: https://github.com/apache/beam/pull/11778#issuecomment-632932510 D'oh Python PreCommit is failing just because 2.22.0 containers haven't been built yet. I'll merge now This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] stale[bot] commented on pull request #10528: [BEAM-9064] Add pytype checks to tox
stale[bot] commented on pull request #10528: URL: https://github.com/apache/beam/pull/10528#issuecomment-632834816 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] stale[bot] closed pull request #10528: [BEAM-9064] Add pytype checks to tox
stale[bot] closed pull request #10528: URL: https://github.com/apache/beam/pull/10528 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] iemejia commented on a change in pull request #11780: [BEAM-9948] Uploading mascot to the website
iemejia commented on a change in pull request #11780: URL: https://github.com/apache/beam/pull/11780#discussion_r429122732 ## File path: website/www/site/content/en/community/mascot.md ## @@ -0,0 +1,52 @@ +--- +title: "Beam Mascot" +--- + + +# Beam Mascot Design + +This page contains Apache Beam's mascot designs. + +Beam firefly is cute, friendly, agile, easy to use, and its main objective is to fetch streams and batches of data and process it. The mascot’s model sheet is useful to understand its features, capabilities, as well as its morphology. The original design of the mascot was created by [Julian G. Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam community under the [Apache license 2.0](https://www.apache.org/licenses/LICENSE-2.0). + +You can browse the original mascot and its adaptations in different sizes and image formats in [this directory](https://github.com/apache/beam/tree/mascot-upload/website/www/site/static/images/mascot). + +![Model Sheet](/images/mascot/model_sheet.png) Review comment: Can we make this image clickable since it is barely readable in its compact form. ## File path: website/www/site/content/en/community/mascot.md ## @@ -0,0 +1,52 @@ +--- +title: "Beam Mascot" +--- + + +# Beam Mascot Design + +This page contains Apache Beam's mascot designs. + +Beam firefly is cute, friendly, agile, easy to use, and its main objective is to fetch streams and batches of data and process it. The mascot’s model sheet is useful to understand its features, capabilities, as well as its morphology. The original design of the mascot was created by [Julian G. Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam community under the [Apache license 2.0](https://www.apache.org/licenses/LICENSE-2.0). Review comment: (nit) upper case Apache **L**icense ## File path: website/www/site/content/en/community/mascot.md ## @@ -0,0 +1,52 @@ +--- +title: "Beam Mascot" +--- + + +# Beam Mascot Design + +This page contains Apache Beam's mascot designs. + +Beam firefly is cute, friendly, agile, easy to use, and its main objective is to fetch streams and batches of data and process it. The mascot’s model sheet is useful to understand its features, capabilities, as well as its morphology. The original design of the mascot was created by [Julian G. Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam community under the [Apache license 2.0](https://www.apache.org/licenses/LICENSE-2.0). + +You can browse the original mascot and its adaptations in different sizes and image formats in [this directory](https://github.com/apache/beam/tree/mascot-upload/website/www/site/static/images/mascot). Review comment: Also I cannot see the svg images in the web browser it seems there is an issue with the Apache License headers, e.g. https://raw.githubusercontent.com/aijamalnk/beam/mascot-upload/website/www/site/static/images/mascot/beam_mascot.svg ## File path: website/www/site/content/en/community/mascot.md ## @@ -0,0 +1,52 @@ +--- +title: "Beam Mascot" +--- + + +# Beam Mascot Design + +This page contains Apache Beam's mascot designs. + +Beam firefly is cute, friendly, agile, easy to use, and its main objective is to fetch streams and batches of data and process it. The mascot’s model sheet is useful to understand its features, capabilities, as well as its morphology. The original design of the mascot was created by [Julian G. Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam community under the [Apache license 2.0](https://www.apache.org/licenses/LICENSE-2.0). + +You can browse the original mascot and its adaptations in different sizes and image formats in [this directory](https://github.com/apache/beam/tree/mascot-upload/website/www/site/static/images/mascot). Review comment: Link does not work and can we make it relative too e.g. `(/images/mascot/)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #11778: [BEAM-8889][release-2.22.0] Upgrades gcsio to 2.1.3
TheNeuralBit commented on pull request #11778: URL: https://github.com/apache/beam/pull/11778#issuecomment-632831985 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] epicfaace commented on a change in pull request #11780: [BEAM-9948] Uploading mascot to the website
epicfaace commented on a change in pull request #11780: URL: https://github.com/apache/beam/pull/11780#discussion_r429376885 ## File path: website/www/site/content/en/community/mascot.md ## @@ -0,0 +1,52 @@ +--- +title: "Beam Mascot" +--- + + +# Beam Mascot Design + +This page contains Apache Beam's mascot designs. + +Beam firefly is cute, friendly, agile, easy to use, and its main objective is to fetch streams and batches of data and process it. The mascot’s model sheet is useful to understand its features, capabilities, as well as its morphology. The original design of the mascot was created by [Julian G. Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam community under the [Apache license 2.0](https://www.apache.org/licenses/LICENSE-2.0). Review comment: ```suggestion Beam Firefly is cute, friendly, agile, easy to use, and its main objective is to fetch streams and batches of data and process it. The mascot’s model sheet is useful to understand its features, capabilities, as well as its morphology. The original design of the mascot was created by [Julian G. Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam community under the [Apache license 2.0](https://www.apache.org/licenses/LICENSE-2.0). ``` ## File path: website/www/site/content/en/community/mascot.md ## @@ -0,0 +1,52 @@ +--- +title: "Beam Mascot" +--- + + +# Beam Mascot Design + +This page contains Apache Beam's mascot designs. + +Beam firefly is cute, friendly, agile, easy to use, and its main objective is to fetch streams and batches of data and process it. The mascot’s model sheet is useful to understand its features, capabilities, as well as its morphology. The original design of the mascot was created by [Julian G. Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam community under the [Apache license 2.0](https://www.apache.org/licenses/LICENSE-2.0). + +You can browse the original mascot and its adaptations in different sizes and image formats in [this directory](https://github.com/apache/beam/tree/mascot-upload/website/www/site/static/images/mascot). + +![Model Sheet](/images/mascot/model_sheet.png) + +## Beam mascot adaptations + +The original Beam mascot - simple and eye-catching, was developed with its body shaped like a “B” from Beam’s logo. The ideal choice for printing in any size or using for swags. The lit-up tail is another element to play with if you want to animate the firefly. + + + +The adaptation below represents independent learning. The bubble can be replaced by any other description. An ideal choice for promoting Beam talks, workshops, and webinars where the Beam community learns new things. + + + +The second adaptation below represents data processing. The data goes through the firefly’s tail, filling the entire body with streams of information (for example, codes) and when it is done processing the mascot’s entire body outline lits up in yellow color to tell that it is loaded with data and is ready to process it, while it becomes stronger and faster. Mascot returns to its original status after finishing this process. + + + + + + + +## Colors +For Beam mascot we used Apache Beam project’s predefined colors and fonts. [This document](/downloads/palette.pdf) has more information. Color Palette (RGB), Orange : #FF570B, Yellow : #FFF200, White: #FF, Black: #1D1D1B Review comment: ```suggestion For the Beam mascot, we used Apache Beam project’s predefined colors and fonts. [This document](/downloads/palette.pdf) has more information. Color Palette (RGB), Orange : #FF570B, Yellow : #FFF200, White: #FF, Black: #1D1D1B ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on a change in pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.
lostluck commented on a change in pull request #11791: URL: https://github.com/apache/beam/pull/11791#discussion_r429361145 ## File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go ## @@ -266,33 +267,85 @@ func (n *DataSource) Progress() ProgressReportSnapshot { return ProgressReportSnapshot{PID: n.outputPID, ID: n.SID.PtransformID, Name: n.Name, Count: c} } -// Split takes a sorted set of potential split indices, selects and actuates -// split on an appropriate split index, and returns the selected split index -// if successful. Returns an error when unable to split. +// Split takes a sorted set of potential split indices and a fraction of the +// remainder to split at, selects and actuates a split on an appropriate split +// index, and returns the selected split index if successful. Returns an error +// when unable to split. func (n *DataSource) Split(splits []int64, frac float64) (int64, error) { - if splits == nil { - return 0, fmt.Errorf("failed to split: requested splits were empty") - } if n == nil { return 0, fmt.Errorf("failed to split at requested splits: {%v}, DataSource not initialized", splits) } + if frac > 1.0 { + frac = 1.0 + } else if frac < 0.0 { + frac = 0.0 + } + n.mu.Lock() - c := n.index - // Find the smallest split index that we haven't yet processed, and set - // the promised split index to this value. - for _, s := range splits { - // // Never split on the first element, or the current element. - if s > 0 && s > c && s <= n.splitIdx { - n.splitIdx = s - fs := n.splitIdx - n.mu.Unlock() - return fs, nil - } + s, err := splitHelper(n.index, n.splitIdx, splits, frac) + if err != nil { + n.mu.Unlock() + return 0, err } + n.splitIdx = s + fs := n.splitIdx n.mu.Unlock() - // If we can't find a suitable split index from the requested choices, - // return an error. - return 0, fmt.Errorf("failed to split at requested splits: {%v}, DataSource at index: %v", splits, c) + return fs, nil +} + +// splitHelper is a helper function that finds a split point in a range. +// currIdx and splitIdx should match the DataSource's index and splitIdx fields, +// and represent the start and end of the splittable range respectively. splits +// is an optional slice of valid split indices, and if nil then all indices are +// considered valid split points. frac must be between [0, 1], and represents +// a fraction of the remaining work that the split point aims to be as close +// as possible to. +func splitHelper(currIdx, splitIdx int64, splits []int64, frac float64) (int64, error) { + // Get split index from fraction. Find the closest index to the fraction of + // the remainder. + var start int64 = 0 + if currIdx > start { + start = currIdx + } + // This is the first valid split index, since we should never split at 0 or + // at the current element. + safeStart := start + 1 + // The remainder starts at our actual progress (i.e. start), but our final + // split index has to be >= our safeStart. + fracIdx := start + int64(math.Round(frac*float64(splitIdx-start))) + if fracIdx < safeStart { + fracIdx = safeStart + } + if splits == nil { + // All split points are valid so just split at fraction. + return fracIdx, nil + } else { + // Find the closest unprocessed split point to our fraction. + sort.Slice(splits, func(i, j int) bool { return splits[i] < splits[j] }) Review comment: Consider https://golang.org/pkg/sort/#Ints ## File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go ## @@ -266,33 +267,85 @@ func (n *DataSource) Progress() ProgressReportSnapshot { return ProgressReportSnapshot{PID: n.outputPID, ID: n.SID.PtransformID, Name: n.Name, Count: c} } -// Split takes a sorted set of potential split indices, selects and actuates -// split on an appropriate split index, and returns the selected split index -// if successful. Returns an error when unable to split. +// Split takes a sorted set of potential split indices and a fraction of the +// remainder to split at, selects and actuates a split on an appropriate split +// index, and returns the selected split index if successful. Returns an error +// when unable to split. func (n *DataSource) Split(splits []int64, frac float64) (int64, error) { - if splits == nil { - return 0, fmt.Errorf("failed to split: requested splits were empty") - } if n == nil { return 0, fmt.Errorf("failed to split at
[GitHub] [beam] chadrik commented on a change in pull request #11632: [BEAM-7746] Fix type errors and enable checks for apache_beam.dataframe.*
chadrik commented on a change in pull request #11632: URL: https://github.com/apache/beam/pull/11632#discussion_r429361766 ## File path: sdks/python/apache_beam/dataframe/convert.py ## @@ -16,13 +16,23 @@ from __future__ import absolute_import +import typing + import inspect from apache_beam import pvalue from apache_beam.dataframe import expressions from apache_beam.dataframe import frame_base from apache_beam.dataframe import transforms +if typing.TYPE_CHECKING: + # pylint: disable=ungrouped-imports + from typing import Any + from typing import Dict + from typing import Tuple + from typing import Union Review comment: This note hasn't been addressed: > I had a look at the lint errors, and they are legitimate, but scoping the imports is not the right solution. see above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on pull request #11782: [BEAM-10056] Fix validation for struct CoGBKs
lostluck commented on pull request #11782: URL: https://github.com/apache/beam/pull/11782#issuecomment-632807038 I like that the tests are very nicely separated for the different cases eg. Unknown being more permissive than any of the stricter checks. Well done! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck merged pull request #11782: [BEAM-10056] Fix validation for struct CoGBKs
lostluck merged pull request #11782: URL: https://github.com/apache/beam/pull/11782 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck merged pull request #11768: [BEAM-10051] Move closed reader check after sentinel.
lostluck merged pull request #11768: URL: https://github.com/apache/beam/pull/11768 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] epicfaace opened a new pull request #11801: [BEAM-10067] Minify website assets with --minify flag
epicfaace opened a new pull request #11801: URL: https://github.com/apache/beam/pull/11801 Use Hugo's `--minify` flag to minify website assets, such as HTML files. Size of the `dist` folder has changed from 41M -> 39M. R: @aaltay Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build
[GitHub] [beam] epicfaace commented on pull request #11759: [BEAM-9926] Docs - show placeholder code snippets if code snippets are unavailable
epicfaace commented on pull request #11759: URL: https://github.com/apache/beam/pull/11759#issuecomment-632737396 I don't like this approach. I think it's best to manually add code snippets as in #11790. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] santhh commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms
santhh commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r429297864 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java ## @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.auto.value.AutoValue; +import com.google.cloud.dlp.v2.DlpServiceClient; +import com.google.privacy.dlp.v2.ContentItem; +import com.google.privacy.dlp.v2.DeidentifyConfig; +import com.google.privacy.dlp.v2.DeidentifyContentRequest; +import com.google.privacy.dlp.v2.DeidentifyContentResponse; +import com.google.privacy.dlp.v2.FieldId; +import com.google.privacy.dlp.v2.InspectConfig; +import com.google.privacy.dlp.v2.ProjectName; +import com.google.privacy.dlp.v2.Table; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionView; + +/** + * A {@link PTransform} connecting to Cloud DLP and deidentifying text according to provided + * settings. The transform supports both CSV formatted input data and unstructured input. + * + * If the csvHeader property is set, csvDelimiter also should be, else the results will be + * incorrect. If csvHeader is not set, input is assumed to be unstructured. + * + * Either inspectTemplateName (String) or inspectConfig {@link InspectConfig} need to be set. The + * situation is the same with deidentifyTemplateName and deidentifyConfig ({@link DeidentifyConfig}. + * + * Batch size defines how big are batches sent to DLP at once in bytes. + * + * The transform outputs {@link KV} of {@link String} (eg. filename) and {@link + * DeidentifyContentResponse}, which will contain {@link Table} of results for the user to consume. + */ +@Experimental +@AutoValue +public abstract class DLPDeidentifyText +extends PTransform< +PCollection>, PCollection>> { + + @Nullable + public abstract String inspectTemplateName(); + + @Nullable + public abstract String deidentifyTemplateName(); + + @Nullable + public abstract InspectConfig inspectConfig(); + + @Nullable + public abstract DeidentifyConfig deidentifyConfig(); + + @Nullable + public abstract PCollectionView> csvHeader(); + + @Nullable + public abstract String csvDelimiter(); + + public abstract Integer batchSize(); + + public abstract String projectId(); + + @AutoValue.Builder + public abstract static class Builder { +public abstract Builder setInspectTemplateName(String inspectTemplateName); + +public abstract Builder setCsvHeader(PCollectionView> csvHeader); + +public abstract Builder setCsvDelimiter(String delimiter); + +public abstract Builder setBatchSize(Integer batchSize); + +public abstract Builder setProjectId(String projectId); + +public abstract Builder setDeidentifyTemplateName(String deidentifyTemplateName); + +public abstract Builder setInspectConfig(InspectConfig inspectConfig); + +public abstract Builder setDeidentifyConfig(DeidentifyConfig deidentifyConfig); + +public abstract DLPDeidentifyText build(); + } + + public static DLPDeidentifyText.Builder newBuilder() { +return new AutoValue_DLPDeidentifyText.Builder(); + } + + /** + * The transform batches the contents of input PCollection and then calls Cloud DLP service to + * perform the deidentification. + * + * @param input input PCollection + * @return PCollection after transformations + */ + @Override + public PCollection> expand( + PCollection> input) { +return input +.apply(ParDo.of(new MapStringToDlpRow(csvDelimiter( +.apply("Batch Contents", ParDo.of(new BatchRequestForDLP(batchSize( +.apply( +"DLPDeidentify", +ParDo.of( +new
[GitHub] [beam] epicfaace closed pull request #11759: [BEAM-9926] Docs - show placeholder code snippets if code snippets are unavailable
epicfaace closed pull request #11759: URL: https://github.com/apache/beam/pull/11759 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] santhh commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms
santhh commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r429297298 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java ## @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.auto.value.AutoValue; +import com.google.cloud.dlp.v2.DlpServiceClient; +import com.google.privacy.dlp.v2.ContentItem; +import com.google.privacy.dlp.v2.DeidentifyConfig; +import com.google.privacy.dlp.v2.DeidentifyContentRequest; +import com.google.privacy.dlp.v2.DeidentifyContentResponse; +import com.google.privacy.dlp.v2.FieldId; +import com.google.privacy.dlp.v2.InspectConfig; +import com.google.privacy.dlp.v2.ProjectName; +import com.google.privacy.dlp.v2.Table; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionView; + +/** + * A {@link PTransform} connecting to Cloud DLP and deidentifying text according to provided + * settings. The transform supports both CSV formatted input data and unstructured input. + * + * If the csvHeader property is set, csvDelimiter also should be, else the results will be + * incorrect. If csvHeader is not set, input is assumed to be unstructured. + * + * Either inspectTemplateName (String) or inspectConfig {@link InspectConfig} need to be set. The + * situation is the same with deidentifyTemplateName and deidentifyConfig ({@link DeidentifyConfig}. + * + * Batch size defines how big are batches sent to DLP at once in bytes. + * + * The transform outputs {@link KV} of {@link String} (eg. filename) and {@link + * DeidentifyContentResponse}, which will contain {@link Table} of results for the user to consume. + */ +@Experimental +@AutoValue +public abstract class DLPDeidentifyText +extends PTransform< +PCollection>, PCollection>> { + + @Nullable + public abstract String inspectTemplateName(); + + @Nullable + public abstract String deidentifyTemplateName(); + + @Nullable + public abstract InspectConfig inspectConfig(); + + @Nullable + public abstract DeidentifyConfig deidentifyConfig(); + + @Nullable + public abstract PCollectionView> csvHeader(); + + @Nullable + public abstract String csvDelimiter(); + + public abstract Integer batchSize(); + + public abstract String projectId(); + + @AutoValue.Builder + public abstract static class Builder { +public abstract Builder setInspectTemplateName(String inspectTemplateName); + +public abstract Builder setCsvHeader(PCollectionView> csvHeader); + +public abstract Builder setCsvDelimiter(String delimiter); + +public abstract Builder setBatchSize(Integer batchSize); + +public abstract Builder setProjectId(String projectId); + +public abstract Builder setDeidentifyTemplateName(String deidentifyTemplateName); + +public abstract Builder setInspectConfig(InspectConfig inspectConfig); + +public abstract Builder setDeidentifyConfig(DeidentifyConfig deidentifyConfig); + +public abstract DLPDeidentifyText build(); + } + + public static DLPDeidentifyText.Builder newBuilder() { +return new AutoValue_DLPDeidentifyText.Builder(); + } + + /** + * The transform batches the contents of input PCollection and then calls Cloud DLP service to + * perform the deidentification. + * + * @param input input PCollection + * @return PCollection after transformations + */ + @Override + public PCollection> expand( + PCollection> input) { +return input +.apply(ParDo.of(new MapStringToDlpRow(csvDelimiter( +.apply("Batch Contents", ParDo.of(new BatchRequestForDLP(batchSize( +.apply( +"DLPDeidentify", +ParDo.of( +new
[GitHub] [beam] santhh commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms
santhh commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r429296219 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java ## @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.auto.value.AutoValue; +import com.google.cloud.dlp.v2.DlpServiceClient; +import com.google.privacy.dlp.v2.ContentItem; +import com.google.privacy.dlp.v2.DeidentifyConfig; +import com.google.privacy.dlp.v2.DeidentifyContentRequest; +import com.google.privacy.dlp.v2.DeidentifyContentResponse; +import com.google.privacy.dlp.v2.FieldId; +import com.google.privacy.dlp.v2.InspectConfig; +import com.google.privacy.dlp.v2.ProjectName; +import com.google.privacy.dlp.v2.Table; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionView; + +/** + * A {@link PTransform} connecting to Cloud DLP and deidentifying text according to provided + * settings. The transform supports both CSV formatted input data and unstructured input. + * + * If the csvHeader property is set, csvDelimiter also should be, else the results will be + * incorrect. If csvHeader is not set, input is assumed to be unstructured. + * + * Either inspectTemplateName (String) or inspectConfig {@link InspectConfig} need to be set. The + * situation is the same with deidentifyTemplateName and deidentifyConfig ({@link DeidentifyConfig}. + * + * Batch size defines how big are batches sent to DLP at once in bytes. Review comment: I think moving to the method definition will be more helpful. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pulasthi commented on pull request #10888: [BEAM-7304] Twister2 Beam runner
pulasthi commented on pull request #10888: URL: https://github.com/apache/beam/pull/10888#issuecomment-632725227 @iemejia Would we be able to merge this into the master so it would be included in the 2.23.0 release? let me know what else needs to happen on my end to proceed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] epicfaace opened a new pull request #11800: Fix typo copyLicenseScrips -> copyLicenseScripts
epicfaace opened a new pull request #11800: URL: https://github.com/apache/beam/pull/11800 R: @youngoli Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
[GitHub] [beam] kamilwu commented on pull request #11776: [BEAM-9421] Better documentation of output results from AnnotateText transform
kamilwu commented on pull request #11776: URL: https://github.com/apache/beam/pull/11776#issuecomment-632724995 R: @aaltay Could you take a look? Website staging is not working at the moment, fix is in progress: https://github.com/apache/beam/pull/11796 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pulasthi commented on pull request #10888: [BEAM-7304] Twister2 Beam runner
pulasthi commented on pull request #10888: URL: https://github.com/apache/beam/pull/10888#issuecomment-632724710 @RyanSkraba Thank you so much for your review and feedback This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] naipath commented on pull request #11799: [BEAM-10066] Added support for ValueProvider in RedisConnectionConfiguration
naipath commented on pull request #11799: URL: https://github.com/apache/beam/pull/11799#issuecomment-632724291 R: @jbonofre This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] epicfaace commented on pull request #11788: [BEAM-9785] Add Python 3.8 postcommit tests
epicfaace commented on pull request #11788: URL: https://github.com/apache/beam/pull/11788#issuecomment-632724234 Run Python PreCommit Run Python 3.7 PostCommit Run Python 3.8 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] epicfaace commented on a change in pull request #11788: [BEAM-9785] Add Python 3.8 postcommit tests
epicfaace commented on a change in pull request #11788: URL: https://github.com/apache/beam/pull/11788#discussion_r429283691 ## File path: .test-infra/jenkins/README.md ## @@ -84,6 +84,7 @@ Beam Jenkins overview page: [link](https://builds.apache.org/view/A-D/view/Beam/ | beam_PostCommit_Python35 | [cron](https://builds.apache.org/job/beam_PostCommit_Python35), [phrase](https://builds.apache.org/job/beam_PostCommit_Python35_PR/) | `Run Python 3.5 PostCommit` | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35) | | beam_PostCommit_Python36 | [cron](https://builds.apache.org/job/beam_PostCommit_Python36), [phrase](https://builds.apache.org/job/beam_PostCommit_Python36_PR/) | `Run Python 3.6 PostCommit` | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36) | | beam_PostCommit_Python37 | [cron](https://builds.apache.org/job/beam_PostCommit_Python37), [phrase](https://builds.apache.org/job/beam_PostCommit_Python37_PR/) | `Run Python 3.7 PostCommit` | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37) | +| beam_PostCommit_Python38 | [cron](https://builds.apache.org/job/beam_PostCommit_Python38), [phrase](https://builds.apache.org/job/beam_PostCommit_Python38_PR/) | `Run Python 3.7 PostCommit` | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python38/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python38) | Review comment: ```suggestion | beam_PostCommit_Python38 | [cron](https://builds.apache.org/job/beam_PostCommit_Python38), [phrase](https://builds.apache.org/job/beam_PostCommit_Python38_PR/) | `Run Python 3.8 PostCommit` | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python38/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python38) | ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pulasthi commented on a change in pull request #10888: [BEAM-7304] Twister2 Beam runner
pulasthi commented on a change in pull request #10888: URL: https://github.com/apache/beam/pull/10888#discussion_r429283778 ## File path: runners/twister2/src/main/java/org/apache/beam/runners/twister2/package-info.java ## @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Internal implementation of the Beam runner for Apache Twister2. */ Review comment: Fixed it, it is good that you caught that, that would go unfixed for a long time otherwise This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] epicfaace commented on pull request #11788: [BEAM-9785] Add Python 3.8 postcommit tests
epicfaace commented on pull request #11788: URL: https://github.com/apache/beam/pull/11788#issuecomment-632723409 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] naipath opened a new pull request #11799: Added support for the ValueProvider for RedisConnectionConfiguration
naipath opened a new pull request #11799: URL: https://github.com/apache/beam/pull/11799 This pull request adds support for `ValueProvider` for RedisConnectionConfiguration. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build
[GitHub] [beam] epicfaace opened a new pull request #11798: [BEAM-7370] Upgrade sphinx to 3.0.3
epicfaace opened a new pull request #11798: URL: https://github.com/apache/beam/pull/11798 R: @udim. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
[GitHub] [beam] epicfaace opened a new pull request #11797: [BEAM-10065] Fix beam release guide template
epicfaace opened a new pull request #11797: URL: https://github.com/apache/beam/pull/11797 Fix beam release guide template so that it shows up, and so that it also highlights syntax in markdown. It turns out, the `` part of the template was what was causing the display issues. R: @ibzib ![image](https://user-images.githubusercontent.com/1689183/82671081-3f136e00-9c0c-11ea-93eb-732517491a17.png) Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[GitHub] [beam] stale[bot] commented on pull request #11186: [BEAM-9564] Remove insecure ssl options from MongoDBIO
stale[bot] commented on pull request #11186: URL: https://github.com/apache/beam/pull/11186#issuecomment-632681994 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mwalenia commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms
mwalenia commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r429213975 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java ## @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.auto.value.AutoValue; +import com.google.cloud.dlp.v2.DlpServiceClient; +import com.google.privacy.dlp.v2.ContentItem; +import com.google.privacy.dlp.v2.DeidentifyConfig; +import com.google.privacy.dlp.v2.DeidentifyContentRequest; +import com.google.privacy.dlp.v2.DeidentifyContentResponse; +import com.google.privacy.dlp.v2.FieldId; +import com.google.privacy.dlp.v2.InspectConfig; +import com.google.privacy.dlp.v2.ProjectName; +import com.google.privacy.dlp.v2.Table; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionView; + +/** + * A {@link PTransform} connecting to Cloud DLP and deidentifying text according to provided + * settings. The transform supports both CSV formatted input data and unstructured input. + * + * If the csvHeader property is set, csvDelimiter also should be, else the results will be + * incorrect. If csvHeader is not set, input is assumed to be unstructured. Review comment: Thanks for this remark, I added early validation to the code. ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java ## @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.auto.value.AutoValue; +import com.google.cloud.dlp.v2.DlpServiceClient; +import com.google.privacy.dlp.v2.ContentItem; +import com.google.privacy.dlp.v2.DeidentifyConfig; +import com.google.privacy.dlp.v2.DeidentifyContentRequest; +import com.google.privacy.dlp.v2.DeidentifyContentResponse; +import com.google.privacy.dlp.v2.FieldId; +import com.google.privacy.dlp.v2.InspectConfig; +import com.google.privacy.dlp.v2.ProjectName; +import com.google.privacy.dlp.v2.Table; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionView; + +/** + * A {@link PTransform} connecting to Cloud DLP and deidentifying text according to provided + * settings. The transform supports both CSV formatted input data and unstructured input. + * + * If the csvHeader property is set, csvDelimiter also should be, else the results will be + * incorrect. If csvHeader is not set, input is assumed to be unstructured. + * + * Either inspectTemplateName (String) or
[GitHub] [beam] mwalenia commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms
mwalenia commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r429213709 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/BatchRequestForDLP.java ## @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.privacy.dlp.v2.Table; +import java.util.ArrayList; +import java.util.List; +import java.util.concurrent.atomic.AtomicInteger; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.state.BagState; +import org.apache.beam.sdk.state.StateSpec; +import org.apache.beam.sdk.state.StateSpecs; +import org.apache.beam.sdk.state.TimeDomain; +import org.apache.beam.sdk.state.Timer; +import org.apache.beam.sdk.state.TimerSpec; +import org.apache.beam.sdk.state.TimerSpecs; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.values.KV; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * DoFn batching the input PCollection into bigger requests in order to better utilize the Cloud DLP + * service. + */ +@Experimental +class BatchRequestForDLP extends DoFn, KV>> { + public static final Logger LOG = LoggerFactory.getLogger(BatchRequestForDLP.class); + + private final Counter numberOfRowsBagged = + Metrics.counter(BatchRequestForDLP.class, "numberOfRowsBagged"); + private final Integer batchSize; + + @StateId("elementsBag") + private final StateSpec>> elementsBag = StateSpecs.bag(); + + @TimerId("eventTimer") + private final TimerSpec eventTimer = TimerSpecs.timer(TimeDomain.EVENT_TIME); + + public BatchRequestForDLP(Integer batchSize) { +this.batchSize = batchSize; + } + + @ProcessElement + public void process( + @Element KV element, + @StateId("elementsBag") BagState> elementsBag, + @TimerId("eventTimer") Timer eventTimer, + BoundedWindow w) { +elementsBag.add(element); +eventTimer.set(w.maxTimestamp()); + } + + @OnTimer("eventTimer") + public void onTimer( + @StateId("elementsBag") BagState> elementsBag, + OutputReceiver>> output) { +String key = elementsBag.read().iterator().next().getKey(); +AtomicInteger bufferSize = new AtomicInteger(); +List rows = new ArrayList<>(); +elementsBag +.read() +.forEach( +element -> { + int elementSize = element.getValue().getSerializedSize(); + boolean clearBuffer = bufferSize.intValue() + elementSize > batchSize; + if (clearBuffer) { +numberOfRowsBagged.inc(rows.size()); +LOG.debug("Clear Buffer {} , Key {}", bufferSize.intValue(), element.getKey()); +output.output(KV.of(element.getKey(), rows)); +rows.clear(); +bufferSize.set(0); + } + rows.add(element.getValue()); + bufferSize.getAndAdd(element.getValue().getSerializedSize()); +}); +if (!rows.isEmpty()) { + LOG.debug("Remaining rows {}", rows.size()); Review comment: Done, thanks ## File path: sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/DLPTextOperationsIT.java ## @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing
[GitHub] [beam] mwalenia commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms
mwalenia commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r429213646 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/BatchRequestForDLP.java ## @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.privacy.dlp.v2.Table; +import java.util.ArrayList; +import java.util.List; +import java.util.concurrent.atomic.AtomicInteger; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.state.BagState; +import org.apache.beam.sdk.state.StateSpec; +import org.apache.beam.sdk.state.StateSpecs; +import org.apache.beam.sdk.state.TimeDomain; +import org.apache.beam.sdk.state.Timer; +import org.apache.beam.sdk.state.TimerSpec; +import org.apache.beam.sdk.state.TimerSpecs; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.values.KV; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * DoFn batching the input PCollection into bigger requests in order to better utilize the Cloud DLP + * service. + */ +@Experimental +class BatchRequestForDLP extends DoFn, KV>> { + public static final Logger LOG = LoggerFactory.getLogger(BatchRequestForDLP.class); + + private final Counter numberOfRowsBagged = + Metrics.counter(BatchRequestForDLP.class, "numberOfRowsBagged"); + private final Integer batchSize; + + @StateId("elementsBag") + private final StateSpec>> elementsBag = StateSpecs.bag(); + + @TimerId("eventTimer") + private final TimerSpec eventTimer = TimerSpecs.timer(TimeDomain.EVENT_TIME); + + public BatchRequestForDLP(Integer batchSize) { +this.batchSize = batchSize; + } + + @ProcessElement + public void process( + @Element KV element, + @StateId("elementsBag") BagState> elementsBag, + @TimerId("eventTimer") Timer eventTimer, + BoundedWindow w) { +elementsBag.add(element); +eventTimer.set(w.maxTimestamp()); + } + + @OnTimer("eventTimer") + public void onTimer( + @StateId("elementsBag") BagState> elementsBag, + OutputReceiver>> output) { +String key = elementsBag.read().iterator().next().getKey(); Review comment: Ok, done! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mwalenia commented on pull request #11566: [BEAM-9723] Add DLP integration transforms
mwalenia commented on pull request #11566: URL: https://github.com/apache/beam/pull/11566#issuecomment-632663593 @tysonjh What do you think about it now? I added Javadocs and some tests - do you feel some tests are still missing? Let me know! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mwalenia commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms
mwalenia commented on a change in pull request #11566: URL: https://github.com/apache/beam/pull/11566#discussion_r429202254 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java ## @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.ml; + +import com.google.auto.value.AutoValue; +import com.google.cloud.dlp.v2.DlpServiceClient; +import com.google.privacy.dlp.v2.ContentItem; +import com.google.privacy.dlp.v2.DeidentifyConfig; +import com.google.privacy.dlp.v2.DeidentifyContentRequest; +import com.google.privacy.dlp.v2.DeidentifyContentResponse; +import com.google.privacy.dlp.v2.FieldId; +import com.google.privacy.dlp.v2.InspectConfig; +import com.google.privacy.dlp.v2.ProjectName; +import com.google.privacy.dlp.v2.Table; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionView; + +/** + * A {@link PTransform} connecting to Cloud DLP and deidentifying text according to provided + * settings. The transform supports both CSV formatted input data and unstructured input. + * + * If the csvHeader property is set, csvDelimiter also should be, else the results will be + * incorrect. If csvHeader is not set, input is assumed to be unstructured. + * + * Either inspectTemplateName (String) or inspectConfig {@link InspectConfig} need to be set. The + * situation is the same with deidentifyTemplateName and deidentifyConfig ({@link DeidentifyConfig}. + * + * Batch size defines how big are batches sent to DLP at once in bytes. + * + * The transform outputs {@link KV} of {@link String} (eg. filename) and {@link + * DeidentifyContentResponse}, which will contain {@link Table} of results for the user to consume. + */ +@Experimental +@AutoValue +public abstract class DLPDeidentifyText +extends PTransform< +PCollection>, PCollection>> { + + @Nullable + public abstract String inspectTemplateName(); Review comment: Sure, I'll add comments This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kgabryje commented on pull request #11796: [BEAM-10003] Use local code for building code samples on website
kgabryje commented on pull request #11796: URL: https://github.com/apache/beam/pull/11796#issuecomment-632648552 Hi @rezarokni , this PR solves the issue submitted by you. Can you take a look? Also @pabloem , I'd appreciate if you took a look as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kgabryje opened a new pull request #11796: [BEAM-10003] Use local code for building code samples on website
kgabryje opened a new pull request #11796: URL: https://github.com/apache/beam/pull/11796 Currently code samples are fetched from Beam Github's master branch. So in order to make change in code and then write a document piece with code sample, it was necessary create 2 PRs - one with change in code and another with change in documentation. This PR solves this issue by using local code to build code samples for documentation instead of fetching it from Github. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[GitHub] [beam] kamilwu commented on pull request #11795: [BEAM-10058] Provide less strict assertion to make the test more resistant against future changes in a model
kamilwu commented on pull request #11795: URL: https://github.com/apache/beam/pull/11795#issuecomment-632647909 R: @TheNeuralBit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kamilwu commented on pull request #11795: [BEAM-10058] Provide less strict assertion to make the test more resistant against future changes in a model
kamilwu commented on pull request #11795: URL: https://github.com/apache/beam/pull/11795#issuecomment-632631059 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] purbanow commented on pull request #11794: [BEAM-9894] Add batch SnowflakeIO.Write to Java SDK
purbanow commented on pull request #11794: URL: https://github.com/apache/beam/pull/11794#issuecomment-632624409 Hi @RyanSkraba will you find a moment to make a CR for this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] RyanSkraba commented on pull request #10888: [BEAM-7304] Twister2 Beam runner
RyanSkraba commented on pull request #10888: URL: https://github.com/apache/beam/pull/10888#issuecomment-632621478 I ran the `validatesRunner` tests again and the partition /tmp directory wasn't filled with zip files! Thanks for your speedy attention, LGTM! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kamilwu commented on pull request #11795: [BEAM-10058] Provide less strict assertion to make the test more resistant against future changes in a model
kamilwu commented on pull request #11795: URL: https://github.com/apache/beam/pull/11795#issuecomment-632612022 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kamilwu opened a new pull request #11795: [BEAM-10058] Provide less strict assertion to make the test more resistant against future changes in a model
kamilwu opened a new pull request #11795: URL: https://github.com/apache/beam/pull/11795 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
[GitHub] [beam] RyanSkraba commented on a change in pull request #10888: [BEAM-7304] Twister2 Beam runner
RyanSkraba commented on a change in pull request #10888: URL: https://github.com/apache/beam/pull/10888#discussion_r429155712 ## File path: runners/twister2/src/main/java/org/apache/beam/runners/twister2/package-info.java ## @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Internal implementation of the Beam runner for Apache Twister2. */ Review comment: I am SO sorry, such a minor nitpick but probably the right time to fix it! You probably want to remove or move Apache here. ```suggestion /** Internal implementation of the Beam runner for Twister2. */ ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] RyanSkraba commented on a change in pull request #10888: [BEAM-7304] Twister2 Beam runner
RyanSkraba commented on a change in pull request #10888: URL: https://github.com/apache/beam/pull/10888#discussion_r429149206 ## File path: runners/twister2/src/main/java/org/apache/beam/runners/twister2/BeamBatchWorker.java ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.twister2; + +import edu.iu.dsc.tws.api.config.Config; +import edu.iu.dsc.tws.api.tset.TBase; +import edu.iu.dsc.tws.api.tset.sets.TSet; +import edu.iu.dsc.tws.api.tset.sets.batch.BatchTSet; +import edu.iu.dsc.tws.tset.TBaseGraph; +import edu.iu.dsc.tws.tset.env.BatchTSetEnvironment; +import edu.iu.dsc.tws.tset.links.BaseTLink; +import edu.iu.dsc.tws.tset.sets.BaseTSet; +import edu.iu.dsc.tws.tset.sets.BuildableTSet; +import edu.iu.dsc.tws.tset.sets.batch.CachedTSet; +import edu.iu.dsc.tws.tset.sets.batch.ComputeTSet; +import edu.iu.dsc.tws.tset.sets.batch.SinkTSet; +import edu.iu.dsc.tws.tset.worker.BatchTSetIWorker; +import java.io.Serializable; +import java.util.ArrayDeque; +import java.util.Deque; +import java.util.HashMap; +import java.util.HashSet; +import java.util.LinkedHashMap; +import java.util.Map; +import java.util.Set; +import org.apache.beam.runners.twister2.translators.functions.DoFnFunction; +import org.apache.beam.runners.twister2.translators.functions.Twister2SinkFunction; + +/** + * The Twister2 worker that will execute the job logic once the job is submitted from the run + * method. + */ +public class BeamBatchWorker implements Serializable, BatchTSetIWorker { + + private static final String SIDEINPUTS = "sideInputs"; + private static final String LEAVES = "leaves"; + private static final String GRAPH = "graph"; + private HashMap> sideInputDataSets; + private Set leaves; Review comment: Of course! Objectively, this is **_only_** a preference, I'm OK for leaving it for a subsequent improvement/cleanup. ## File path: runners/twister2/src/main/java/org/apache/beam/runners/twister2/Twister2LegacyRunner.java ## @@ -0,0 +1,339 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.twister2; + +import static org.apache.beam.runners.core.construction.resources.PipelineResources.detectClassPathResourcesToStage; + +import edu.iu.dsc.tws.api.JobConfig; +import edu.iu.dsc.tws.api.Twister2Job; +import edu.iu.dsc.tws.api.config.Config; +import edu.iu.dsc.tws.api.driver.DriverJobState; +import edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException; +import edu.iu.dsc.tws.api.scheduler.Twister2JobState; +import edu.iu.dsc.tws.api.tset.sets.TSet; +import edu.iu.dsc.tws.api.tset.sets.batch.BatchTSet; +import edu.iu.dsc.tws.local.LocalSubmitter; +import edu.iu.dsc.tws.rsched.core.ResourceAllocator; +import edu.iu.dsc.tws.rsched.job.Twister2Submitter; +import java.io.File; +import java.io.FileInputStream; +import java.io.FileNotFoundException; +import java.io.FileOutputStream; +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.HashSet; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.UUID; +import java.util.logging.LogManager; +import java.util.logging.Logger; +import java.util.stream.Collectors; +import java.util.zip.ZipEntry; +import java.util.zip.ZipOutputStream; +import org.apache.beam.runners.core.construction.PTransformMatchers; +import
[GitHub] [beam] mwalenia commented on pull request #11775: [BEAM-10050] Change labels checked in VideoIntelligenceIT
mwalenia commented on pull request #11775: URL: https://github.com/apache/beam/pull/11775#issuecomment-632601472 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mwalenia commented on pull request #11775: [BEAM-10050] Change labels checked in VideoIntelligenceIT
mwalenia commented on pull request #11775: URL: https://github.com/apache/beam/pull/11775#issuecomment-632580694 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mwalenia commented on pull request #11775: [BEAM-10050] Change labels checked in VideoIntelligenceIT
mwalenia commented on pull request #11775: URL: https://github.com/apache/beam/pull/11775#issuecomment-632580795 First precommit passed, launching second This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm commented on pull request #11777: [BEAM-10054] Fix watermark hold for on_time_pane
mxm commented on pull request #11777: URL: https://github.com/apache/beam/pull/11777#issuecomment-632572299 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] purbanow commented on a change in pull request #11794: [BEAM-9894] Add batch SnowflakeIO.Write to Java SDK
purbanow commented on a change in pull request #11794: URL: https://github.com/apache/beam/pull/11794#discussion_r429105275 ## File path: sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeServiceImpl.java ## @@ -1,90 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.snowflake; - -import java.sql.Connection; -import java.sql.PreparedStatement; -import java.sql.ResultSet; -import java.sql.SQLException; -import java.util.function.Consumer; -import javax.sql.DataSource; -import org.apache.beam.sdk.transforms.SerializableFunction; - -/** - * Implemenation of {@link org.apache.beam.sdk.io.snowflake.SnowflakeService} used in production. - */ -public class SnowflakeServiceImpl implements SnowflakeService { Review comment: Note: This file was moved to `services/SnowflakeServiceImpl.java` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] purbanow commented on a change in pull request #11794: [BEAM-9894] Add batch SnowflakeIO.Write to Java SDK
purbanow commented on a change in pull request #11794: URL: https://github.com/apache/beam/pull/11794#discussion_r429105275 ## File path: sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeServiceImpl.java ## @@ -1,90 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ -package org.apache.beam.sdk.io.snowflake; - -import java.sql.Connection; -import java.sql.PreparedStatement; -import java.sql.ResultSet; -import java.sql.SQLException; -import java.util.function.Consumer; -import javax.sql.DataSource; -import org.apache.beam.sdk.transforms.SerializableFunction; - -/** - * Implemenation of {@link org.apache.beam.sdk.io.snowflake.SnowflakeService} used in production. - */ -public class SnowflakeServiceImpl implements SnowflakeService { Review comment: Note: This file moved to `services/SnowflakeServiceImpl.java` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] iemejia commented on pull request #11780: [BEAM-9948] Uploading mascot to the website
iemejia commented on pull request #11780: URL: https://github.com/apache/beam/pull/11780#issuecomment-632558904 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] iemejia commented on pull request #11780: [BEAM-9948] Uploading mascot to the website
iemejia commented on pull request #11780: URL: https://github.com/apache/beam/pull/11780#issuecomment-632558819 Run CommunityMetrics PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] purbanow opened a new pull request #11794: [BEAM-9894] Add batch SnowflakeIO.Write to Java SDK
purbanow opened a new pull request #11794: URL: https://github.com/apache/beam/pull/11794 This PR is part of adding SnowflakeIO to Java SDK [BEAM-9893](https://issues.apache.org/jira/browse/BEAM-9893). Precisely this PR is adding write operation to SnowflakeIO [BEAM-9894](https://issues.apache.org/jira/browse/BEAM-9894). The SnowflakeIO.Write works in the way that puts data on GCS as CSV files and then uses Snowflake's JDBC driver to run [COPY INTO TABLE](https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html) statement to move CSV files from GCS to Snowflake table. As mentioned in the previous [PR](https://github.com/apache/beam/pull/11360), next PR’s will contain integration tests, streaming and cross-language support. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
[GitHub] [beam] piotr-szuberski edited a comment on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski edited a comment on pull request #11661: URL: https://github.com/apache/beam/pull/11661#issuecomment-626801208 @markflyhigh Could I ask you for review? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski edited a comment on pull request #11661: URL: https://github.com/apache/beam/pull/11661#issuecomment-63279 @tvalentyn Thank you much for your review! I made the requested changes. I'm still not sure about the part of the dashboards so please add a word whether my answer is relevant. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski commented on pull request #11661: URL: https://github.com/apache/beam/pull/11661#issuecomment-63279 @tvalentyn I made the requested changes. I'm not sure about the part of dashboards so please add a word whether my answer is sufficient. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski edited a comment on pull request #11661: URL: https://github.com/apache/beam/pull/11661#issuecomment-63279 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski edited a comment on pull request #11661: URL: https://github.com/apache/beam/pull/11661#issuecomment-63279 @tvalentyn Thank you much for your review! I made the requested changes. I'm still not sure about the part of the dashboards so please add a word whether my answer is sufficient. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r429095714 ## File path: sdks/python/test-suites/dataflow/common.gradle ## @@ -109,4 +109,21 @@ task validatesRunnerStreamingTests { args '-c', ". ${envdir}/bin/activate && ${runScriptsDir}/run_integration_test.sh $cmdArgs" } } -} \ No newline at end of file +} + +task runPerformanceTest { +dependsOn 'installGcpTest' +dependsOn ':sdks:python:sdist' + +def test = project.findProperty('test') +def testOpts = project.findProperty('test-pipeline-options') +testOpts += " --sdk_location=${files(configurations.distTarBall.files).singleFile}" + + doLast { +exec { + workingDir "${project.rootDir}/sdks/python" + executable 'sh' + args '-c', ". ${envdir}/bin/activate && ${envdir}/bin/python setup.py nosetests --tests=${test} --test-pipeline-options=\"${testOpts}\" --ignore-files \'.*py3\\d?\\.py\$\'" Review comment: I wanted to be on the safe side because in most places (not only in the bash scripts) it is added. But when I think about it we absolutely don't need to ignore anything, we run just one test. I'll remove it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r429094648 ## File path: sdks/python/test-suites/dataflow/common.gradle ## @@ -109,4 +109,21 @@ task validatesRunnerStreamingTests { args '-c', ". ${envdir}/bin/activate && ${runScriptsDir}/run_integration_test.sh $cmdArgs" } } -} \ No newline at end of file +} + +task runPerformanceTest { +dependsOn 'installGcpTest' +dependsOn ':sdks:python:sdist' + +def test = project.findProperty('test') +def testOpts = project.findProperty('test-pipeline-options') +testOpts += " --sdk_location=${files(configurations.distTarBall.files).singleFile}" + + doLast { +exec { + workingDir "${project.rootDir}/sdks/python" + executable 'sh' + args '-c', ". ${envdir}/bin/activate && ${envdir}/bin/python setup.py nosetests --tests=${test} --test-pipeline-options=\"${testOpts}\" --ignore-files \'.*py3\\d?\\.py\$\'" Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r429088765 ## File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy ## @@ -58,117 +26,59 @@ def dataflowPipelineArgs = [ temp_location : 'gs://temp-storage-for-end-to-end-tests/temp-it', ] - -// Configurations of each Jenkins job. -def testConfigurations = [ -new PerformanceTestConfigurations( -jobName : 'beam_PerformanceTests_WordCountIT_Py27', -jobDescription: 'Python SDK Performance Test - Run WordCountIT in Py27 with 1Gb files', -jobTriggerPhrase : 'Run Python27 WordCountIT Performance Test', -resultTable : 'beam_performance.wordcount_py27_pkb_results', -test : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it', -itModule : ':sdks:python:test-suites:dataflow:py2', -extraPipelineArgs : dataflowPipelineArgs + [ -input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb -output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output', -expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710', -num_workers: '10', -autoscaling_algorithm: 'NONE', // Disable autoscale the worker pool. -], -), -new PerformanceTestConfigurations( -jobName : 'beam_PerformanceTests_WordCountIT_Py35', -jobDescription: 'Python SDK Performance Test - Run WordCountIT in Py35 with 1Gb files', -jobTriggerPhrase : 'Run Python35 WordCountIT Performance Test', -resultTable : 'beam_performance.wordcount_py35_pkb_results', -test : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it', -itModule : ':sdks:python:test-suites:dataflow:py35', -extraPipelineArgs : dataflowPipelineArgs + [ -input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb -output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output', -expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710', -num_workers: '10', -autoscaling_algorithm: 'NONE', // Disable autoscale the worker pool. -], -), -new PerformanceTestConfigurations( -jobName : 'beam_PerformanceTests_WordCountIT_Py36', -jobDescription: 'Python SDK Performance Test - Run WordCountIT in Py36 with 1Gb files', -jobTriggerPhrase : 'Run Python36 WordCountIT Performance Test', -resultTable : 'beam_performance.wordcount_py36_pkb_results', -test : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it', -itModule : ':sdks:python:test-suites:dataflow:py36', -extraPipelineArgs : dataflowPipelineArgs + [ -input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb -output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output', -expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710', -num_workers: '10', -autoscaling_algorithm: 'NONE', // Disable autoscale the worker pool. -], -), -new PerformanceTestConfigurations( -jobName : 'beam_PerformanceTests_WordCountIT_Py37', -jobDescription: 'Python SDK Performance Test - Run WordCountIT in Py37 with 1Gb files', -jobTriggerPhrase : 'Run Python37 WordCountIT Performance Test', -resultTable : 'beam_performance.wordcount_py37_pkb_results', -test : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it', -itModule : ':sdks:python:test-suites:dataflow:py37', -extraPipelineArgs : dataflowPipelineArgs + [ -input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb -output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output', -expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710', -num_workers: '10', -autoscaling_algorithm: 'NONE', // Disable autoscale the worker pool. -], -), -] - +testConfigurations = [] +pythonVersions = ['27', '35', '36', '37'] + +for (pythonVersion in pythonVersions) { Review comment: I'm not sure if I understand meaning of "dashboards" correctly here, but we are running tasks via proper python modules that have pythonVersion variable already set up, so there is no need to set -PpythonVersion manually. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please
[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r429088004 ## File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy ## @@ -58,117 +26,59 @@ def dataflowPipelineArgs = [ temp_location : 'gs://temp-storage-for-end-to-end-tests/temp-it', ] - -// Configurations of each Jenkins job. -def testConfigurations = [ -new PerformanceTestConfigurations( -jobName : 'beam_PerformanceTests_WordCountIT_Py27', -jobDescription: 'Python SDK Performance Test - Run WordCountIT in Py27 with 1Gb files', -jobTriggerPhrase : 'Run Python27 WordCountIT Performance Test', -resultTable : 'beam_performance.wordcount_py27_pkb_results', -test : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it', -itModule : ':sdks:python:test-suites:dataflow:py2', -extraPipelineArgs : dataflowPipelineArgs + [ -input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb -output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output', -expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710', -num_workers: '10', -autoscaling_algorithm: 'NONE', // Disable autoscale the worker pool. -], -), -new PerformanceTestConfigurations( -jobName : 'beam_PerformanceTests_WordCountIT_Py35', -jobDescription: 'Python SDK Performance Test - Run WordCountIT in Py35 with 1Gb files', -jobTriggerPhrase : 'Run Python35 WordCountIT Performance Test', -resultTable : 'beam_performance.wordcount_py35_pkb_results', -test : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it', -itModule : ':sdks:python:test-suites:dataflow:py35', -extraPipelineArgs : dataflowPipelineArgs + [ -input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb -output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output', -expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710', -num_workers: '10', -autoscaling_algorithm: 'NONE', // Disable autoscale the worker pool. -], -), -new PerformanceTestConfigurations( -jobName : 'beam_PerformanceTests_WordCountIT_Py36', -jobDescription: 'Python SDK Performance Test - Run WordCountIT in Py36 with 1Gb files', -jobTriggerPhrase : 'Run Python36 WordCountIT Performance Test', -resultTable : 'beam_performance.wordcount_py36_pkb_results', -test : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it', -itModule : ':sdks:python:test-suites:dataflow:py36', -extraPipelineArgs : dataflowPipelineArgs + [ -input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb -output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output', -expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710', -num_workers: '10', -autoscaling_algorithm: 'NONE', // Disable autoscale the worker pool. -], -), -new PerformanceTestConfigurations( -jobName : 'beam_PerformanceTests_WordCountIT_Py37', -jobDescription: 'Python SDK Performance Test - Run WordCountIT in Py37 with 1Gb files', -jobTriggerPhrase : 'Run Python37 WordCountIT Performance Test', -resultTable : 'beam_performance.wordcount_py37_pkb_results', -test : 'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it', -itModule : ':sdks:python:test-suites:dataflow:py37', -extraPipelineArgs : dataflowPipelineArgs + [ -input: 'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb -output: 'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output', -expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710', -num_workers: '10', -autoscaling_algorithm: 'NONE', // Disable autoscale the worker pool. -], -), -] - +testConfigurations = [] +pythonVersions = ['27', '35', '36', '37'] Review comment: I tried to keep the effect of the job as close to original as possible. I agree that 2 versions of python sound sufficient. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r429087219 ## File path: sdks/python/test-suites/dataflow/py2/build.gradle ## @@ -205,3 +205,20 @@ task chicagoTaxiExample { } } } + +task runPerformanceTest { Review comment: Sure, there can even be more code moved to common.gradle This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on pull request #11793: [BEAM-10064] Fix google3 import error for BEAM-9383
ihji commented on pull request #11793: URL: https://github.com/apache/beam/pull/11793#issuecomment-632517277 R: @angoenka This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji opened a new pull request #11793: [BEAM-10064] Fix google3 import error for BEAM-9383
ihji opened a new pull request #11793: URL: https://github.com/apache/beam/pull/11793 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
[GitHub] [beam] chadrik commented on a change in pull request #11070: [BEAM-8280] Blog post: Python typing changes
chadrik commented on a change in pull request #11070: URL: https://github.com/apache/beam/pull/11070#discussion_r429063034 ## File path: website/src/_posts/2020-03-06-python-typing.md ## @@ -0,0 +1,117 @@ +--- +layout: post +title: "Python SDK Typing Changes" +date: 2020-03-06 00:00:01 -0800 +excerpt_separator: +categories: blog python typing +authors: + - chadrik + - udim + +--- + + +TODO excerpt + + + +Python supports type annotations on functions (PEP 484). Static type checkers, +such as mypy, are used to verify adherence to these types. +For example: +```py +def f(v: int) -> int: + return v[0] +``` +Running mypy on the above code will give the error: +`Value of type "int" is not indexable`. + +We've recently made changes to Beam in 2 areas: + +Adding type hints throughout Beam. TODO expand + +Second, we've added support for Python 3 type annotations. This allows SDK +users to specify a DoFn's type hints in one place. +We've also expanded Beam's support of `typing` module types. + +For more background see: +[Ensuring Python Type Safety](https://beam.apache.org/documentation/sdks/python-type-safety/). + +# Beam Is Typed + +TODO + +# New Ways to Annotate + +## Python 3 Syntax Annotations + +Coming in Beam 2.21 (BEAM-8280), you will be able to use Python annotation +syntax to specify input and output types. + +For example, this new form: +```py +class MyDoFn(beam.DoFn): + def process(self, element: int) -> typing.Text: +yield str(element) +``` +is equivalent to this: +```py +@beam.typehints.with_input_types(int) +@beam.typehints.with_output_types(typing.Text) +class MyDoFn(beam.DoFn): + def process(self, element): +yield str(element) +``` + +One of the advantages of the new form is that you may already be using it +in tandem with a static type checker such as mypy, thus getting additional +type checking for free. + +This feature will be enabled by default, and there will be 2 mechanisms in +place to disable it: +1. Calling `apache_beam.typehints.disable_type_annotations()` before pipeline +construction will disable the new feature completely. +1. Decorating a function with `@apache_beam.typehints.no_annotations` will +tell Beam to ignore annotations for it. + +Uses of Beam's `with_input_type`, `with_output_type` methods and decorators will +still work and take precedence over annotations. + +Sidebar: + +> You might ask: couldn't we use mypy to type check Beam pipelines? The main issue +is that such a tool would have to understand type relations between +pipeline graph nodes, e.g., that the type of element passed to a transform +should be consistent with its annotated input type. Review comment: I went very deep on making transforms and collections generic, and I got pretty close to making it work, but there was still a need for a plug-in. IIRC, it wasn’t possible for mypy to propagate all of the type information when the apply() method was used. When we get past this current push I’ll revive that old experiment and show you where I got. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] udim commented on pull request #11070: [BEAM-8280] Blog post: Python typing changes
udim commented on pull request #11070: URL: https://github.com/apache/beam/pull/11070#issuecomment-632506918 CC: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] udim commented on pull request #11070: [BEAM-8280] Blog post: Python typing changes
udim commented on pull request #11070: URL: https://github.com/apache/beam/pull/11070#issuecomment-632506706 PTAL. Preview is here: http://apache-beam-website-pull-requests.storage.googleapis.com/11070/blog/python-typing/index.html This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org