[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=423245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423245 ] ASF GitHub Bot logged work on BEAM-9678: Author: ASF GitHub Bot Created on: 16/Apr/20 05:35 Start Date: 16/Apr/20 05:35 Worklog Time Spent: 10m Work Description: damondouglas commented on issue #11340: [BEAM-9678] Create Go SDK introduction kata URL: https://github.com/apache/beam/pull/11340#issuecomment-614425499 @henryken I think everything is ok and if you could help uploading this to Stepik that would be really helpful. I'll email you about hopefully meeting with you to plan out the rest of the katas. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423245) Time Spent: 4.5h (was: 4h 20m) > Introduction Kata | Go SDK Code Katas > - > > Key: BEAM-9678 > URL: https://issues.apache.org/jira/browse/BEAM-9678 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > An Introduction kata patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] > where the take away is an individual's ability to start an Apache Beam > pipeline using the Golang SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=423244=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423244 ] ASF GitHub Bot logged work on BEAM-9678: Author: ASF GitHub Bot Created on: 16/Apr/20 05:34 Start Date: 16/Apr/20 05:34 Worklog Time Spent: 10m Work Description: damondouglas commented on pull request #11340: [BEAM-9678] Create Go SDK introduction kata URL: https://github.com/apache/beam/pull/11340#discussion_r409292176 ## File path: learning/katas/go/Introduction/Hello Beam/Hello Beam Test/go.mod ## @@ -0,0 +1,25 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +module hello_beam_test + +go 1.13 + +require ( Review comment: I ran `go mod tidy` in both Hello Beam and Hello Beam Test as well as `go mod verify`. I also removed my entire contents of `$HOME/pkg` and `$HOME/src` and subsequently tested on both IntelliJ and GoLand. I was able to generate a course preview for both IntelliJ and GoLand and verify tasks. As I understand, the go.mod is auto-generated as we code in go. In IntelliJ/GoLand it does it for you automatically. In vim with coc, when we run `:GoImports` (I think) it also does this. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423244) Time Spent: 4h 20m (was: 4h 10m) > Introduction Kata | Go SDK Code Katas > - > > Key: BEAM-9678 > URL: https://issues.apache.org/jira/browse/BEAM-9678 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > An Introduction kata patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] > where the take away is an individual's ability to start an Apache Beam > pipeline using the Golang SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9770) Add BigQuery DeadLetter pattern to Patterns Page
Reza ardeshir rokni created BEAM-9770: - Summary: Add BigQuery DeadLetter pattern to Patterns Page Key: BEAM-9770 URL: https://issues.apache.org/jira/browse/BEAM-9770 Project: Beam Issue Type: New Feature Components: website Reporter: Reza ardeshir rokni -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=423237=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423237 ] ASF GitHub Bot logged work on BEAM-9678: Author: ASF GitHub Bot Created on: 16/Apr/20 05:09 Start Date: 16/Apr/20 05:09 Worklog Time Spent: 10m Work Description: damondouglas commented on pull request #11340: [BEAM-9678] Create Go SDK introduction kata URL: https://github.com/apache/beam/pull/11340#discussion_r409285045 ## File path: learning/katas/go/Introduction/Hello Beam/Hello Beam/pkg/task/task.go ## @@ -0,0 +1,24 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package task + +import ( + "github.com/apache/beam/sdks/go/pkg/beam" +) + +func HelloBeam(s beam.Scope) beam.PCollection { Review comment: Thank you, Henry. I test on both GoLand and IntelliJ in case individuals are coming from either. Thank you for embracing my suggestion :). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423237) Time Spent: 4h 10m (was: 4h) > Introduction Kata | Go SDK Code Katas > - > > Key: BEAM-9678 > URL: https://issues.apache.org/jira/browse/BEAM-9678 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > An Introduction kata patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] > where the take away is an individual's ability to start an Apache Beam > pipeline using the Golang SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9418) Support ANY_VALUE aggregation functions
[ https://issues.apache.org/jira/browse/BEAM-9418?focusedWorklogId=423235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423235 ]

ASF GitHub Bot logged work on BEAM-9418:
Author: ASF GitHub Bot
Created on: 16/Apr/20 05:05
Start Date: 16/Apr/20 05:05
Worklog Time Spent: 10m
Work Description: amaliujia commented on issue #11333: [BEAM-9418] Support ANY_VALUE aggregation functions
URL: https://github.com/apache/beam/pull/11333#issuecomment-614417914

All tests have failed due to programming style violations. Running `./gradlew spotlessApply` can fix that.

Issue Time Tracking
---
Worklog Id: (was: 423235)
Time Spent: 1h 10m (was: 1h)

> Support ANY_VALUE aggregation functions
> ---
>
> Key: BEAM-9418
> URL: https://issues.apache.org/jira/browse/BEAM-9418
> Project: Beam
> Issue Type: Task
> Components: dsl-sql
> Reporter: Rui Wang
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Support the following functionality in BeamSQL:
> {code:java}
> "select t.key, ANY_VALUE(t.column) from t group by t.key";
> {code}
> Spec link:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value
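For readers unfamiliar with the aggregation requested in BEAM-9418, the query in the issue description can be pictured with a minimal plain-Python sketch. This is only an illustration of the semantics, not the Beam SQL implementation: the helper name `any_value_by_key` is hypothetical, and keeping the first value seen per key is just one valid choice, since ANY_VALUE may return any value from the group.

```python
def any_value_by_key(rows):
    """Return {key: one value from that key's group}.

    Hypothetical helper: mimics
    "select t.key, ANY_VALUE(t.column) from t group by t.key"
    by keeping the first value seen for each key. Any other value
    from the same group would be an equally valid result.
    """
    result = {}
    for key, value in rows:
        # setdefault only stores the value if the key is not present yet.
        result.setdefault(key, value)
    return result


rows = [("a", 1), ("a", 2), ("b", 3)]
picked = any_value_by_key(rows)
```

Callers should not rely on which value is picked for key `"a"`, only that it comes from that key's group.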
[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=423231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423231 ] ASF GitHub Bot logged work on BEAM-9678: Author: ASF GitHub Bot Created on: 16/Apr/20 04:53 Start Date: 16/Apr/20 04:53 Worklog Time Spent: 10m Work Description: henryken commented on issue #11340: [BEAM-9678] Create Go SDK introduction kata URL: https://github.com/apache/beam/pull/11340#issuecomment-614414805 If everything is okay, I can help to upload this course to Stepik. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423231) Time Spent: 4h (was: 3h 50m) > Introduction Kata | Go SDK Code Katas > - > > Key: BEAM-9678 > URL: https://issues.apache.org/jira/browse/BEAM-9678 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > An Introduction kata patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] > where the take away is an individual's ability to start an Apache Beam > pipeline using the Golang SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=423230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423230 ] ASF GitHub Bot logged work on BEAM-9678: Author: ASF GitHub Bot Created on: 16/Apr/20 04:51 Start Date: 16/Apr/20 04:51 Worklog Time Spent: 10m Work Description: henryken commented on pull request #11340: [BEAM-9678] Create Go SDK introduction kata URL: https://github.com/apache/beam/pull/11340#discussion_r409279710 ## File path: learning/katas/go/Introduction/Hello Beam/Hello Beam Test/go.mod ## @@ -0,0 +1,25 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +module hello_beam_test + +go 1.13 + +require ( Review comment: My IDE suggests that there are some missing indirect modules, i.e. x/lint, x/tools, go/tools. Are those required to be committed? 
```go
require (
	github.com/apache/beam v2.19.0+incompatible
	github.com/golang/protobuf v1.3.5 // indirect
	github.com/google/go-cmp v0.4.0 // indirect
	golang.org/x/lint v0.0.0-20190313153728-d0100b6bd8b3 // indirect
	golang.org/x/tools v0.0.0-20190524140312-2c0ae7006135 // indirect
	google.golang.org/grpc v1.28.1 // indirect
	honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc // indirect
)
```

Issue Time Tracking
---
Worklog Id: (was: 423230)
Time Spent: 3h 50m (was: 3h 40m)

> Introduction Kata | Go SDK Code Katas
> -
>
> Key: BEAM-9678
> URL: https://issues.apache.org/jira/browse/BEAM-9678
> Project: Beam
> Issue Type: Sub-task
> Components: katas, sdk-go
> Reporter: Damon Douglas
> Assignee: Damon Douglas
> Priority: Major
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> An Introduction kata patterns after
> [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction]
> where the take away is an individual's ability to start an Apache Beam
> pipeline using the Golang SDK.
[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=423229=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423229 ] ASF GitHub Bot logged work on BEAM-9678: Author: ASF GitHub Bot Created on: 16/Apr/20 04:48 Start Date: 16/Apr/20 04:48 Worklog Time Spent: 10m Work Description: henryken commented on pull request #11340: [BEAM-9678] Create Go SDK introduction kata URL: https://github.com/apache/beam/pull/11340#discussion_r409279364 ## File path: learning/katas/go/Introduction/Hello Beam/Hello Beam/pkg/task/task.go ## @@ -0,0 +1,24 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package task + +import ( + "github.com/apache/beam/sdks/go/pkg/beam" +) + +func HelloBeam(s beam.Scope) beam.PCollection { Review comment: Noted that Damon. I'm not that familiar with Go. Just did some reading and I guess this is good for now. Thanks for clarifying. Also FYI, you may want to consider using GoLand instead of IntelliJ for developing this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423229) Time Spent: 3h 40m (was: 3.5h) > Introduction Kata | Go SDK Code Katas > - > > Key: BEAM-9678 > URL: https://issues.apache.org/jira/browse/BEAM-9678 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > An Introduction kata patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] > where the take away is an individual's ability to start an Apache Beam > pipeline using the Golang SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423204=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423204 ]

ASF GitHub Bot logged work on BEAM-9769:
Author: ASF GitHub Bot
Created on: 16/Apr/20 02:30
Start Date: 16/Apr/20 02:30
Worklog Time Spent: 10m
Work Description: chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614377074

Hmm interesting, I agree keeping JSON as default is probably the safer bet.

We have seen a case internally where the data provided to WriteToBigQuery is a string-like date, e.g., `"2020-01-01"`. When writing with JSON intermediate format, the data shows up as a DATE column in BigQuery, but we can't get the same behavior with Avro format without doing one of:

1. Specifying schema for that column as DATE and modifying the incoming PCollection to use `datetime.date`, or
2. Specifying schema for that column as STRING, in which case it no longer is a DATE column in BigQuery.

The 2nd option is problematic when we're appending to an existing table, in which case we have to modify the pipeline to keep appending to it.

fastavro 0.22.2 allows writing a string type to a column defined as date logical type (PRs fastavro/fastavro#338 and fastavro/fastavro#349), but it seems that Beam pins the fastavro constraint to <0.22, so for now we can't take advantage of that.

I believe your comments in CHANGES are accurate: there are some date-like and datetime-like strings that will behave differently in Avro vs JSON format.

Issue Time Tracking
---
Worklog Id: (was: 423204)
Time Spent: 3h (was: 2h 50m)

> Ensure JSON imports are the default behavior for BigQuerySink and
> WriteToBigQuery in Python
> ---
>
> Key: BEAM-9769
> URL: https://issues.apache.org/jira/browse/BEAM-9769
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Pablo Estrada
> Assignee: Pablo Estrada
> Priority: Major
> Fix For: 2.21.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
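The first option in the comment above (converting string-like dates to `datetime.date` before the write) might look roughly like the following stdlib-only sketch. The helper `to_date_row` and the field name `event_date` are hypothetical and not taken from the pull request; in a pipeline, a function like this would typically be applied to each element ahead of `WriteToBigQuery`.

```python
import datetime


def to_date_row(row, date_fields=("event_date",)):
    """Return a copy of `row` with ISO date strings turned into datetime.date.

    Hypothetical helper: `date_fields` names the columns whose schema
    would be declared as DATE. Values that are not strings are left as-is.
    """
    converted = dict(row)
    for field in date_fields:
        value = converted.get(field)
        if isinstance(value, str):
            # fromisoformat accepts "YYYY-MM-DD" and raises ValueError otherwise.
            converted[field] = datetime.date.fromisoformat(value)
    return converted


row = to_date_row({"id": 1, "event_date": "2020-01-01"})
```

Malformed strings raise `ValueError` here, so a real pipeline would need to decide whether to fail or route such rows elsewhere.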
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423196=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423196 ]

ASF GitHub Bot logged work on BEAM-9769:
Author: ASF GitHub Bot
Created on: 16/Apr/20 02:25
Start Date: 16/Apr/20 02:25
Worklog Time Spent: 10m
Work Description: chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614377074

Hmm interesting, I agree keeping JSON as default is probably the safer bet.

We have seen a case internally where the data provided to WriteToBigQuery is a string-like date, e.g., `"2020-01-01"`. When writing with JSON intermediate format, the data shows up as a DATE column in BigQuery, but we can't get the same behavior with Avro format without doing one of:

1. Specifying schema for that column as DATE and modifying the incoming PCollection to use `datetime.date`, or
2. Specifying schema for that column as STRING, in which case it no longer is a DATE column in BigQuery.

The 2nd option is problematic when we're appending to an existing table, in which case we have to modify the pipeline to keep appending to it.

fastavro 0.22.2 allows writing a string type to a column defined as date logical type (PRs fastavro/fastavro#338 and fastavro/fastavro#349), but it seems that Beam pins the fastavro constraint to <0.22, so for now we can't take advantage of that.

Issue Time Tracking
---
Worklog Id: (was: 423196)
Time Spent: 2h 50m (was: 2h 40m)

> Ensure JSON imports are the default behavior for BigQuerySink and
> WriteToBigQuery in Python
> ---
>
> Key: BEAM-9769
> URL: https://issues.apache.org/jira/browse/BEAM-9769
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Pablo Estrada
> Assignee: Pablo Estrada
> Priority: Major
> Fix For: 2.21.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423193=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423193 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 16/Apr/20 02:15 Start Date: 16/Apr/20 02:15 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423193) Time Spent: 35h 50m (was: 35h 40m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 35h 50m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423192=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423192 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 16/Apr/20 02:15 Start Date: 16/Apr/20 02:15 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614374497 And liftoff This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423192) Time Spent: 35h 40m (was: 35.5h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 35h 40m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423191=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423191 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 02:15 Start Date: 16/Apr/20 02:15 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614374425 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423191) Time Spent: 2h 40m (was: 2.5h) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423190 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 02:13 Start Date: 16/Apr/20 02:13 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614373975 It may still be reasonable to merge this, so that we can be 100% sure that results will be consistent... but you're right, Chun, that the tests so far show equal behaviour between avro and json. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423190) Time Spent: 2.5h (was: 2h 20m) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423189=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423189 ]

ASF GitHub Bot logged work on BEAM-9769:
Author: ASF GitHub Bot
Created on: 16/Apr/20 02:09
Start Date: 16/Apr/20 02:09
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614372823

Hmm, this is truly embarrassing. I inadvertently ran my tests in a branch with changes to the BQ Source, which gave me weird results when running BigQueryQueryToTable.test_big_query_new_types.

I've just tested this with both the JSON and Avro alternatives, and the results are the same, as you had clearly verified, Chun. It seems like this change is not necessary.

Issue Time Tracking
---
Worklog Id: (was: 423189)
Time Spent: 2h 20m (was: 2h 10m)

> Ensure JSON imports are the default behavior for BigQuerySink and
> WriteToBigQuery in Python
> ---
>
> Key: BEAM-9769
> URL: https://issues.apache.org/jira/browse/BEAM-9769
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Pablo Estrada
> Assignee: Pablo Estrada
> Priority: Major
> Fix For: 2.21.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423188=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423188 ]

ASF GitHub Bot logged work on BEAM-9468:
Author: ASF GitHub Bot
Created on: 16/Apr/20 02:01
Start Date: 16/Apr/20 02:01
Worklog Time Spent: 10m
Work Description: jaketf commented on issue #11339: [BEAM-9468] Fhir io
URL: https://github.com/apache/beam/pull/11339#issuecomment-614332334

@lastomato I added [GroupIntoBatches](https://beam.apache.org/releases/javadoc/2.19.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html) in the FhirIO.Import path. The logic is:

- buffer `HttpBody`'s to an iterable until we have 1000 of them (this threshold was chosen arbitrarily)
- ImportFn updates the ndJson write channel with all 1000 resources
- FinishBundle will flush the batch: write to file on GCS and trigger import job

This is one way to mitigate the "import job per resource" concern, but I'm open to other suggestions for achieving this.

The language in the docs is:

> Elements are buffered until there are batchSize elements buffered, at which point they are output to the output PCollection.

That sounds as if a batch that never reaches batchSize might not be output. In practice, GroupIntoBatches behaves as one would expect: if there are extra elements left over, they are output as a smaller batch. Verified in a unit test added in 8c4d636.

Issue Time Tracking
---
Worklog Id: (was: 423188)
Time Spent: 35.5h (was: 35h 20m)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Jacob Ferriero
> Assignee: Jacob Ferriero
> Priority: Minor
> Time Spent: 35.5h
> Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM
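The batching behavior described in the comment above (full batches of a fixed size, with leftovers emitted as a smaller final batch) can be illustrated with a small plain-Python sketch. This is not Beam's GroupIntoBatches, which is a stateful transform that batches per key; the generator `group_into_batches` below is a hypothetical stand-in that only shows the buffering logic.

```python
def group_into_batches(elements, batch_size):
    """Yield lists of up to `batch_size` elements.

    Hypothetical illustration only: full batches are emitted as soon as
    they fill up, and any leftover elements are flushed at the end as a
    smaller final batch, matching the behavior reported in the comment.
    """
    batch = []
    for element in elements:
        batch.append(element)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the partial batch so trailing elements are not dropped.
        yield batch


batches = list(group_into_batches(range(2500), 1000))
sizes = [len(b) for b in batches]
```

With 2500 inputs and a batch size of 1000, the final batch holds the remaining 500 elements, which is the case the unit test mentioned in the comment is meant to cover.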
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423184=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423184 ]

ASF GitHub Bot logged work on BEAM-9769:
Author: ASF GitHub Bot
Created on: 16/Apr/20 01:26
Start Date: 16/Apr/20 01:26
Worklog Time Spent: 10m
Work Description: chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614360925

Were there some tests you ran that revealed the data type incompatibilities? Just curious how you spotted them.

Issue Time Tracking
---
Worklog Id: (was: 423184)
Time Spent: 2h 10m (was: 2h)

> Ensure JSON imports are the default behavior for BigQuerySink and
> WriteToBigQuery in Python
> ---
>
> Key: BEAM-9769
> URL: https://issues.apache.org/jira/browse/BEAM-9769
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Pablo Estrada
> Assignee: Pablo Estrada
> Priority: Major
> Fix For: 2.21.0
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423183 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 16/Apr/20 01:17 Start Date: 16/Apr/20 01:17 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11339: [BEAM-9468] Fhir io URL: https://github.com/apache/beam/pull/11339#issuecomment-614332334
@lastomato I added [GroupIntoBatches](https://beam.apache.org/releases/javadoc/2.19.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html) in the FhirIO.Import path. The logic is:
- buffer `HttpBody`s into an iterable until we have 1000 of them (this threshold was chosen arbitrarily)
- ImportFn updates the ndJson write channel with all 1000 resources
- FinishBundle will flush the batch: write to file on GCS and trigger import job
This is one way to mitigate the "import job per resource" concern, but I'm open to other suggestions for achieving this. I need to verify whether this will miss the last batch if it isn't full. The language in the docs is:
> Elements are buffered until there are batchSize elements buffered, at which point they are output to the output PCollection.
This sounds as if a batch that never reaches batchSize will not be output.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423183) Time Spent: 35h 20m (was: 35h 10m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 35h 20m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
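The open question in the comment above (whether a final, partially filled batch is lost) can be illustrated with a minimal sketch. This is plain Python, not Beam's GroupIntoBatches and not the FhirIO code; the names `batch_elements` and `flush_remainder` are hypothetical and exist only for illustration. Whether GroupIntoBatches itself emits a trailing partial batch still has to be checked against the Beam documentation and source.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_elements(
    elements: Iterable[T], batch_size: int, flush_remainder: bool = True
) -> Iterator[List[T]]:
    """Yield lists of at most batch_size elements (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer: List[T] = []
    for element in elements:
        buffer.append(element)
        if len(buffer) == batch_size:
            # A full batch is ready: hand it off and start a new buffer.
            yield buffer
            buffer = []
    # Without this final flush, a trailing partial batch is silently dropped.
    if flush_remainder and buffer:
        yield buffer


# 2500 elements with a threshold of 1000 leave a final batch of 500.
print([len(b) for b in batch_elements(range(2500), 1000)])  # prints [1000, 1000, 500]
```

With `flush_remainder=False` the last 500 elements never appear in the output, which is the failure mode the comment is worried about.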
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423181=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423181 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 01:16 Start Date: 16/Apr/20 01:16 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614358273 Run Python 3.6 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423181) Time Spent: 2h (was: 1h 50m) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423179 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 01:14 Start Date: 16/Apr/20 01:14 Worklog Time Spent: 10m Work Description: chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614357801 Gotcha, thanks for the heads up This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423179) Time Spent: 1h 50m (was: 1h 40m) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423178 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 01:04 Start Date: 16/Apr/20 01:04 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11435: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11435 Cherry-pick of https://github.com/apache/beam/pull/11433 onto branch for 2.21.0 release Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423171=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423171 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 00:58 Start Date: 16/Apr/20 00:58 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614353272 LGTM. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423171) Time Spent: 1.5h (was: 1h 20m) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423162=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423162 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 00:42 Start Date: 16/Apr/20 00:42 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614349385 All tests have passed. I'm now adding change docs to CHANGES.md This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423162) Time Spent: 1h 20m (was: 1h 10m) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423156=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423156 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 00:34 Start Date: 16/Apr/20 00:34 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614343489 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423156) Time Spent: 1h (was: 50m) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423157=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423157 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 00:34 Start Date: 16/Apr/20 00:34 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614340906 Run Python 2.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423157) Time Spent: 1h 10m (was: 1h) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9729) Cleanup bundle registration now that SDKs can pull.
[ https://issues.apache.org/jira/browse/BEAM-9729?focusedWorklogId=423154=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423154 ] ASF GitHub Bot logged work on BEAM-9729: Author: ASF GitHub Bot Created on: 16/Apr/20 00:26 Start Date: 16/Apr/20 00:26 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11358: [BEAM-9729, BEAM-8486] Runner-side bundle registration cleanup. URL: https://github.com/apache/beam/pull/11358#issuecomment-614345351 For my edification - what happened here? Are bundles registered with the process bundle request? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423154) Time Spent: 2h (was: 1h 50m) > Cleanup bundle registration now that SDKs can pull. > --- > > Key: BEAM-9729 > URL: https://issues.apache.org/jira/browse/BEAM-9729 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Once all runners (in particular dataflow) support pull descriptors, we can > clean things up by removing the push registration code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423153 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 00:19 Start Date: 16/Apr/20 00:19 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614343489 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423153) Time Spent: 50m (was: 40m) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.
[ https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=423152=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423152 ] ASF GitHub Bot logged work on BEAM-9577: Author: ASF GitHub Bot Created on: 16/Apr/20 00:16 Start Date: 16/Apr/20 00:16 Worklog Time Spent: 10m Work Description: robertwb commented on issue #11432: [BEAM-9577] Small fixes to portable runner staging. URL: https://github.com/apache/beam/pull/11432#issuecomment-614342813 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423152) Time Spent: 15h (was: 14h 50m) > Update artifact staging and retrieval protocols to be dependency aware. > --- > > Key: BEAM-9577 > URL: https://issues.apache.org/jira/browse/BEAM-9577 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 15h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9768) Add a gradle command for running the Python Unified Local Runner.
[ https://issues.apache.org/jira/browse/BEAM-9768?focusedWorklogId=423151=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423151 ] ASF GitHub Bot logged work on BEAM-9768: Author: ASF GitHub Bot Created on: 16/Apr/20 00:14 Start Date: 16/Apr/20 00:14 Worklog Time Spent: 10m Work Description: robertwb commented on issue #11430: [BEAM-9768] Gradle command for Python ULR. URL: https://github.com/apache/beam/pull/11430#issuecomment-614342080 Run PythonDocker PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423151) Time Spent: 0.5h (was: 20m) > Add a gradle command for running the Python Unified Local Runner. > - > > Key: BEAM-9768 > URL: https://issues.apache.org/jira/browse/BEAM-9768 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Robert Bradshaw >Priority: Major > Fix For: 2.22.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423148=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423148 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 16/Apr/20 00:10 Start Date: 16/Apr/20 00:10 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614340970 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423148) Time Spent: 35h 10m (was: 35h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 35h 10m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423146=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423146 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 00:09 Start Date: 16/Apr/20 00:09 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614340844 3.6 PC: https://builds.apache.org/job/beam_PostCommit_Python36_PR/60/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423146) Time Spent: 0.5h (was: 20m) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423147 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 16/Apr/20 00:09 Start Date: 16/Apr/20 00:09 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614340906 Run Python 2.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423147) Time Spent: 40m (was: 0.5h) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch
[ https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=423145=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423145 ] ASF GitHub Bot logged work on BEAM-5605: Author: ASF GitHub Bot Created on: 16/Apr/20 00:07 Start Date: 16/Apr/20 00:07 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11414: [BEAM-5605, BEAM-2939] Add support for FnApiDoFnRunner to handle split calls. URL: https://github.com/apache/beam/pull/11414#discussion_r409204369 ## File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java ## @@ -761,72 +795,111 @@ public void processElementForElementAndRestriction( continue; } -// Make sure to get the output watermark before we split to ensure that the lower bound -// applies to both the primary and residual. -KV watermarkAndState = -currentWatermarkEstimator.getWatermarkAndState(); -SplitResult result = currentTracker.trySplit(0); +// Attempt to checkpoint the current restriction. +HandlesSplits.SplitResult splitResult = Review comment: Got it. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423145) Time Spent: 18.5h (was: 18h 20m) > Support Portable SplittableDoFn for batch > - > > Key: BEAM-5605 > URL: https://issues.apache.org/jira/browse/BEAM-5605 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Scott Wegner >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 18.5h > Remaining Estimate: 0h > > Roll-up item tracking work towards supporting portable SplittableDoFn for > batch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423144=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423144 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 23:58 Start Date: 15/Apr/20 23:58 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614337568 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423144) Time Spent: 35h (was: 34h 50m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 35h > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423143=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423143 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 23:57 Start Date: 15/Apr/20 23:57 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614337532 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423143) Time Spent: 34h 50m (was: 34h 40m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 34h 50m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423138=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423138 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 23:38 Start Date: 15/Apr/20 23:38 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11339: [BEAM-9468] Fhir io URL: https://github.com/apache/beam/pull/11339#issuecomment-614332334
@lastomato I added [GroupIntoBatches](https://beam.apache.org/releases/javadoc/2.19.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html) in the FhirIO.Import path. The logic is:
- buffer `HttpBody`s into an iterable until we have 1000 of them (this threshold was chosen arbitrarily)
- ImportFn updates the ndJson write channel with all 1000 resources
- FinishBundle will flush the batch: write to file on GCS and trigger import job
This is one way to mitigate the "import job per resource" concern, but I'm open to other suggestions for achieving this.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423138) Time Spent: 34h 40m (was: 34.5h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 34h 40m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9418) Support ANY_VALUE aggregation functions
[ https://issues.apache.org/jira/browse/BEAM-9418?focusedWorklogId=423133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423133 ] ASF GitHub Bot logged work on BEAM-9418: Author: ASF GitHub Bot Created on: 15/Apr/20 23:22 Start Date: 15/Apr/20 23:22 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11333: [BEAM-9418] Support ANY_VALUE aggregation functions URL: https://github.com/apache/beam/pull/11333#discussion_r409190482 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAggregations.java ## @@ -33,12 +33,8 @@ import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.FieldType; -import org.apache.beam.sdk.transforms.Combine; +import org.apache.beam.sdk.transforms.*; Review comment: Same. If you are using IntelliJ, I think you will need to disable: https://www.jetbrains.com/help/idea/creating-and-optimizing-imports.html#import-packages-instead-of-single-classes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423133) Time Spent: 1h (was: 50m) > Support ANY_VALUE aggregation functions > --- > > Key: BEAM-9418 > URL: https://issues.apache.org/jira/browse/BEAM-9418 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Support the following functionality in BeamSQL: > {code:java} > "select t.key, ANY_VALUE(t.column) from t group by t.key"; > {code} > Spec link: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9418) Support ANY_VALUE aggregation functions
[ https://issues.apache.org/jira/browse/BEAM-9418?focusedWorklogId=423129=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423129 ] ASF GitHub Bot logged work on BEAM-9418: Author: ASF GitHub Bot Created on: 15/Apr/20 23:16 Start Date: 15/Apr/20 23:16 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11333: [BEAM-9418] Support ANY_VALUE aggregation functions URL: https://github.com/apache/beam/pull/11333#discussion_r409187920 ## File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java ## @@ -24,11 +24,8 @@ import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertTrue; import static org.junit.internal.matchers.ThrowableMessageMatcher.hasMessage; - import java.math.BigDecimal; -import java.util.Arrays; -import java.util.Iterator; -import java.util.List; +import java.util.*; Review comment: Please use concrete Java imports rather than `.*` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423129) Time Spent: 40m (was: 0.5h) > Support ANY_VALUE aggregation functions > --- > > Key: BEAM-9418 > URL: https://issues.apache.org/jira/browse/BEAM-9418 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Support the following functionality in BeamSQL: > {code:java} > "select t.key, ANY_VALUE(t.column) from t group by t.key"; > {code} > Spec link: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9418) Support ANY_VALUE aggregation functions
[ https://issues.apache.org/jira/browse/BEAM-9418?focusedWorklogId=423128=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423128 ] ASF GitHub Bot logged work on BEAM-9418: Author: ASF GitHub Bot Created on: 15/Apr/20 23:16 Start Date: 15/Apr/20 23:16 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11333: [BEAM-9418] Support ANY_VALUE aggregation functions URL: https://github.com/apache/beam/pull/11333#discussion_r409188033 ## File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java ## @@ -241,6 +238,53 @@ private void runAggregationFunctions(PCollection input) throws Exception { pipeline.run().waitUntilFinish(); } + /** GROUP-BY with the any_value aggregation function. */ + @Test + public void testAnyValueFunction() throws Exception { +pipeline.enableAbandonedNodeEnforcement(false); + +Schema schema = +Schema.builder().addInt32Field("key").addInt32Field("col").build(); + +PCollection inputRows = Review comment: run `./gradlew spotlessApply` to fix style issues. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423128) Time Spent: 0.5h (was: 20m) > Support ANY_VALUE aggregation functions > --- > > Key: BEAM-9418 > URL: https://issues.apache.org/jira/browse/BEAM-9418 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Support the following functionality in BeamSQL: > {code:java} > "select t.key, ANY_VALUE(t.column) from t group by t.key"; > {code} > Spec link: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9418) Support ANY_VALUE aggregation functions
[ https://issues.apache.org/jira/browse/BEAM-9418?focusedWorklogId=423130=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423130 ] ASF GitHub Bot logged work on BEAM-9418: Author: ASF GitHub Bot Created on: 15/Apr/20 23:16 Start Date: 15/Apr/20 23:16 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11333: [BEAM-9418] Support ANY_VALUE aggregation functions URL: https://github.com/apache/beam/pull/11333#issuecomment-614326216 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423130) Time Spent: 50m (was: 40m) > Support ANY_VALUE aggregation functions > --- > > Key: BEAM-9418 > URL: https://issues.apache.org/jira/browse/BEAM-9418 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Support the following functionality in BeamSQL: > {code:java} > "select t.key, ANY_VALUE(t.column) from t group by t.key"; > {code} > Spec link: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value -- This message was sent by Atlassian Jira (v8.3.4#803005)
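The semantics of the query quoted in the issue above can be illustrated outside Beam. This is a minimal Python sketch, not the BeamSQL implementation: `any_value_by_key` is a hypothetical helper, and it assumes that ANY_VALUE may return any single value from each group (here, simply the first one seen).

```python
def any_value_by_key(rows):
    """Return one arbitrary value per key from (key, value) pairs.

    Hypothetical helper for illustration only. It keeps the first value
    seen for each key, which is one valid choice for ANY_VALUE.
    """
    result = {}
    for key, value in rows:
        # setdefault stores the value only if the key is not present yet.
        result.setdefault(key, value)
    return result


# Rough equivalent of: select t.key, ANY_VALUE(t.column) from t group by t.key
rows = [(1, 10), (1, 11), (2, 20)]
picked = any_value_by_key(rows)
```

Because the chosen value is not specified, a test for such an aggregation should probably check membership in the group rather than equality with one particular value.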
[jira] [Work logged] (BEAM-9765) :vendor:calcite-1_20_0:validateVendoring fails
[ https://issues.apache.org/jira/browse/BEAM-9765?focusedWorklogId=423127=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423127 ] ASF GitHub Bot logged work on BEAM-9765: Author: ASF GitHub Bot Created on: 15/Apr/20 23:13 Start Date: 15/Apr/20 23:13 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11429: [BEAM-9765] Exclude module-info.class from vendored Calcite. URL: https://github.com/apache/beam/pull/11429#issuecomment-614325133 LGTM Also cc @lukecwik in case there is other opinion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423127) Time Spent: 20m (was: 10m) > :vendor:calcite-1_20_0:validateVendoring fails > -- > > Key: BEAM-9765 > URL: https://issues.apache.org/jira/browse/BEAM-9765 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This error was reported on Slack: > https://the-asf.slack.com/archives/C9H0YNP3P/p1586911958184200 > "I encountered this error when I built Beam from master branch. Is it a known > issue? It happens in beam 2.21 and master branch, but the build works fine in > 2.20." > --- > * What went wrong: > Execution failed for task ':vendor:calcite-1_20_0:validateVendoring'. > > /home/yangzhan/oss/beam/vendor/calcite-1_20_0/build/libs/beam-vendor-calcite-1_20_0-0.2.jar > > exposed classes outside of org.apache.beam namespace: > > [/home/yangzhan/oss/beam/vendor/calcite-1_20_0/build/tmp/expandedArchives/beam-vendor-calcite-1_20_0-0.2.jar_ee40b0aab4e7709d8d80d205ee8852ba/module-info.class] > * Try: > Run with --stacktrace option to get the stack trace. Run with --info or > --debug option to get more log output. 
Run with --scan to get full insights. > == -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423126=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423126 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 23:12 Start Date: 15/Apr/20 23:12 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614325022 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423126) Time Spent: 34.5h (was: 34h 20m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 34.5h > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8646) PR #9814 appears to cause failures in fnapi_runner tests on Windows
[ https://issues.apache.org/jira/browse/BEAM-8646?focusedWorklogId=423125=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423125 ] ASF GitHub Bot logged work on BEAM-8646: Author: ASF GitHub Bot Created on: 15/Apr/20 23:09 Start Date: 15/Apr/20 23:09 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #11431: [BEAM-8646] Fix external environment on OS X as well. URL: https://github.com/apache/beam/pull/11431#discussion_r409186220 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/worker_handlers.py ## @@ -614,8 +615,8 @@ def stop_worker(self): pass def host_from_worker(self): -# TODO(BEAM-8646): Reconcile the behavior on Windows platform. -if sys.platform == 'win32': +# TODO(BEAM-8646): Reconcile the across platforms. Review comment: nit: extra 'the'. Also, can you please comment on the bug which issues it causes on OS X? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423125) Time Spent: 1h 10m (was: 1h) > PR #9814 appears to cause failures in fnapi_runner tests on Windows > --- > > Key: BEAM-8646 > URL: https://issues.apache.org/jira/browse/BEAM-8646 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Wanqi Lyu >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > It appears that changes in > > https://github.com/apache/beam/commit/d6bcb03f586b5430c30f6ca4a1af9e42711e529c > cause test failures in Beam test suite on Windows, for example: > python setup.py nosetests --tests > apache_beam/runners/portability/portable_runner_test.py:PortableRunnerTestWithExternalEnv.test_callbacks_with_exception > > does not finish on a Windows VM machine within at least 60 seconds but passes > within a second if we change host_from_worker to return 'localhost' in [1]. > [~violalyu] , do you think you could take a look? Thanks! > cc: [~chadrik] [~thw] > [1] > https://github.com/apache/beam/blob/808cb35018cd228a59b152234b655948da2455fa/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L1377. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
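The platform check discussed in this thread can be sketched as follows. This is a hedged illustration, not the code from the pull request: the quoted diff does not show the new condition, so treating `'darwin'` (OS X) the same as `'win32'` is an assumption based on the pull request title, and the `socket.getfqdn()` fallback and the `platform` parameter are also assumptions added to make the sketch testable.

```python
import socket
import sys


def host_from_worker(platform=None):
    """Pick the host name a worker should use to reach the runner.

    Sketch only. The issue reports that returning 'localhost' makes the
    tests pass on Windows; extending this to OS X is assumed here.
    """
    platform = sys.platform if platform is None else platform
    if platform in ('win32', 'darwin'):
        return 'localhost'
    # Assumed fallback for other platforms: the fully qualified host name.
    return socket.getfqdn()
```

Passing the platform as an argument keeps the branch easy to exercise from a test running on any operating system.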
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423122=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423122 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 15/Apr/20 22:56 Start Date: 15/Apr/20 22:56 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433#issuecomment-614320399 Run Python 3.6 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423122) Time Spent: 20m (was: 10m) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423121=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423121 ] ASF GitHub Bot logged work on BEAM-9769: Author: ASF GitHub Bot Created on: 15/Apr/20 22:55 Start Date: 15/Apr/20 22:55 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink URL: https://github.com/apache/beam/pull/11433 TODOs: - [ ] Document the changes in CHANGES.md - [ ] Ensure Pydoc makes sense - [ ] Run PostCommits Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Updated] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-9769: Fix Version/s: 2.21.0 > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.21.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
[ https://issues.apache.org/jira/browse/BEAM-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-9769: Status: Open (was: Triage Needed) > Ensure JSON imports are the default behavior for BigQuerySink and > WriteToBigQuery in Python > --- > > Key: BEAM-9769 > URL: https://issues.apache.org/jira/browse/BEAM-9769 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9769) Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python
Pablo Estrada created BEAM-9769: --- Summary: Ensure JSON imports are the default behavior for BigQuerySink and WriteToBigQuery in Python Key: BEAM-9769 URL: https://issues.apache.org/jira/browse/BEAM-9769 Project: Beam Issue Type: Bug Components: io-py-gcp Reporter: Pablo Estrada Assignee: Pablo Estrada -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423118=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423118 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 22:35 Start Date: 15/Apr/20 22:35 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614313665 @pabloem reshuffle added and ITs passing locally as of c50df5f This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423118) Time Spent: 34h 20m (was: 34h 10m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 34h 20m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423116=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423116 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 22:30 Start Date: 15/Apr/20 22:30 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614311975 @pabloem please do not retest this until I say so. reshuffle is messing up something to do w/ coders in my ITs. will investigate and let you know when it's safe to re run tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423116) Time Spent: 34h 10m (was: 34h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 34h 10m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423114=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423114 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 22:24 Start Date: 15/Apr/20 22:24 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614310149 You can also consider adding an option to not add the Reshuffle to avoid adding any additional shuffle for anyone who already will have a GBK downstream. See here: https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L823 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423114) Time Spent: 34h (was: 33h 50m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 34h > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423112 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 22:21 Start Date: 15/Apr/20 22:21 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614309223 You can ignore the deprecation warning for Reshuffle.viaRandomKey(). The deprecation was just because the behavior across runners for Reshuffle is not well defined. Transform is not going away. Many runners add a fusion break after GBK. So this will allow subsequent steps to parallelize better. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423112) Time Spent: 33h 50m (was: 33h 40m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 33h 50m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
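The idea behind reshuffling via a random key, as described in the comment above, can be shown with a small conceptual model. This is a plain-Python sketch under stated assumptions, not Beam's `Reshuffle` implementation: `reshuffle_via_random_key`, `num_keys`, and `seed` are hypothetical names, and the grouping step stands in for the GBK after which many runners break fusion.

```python
import random
from collections import defaultdict


def reshuffle_via_random_key(elements, num_keys=4, seed=None):
    """Redistribute elements by pairing each with a random key.

    Conceptual model only: elements are keyed at random, grouped by key
    (the stand-in for a GroupByKey), and then emitted again without keys.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for element in elements:
        groups[rng.randrange(num_keys)].append(element)
    # In a distributed runner each group could go to a different worker;
    # here the groups are simply flattened back into one list.
    return [element for key in sorted(groups) for element in groups[key]]
```

The output holds the same elements as the input, possibly in a different order, which is why the step only affects how work is distributed and not the result.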
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423110=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423110 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 22:13 Start Date: 15/Apr/20 22:13 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614306261 >Yeah, this can be a performance bottleneck and this whole operation will be limited to a single machine. Usually sources need an additional level of parallelism due to being high fanout. BTW it might make sense to add a Reshuffle at the end here just to allow any subsequent steps to parallelize. @chamikaramj that sounds like a good idea. However, I get a deprecation warning on `org.apache.beam.sdk.transforms.Reshuffle`. Is there a new blessed way of doing this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423110) Time Spent: 33h 40m (was: 33.5h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 33h 40m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.
[ https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=423109=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423109 ] ASF GitHub Bot logged work on BEAM-9577: Author: ASF GitHub Bot Created on: 15/Apr/20 22:08 Start Date: 15/Apr/20 22:08 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11432: [BEAM-9577] Small fixes to portable runner staging. URL: https://github.com/apache/beam/pull/11432 * Create parent directories if not present in local filesystem. * Deep protobuf copy. R: @ihji Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
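The first fix listed in the pull request description for BEAM-9577, creating parent directories when they are missing on the local filesystem, can be sketched in a few lines. This is an illustration under assumptions, not the code from the pull request: `write_staged_file` is a hypothetical helper name, and the staging logic itself is not shown in the message.

```python
import os


def write_staged_file(path, data):
    """Write bytes to path, creating missing parent directories first.

    Hypothetical helper for illustration; the real staging code is not
    part of the quoted message.
    """
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True makes this a no-op when the directory already exists.
        os.makedirs(parent, exist_ok=True)
    with open(path, 'wb') as f:
        f.write(data)
```

Using `exist_ok=True` avoids a separate existence check, which would otherwise race with other processes that stage files into the same directory.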
[jira] [Work logged] (BEAM-8646) PR #9814 appears to cause failures in fnapi_runner tests on Windows
[ https://issues.apache.org/jira/browse/BEAM-8646?focusedWorklogId=423108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423108 ]

ASF GitHub Bot logged work on BEAM-8646:

Author: ASF GitHub Bot
Created on: 15/Apr/20 22:07
Start Date: 15/Apr/20 22:07
Worklog Time Spent: 10m
Work Description: robertwb commented on issue #11431: [BEAM-8646] Fix external environment on OS X as well.
URL: https://github.com/apache/beam/pull/11431#issuecomment-614304091

R: @tvalentyn

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 423108)
Time Spent: 1h (was: 50m)

> PR #9814 appears to cause failures in fnapi_runner tests on Windows
> ---
>
> Key: BEAM-8646
> URL: https://issues.apache.org/jira/browse/BEAM-8646
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Wanqi Lyu
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> It appears that changes in
> https://github.com/apache/beam/commit/d6bcb03f586b5430c30f6ca4a1af9e42711e529c
> cause test failures in Beam test suite on Windows, for example:
> python setup.py nosetests --tests apache_beam/runners/portability/portable_runner_test.py:PortableRunnerTestWithExternalEnv.test_callbacks_with_exception
> does not finish on a Windows VM machine within at least 60 seconds but passes within a second if we change host_from_worker to return 'localhost' in [1].
> [~violalyu] , do you think you could take a look? Thanks!
> cc: [~chadrik] [~thw]
> [1] https://github.com/apache/beam/blob/808cb35018cd228a59b152234b655948da2455fa/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L1377.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
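The workaround described in the issue (have host_from_worker return 'localhost') and the PR title's mention of OS X can be illustrated with a small sketch. This is not the Beam implementation: `pick_worker_host` is a hypothetical name, and the use of `socket.getfqdn()` for the remaining platforms is an assumption that the thread does not confirm.

```python
import socket
import sys


def pick_worker_host(platform=sys.platform, fqdn=socket.getfqdn):
    """Hypothetical helper: choose the host name a worker connects back to.

    Assumption (not confirmed by the thread): on other platforms the host
    is resolved with socket.getfqdn().
    """
    # The issue reports the Windows test passing once 'localhost' is
    # returned; the PR title extends the fix to OS X.
    if platform in ('win32', 'darwin'):
        return 'localhost'
    return fqdn()


print(pick_worker_host('win32'))
print(pick_worker_host('darwin'))
```

Passing `platform` and `fqdn` as parameters keeps the sketch testable without depending on the machine it runs on.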
[jira] [Work logged] (BEAM-8646) PR #9814 appears to cause failures in fnapi_runner tests on Windows
[ https://issues.apache.org/jira/browse/BEAM-8646?focusedWorklogId=423107=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423107 ] ASF GitHub Bot logged work on BEAM-8646: Author: ASF GitHub Bot Created on: 15/Apr/20 22:06 Start Date: 15/Apr/20 22:06 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11431: [BEAM-8646] Fix external environment on OS X as well. URL: https://github.com/apache/beam/pull/11431 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-9768) Add a gradle command for running the Python Unified Local Runner.
[ https://issues.apache.org/jira/browse/BEAM-9768?focusedWorklogId=423106=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423106 ] ASF GitHub Bot logged work on BEAM-9768: Author: ASF GitHub Bot Created on: 15/Apr/20 22:05 Start Date: 15/Apr/20 22:05 Worklog Time Spent: 10m Work Description: robertwb commented on issue #11430: [BEAM-9768] Gradle command for Python ULR. URL: https://github.com/apache/beam/pull/11430#issuecomment-614303398 R: @youngoli This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423106) Time Spent: 20m (was: 10m) > Add a gradle command for running the Python Unified Local Runner. > - > > Key: BEAM-9768 > URL: https://issues.apache.org/jira/browse/BEAM-9768 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Robert Bradshaw >Priority: Major > Fix For: 2.22.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9768) Add a gradle command for running the Python Unified Local Runner.
[ https://issues.apache.org/jira/browse/BEAM-9768?focusedWorklogId=423105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423105 ] ASF GitHub Bot logged work on BEAM-9768: Author: ASF GitHub Bot Created on: 15/Apr/20 22:04 Start Date: 15/Apr/20 22:04 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11430: [BEAM-9768] Gradle command for Python ULR. URL: https://github.com/apache/beam/pull/11430 One can now run ./gradlew startPortableRunner to build and start the Python ULR. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-9765) :vendor:calcite-1_20_0:validateVendoring fails
[ https://issues.apache.org/jira/browse/BEAM-9765?focusedWorklogId=423104=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423104 ] ASF GitHub Bot logged work on BEAM-9765: Author: ASF GitHub Bot Created on: 15/Apr/20 22:04 Start Date: 15/Apr/20 22:04 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11429: [BEAM-9765] Exclude module-info.class from vendored Calcite. URL: https://github.com/apache/beam/pull/11429 cc @suztomo Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Created] (BEAM-9768) Add a gradle command for running the Python Unified Local Runner.
Robert Bradshaw created BEAM-9768: - Summary: Add a gradle command for running the Python Unified Local Runner. Key: BEAM-9768 URL: https://issues.apache.org/jira/browse/BEAM-9768 Project: Beam Issue Type: Improvement Components: build-system Reporter: Robert Bradshaw Fix For: 2.22.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9764) :sdks:java:container:generateThirdPartyLicenses failing
[ https://issues.apache.org/jira/browse/BEAM-9764?focusedWorklogId=423093=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423093 ]

ASF GitHub Bot logged work on BEAM-9764:

Author: ASF GitHub Bot
Created on: 15/Apr/20 22:00
Start Date: 15/Apr/20 22:00
Worklog Time Spent: 10m
Work Description: Hannah-Jiang commented on issue #11428: [BEAM-9764] fix Java license failures with Python2_PVR_Flink PreCommit
URL: https://github.com/apache/beam/pull/11428#issuecomment-614301680

Run Python2_PVR_Flink PreCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 423093)
Remaining Estimate: 0h
Time Spent: 10m

> :sdks:java:container:generateThirdPartyLicenses failing
> ---
>
> Key: BEAM-9764
> URL: https://issues.apache.org/jira/browse/BEAM-9764
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core, test-failures
> Reporter: Udi Meiri
> Assignee: Hannah Jiang
> Priority: Major
> Fix For: 2.21.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/774/console
> The traceback is interspersed with other logs:
> {code}
> Traceback (most recent call last):
> Successfully pulled java_third_party_licenses/protobuf-java-util-3.11.1.jar/LICENSE from https://opensource.org/licenses/BSD-3-Clause
> Successfully pulled java_third_party_licenses/protoc-3.11.0.jar/LICENSE from http://www.apache.org/licenses/LICENSE-2.0.txt
> File "sdks/java/container/license_scripts/pull_licenses_java.py", line 138, in
> Successfully pulled java_third_party_licenses/protoc-3.11.1.jar/LICENSE from http://www.apache.org/licenses/LICENSE-2.0.txt
> license_url = dep['moduleLicenseUrl']
> Successfully pulled java_third_party_licenses/zetasketch-0.1.0.jar/LICENSE from http://www.apache.org/licenses/LICENSE-2.0.txt
> KeyError: 'moduleLicenseUrl'
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
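The traceback above fails on `dep['moduleLicenseUrl']`, which raises KeyError whenever a dependency entry has no such key. As a minimal sketch of one defensive pattern (not the fix applied in PR #11428, whose contents are not shown here), a lookup with `dict.get` lets the caller separate entries that lack a license URL instead of aborting. The helper names and the `moduleName` key are hypothetical; only `moduleLicenseUrl` comes from the traceback.

```python
def license_url_for(dep, default=None):
    """Return the license URL of one dependency entry, or `default` if absent."""
    # dict.get avoids the KeyError raised by dep['moduleLicenseUrl'].
    return dep.get('moduleLicenseUrl', default)


def split_by_license_url(deps):
    """Partition dependency entries into (with_url, without_url)."""
    with_url, without_url = [], []
    for dep in deps:
        (with_url if license_url_for(dep) else without_url).append(dep)
    return with_url, without_url


# Illustrative data only; 'moduleName' is an assumed key, not from the thread.
deps = [
    {'moduleName': 'example:has-url',
     'moduleLicenseUrl': 'http://www.apache.org/licenses/LICENSE-2.0.txt'},
    {'moduleName': 'example:no-url'},
]
with_url, without_url = split_by_license_url(deps)
print([d['moduleName'] for d in without_url])
```

Entries in `without_url` could then be reported together at the end of a run, so a single missing URL does not hide the others.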
[jira] [Assigned] (BEAM-9765) :vendor:calcite-1_20_0:validateVendoring fails
[ https://issues.apache.org/jira/browse/BEAM-9765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle Weaver reassigned BEAM-9765:
---
Assignee: Kyle Weaver

> :vendor:calcite-1_20_0:validateVendoring fails
> ---
>
> Key: BEAM-9765
> URL: https://issues.apache.org/jira/browse/BEAM-9765
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Fix For: 2.21.0
>
> This error was reported on Slack:
> https://the-asf.slack.com/archives/C9H0YNP3P/p1586911958184200
> "I encountered this error when I built Beam from master branch. Is it a known issue? It happens in beam 2.21 and master branch, but the build works fine in 2.20."
> ---
> * What went wrong:
> Execution failed for task ':vendor:calcite-1_20_0:validateVendoring'.
> > /home/yangzhan/oss/beam/vendor/calcite-1_20_0/build/libs/beam-vendor-calcite-1_20_0-0.2.jar exposed classes outside of org.apache.beam namespace: [/home/yangzhan/oss/beam/vendor/calcite-1_20_0/build/tmp/expandedArchives/beam-vendor-calcite-1_20_0-0.2.jar_ee40b0aab4e7709d8d80d205ee8852ba/module-info.class]
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
> ==

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9765) :vendor:calcite-1_20_0:validateVendoring fails
[ https://issues.apache.org/jira/browse/BEAM-9765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle Weaver updated BEAM-9765:
---
Status: Open (was: Triage Needed)

> :vendor:calcite-1_20_0:validateVendoring fails
> ---
>
> Key: BEAM-9765
> URL: https://issues.apache.org/jira/browse/BEAM-9765
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Fix For: 2.21.0
>
> This error was reported on Slack:
> https://the-asf.slack.com/archives/C9H0YNP3P/p1586911958184200
> "I encountered this error when I built Beam from master branch. Is it a known issue? It happens in beam 2.21 and master branch, but the build works fine in 2.20."
> ---
> * What went wrong:
> Execution failed for task ':vendor:calcite-1_20_0:validateVendoring'.
> > /home/yangzhan/oss/beam/vendor/calcite-1_20_0/build/libs/beam-vendor-calcite-1_20_0-0.2.jar exposed classes outside of org.apache.beam namespace: [/home/yangzhan/oss/beam/vendor/calcite-1_20_0/build/tmp/expandedArchives/beam-vendor-calcite-1_20_0-0.2.jar_ee40b0aab4e7709d8d80d205ee8852ba/module-info.class]
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
> ==

--
This message was sent by Atlassian Jira (v8.3.4#803005)
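The failure above says the vendored jar exposes a class (module-info.class) outside the org.apache.beam namespace. The kind of check involved can be sketched in a few lines. This is an illustration only, not Beam's Gradle validateVendoring task: `exposed_classes`, `make_jar`, and the entry names in the sample jar are hypothetical.

```python
import io
import zipfile

ALLOWED_PREFIX = 'org/apache/beam/'


def exposed_classes(jar_file, allowed_prefix=ALLOWED_PREFIX):
    """List .class entries in a jar that live outside `allowed_prefix`."""
    with zipfile.ZipFile(jar_file) as jar:
        return [
            name for name in jar.namelist()
            if name.endswith('.class') and not name.startswith(allowed_prefix)
        ]


def make_jar(entries):
    """Build an in-memory jar (a zip archive) with empty files, for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as jar:
        for name in entries:
            jar.writestr(name, b'')
    buf.seek(0)
    return buf


# Hypothetical entry names; only module-info.class comes from the error above.
jar = make_jar([
    'org/apache/beam/vendor/calcite/Foo.class',
    'module-info.class',
    'META-INF/MANIFEST.MF',
])
print(exposed_classes(jar))  # ['module-info.class']
```

A top-level module-info.class is flagged because it sits at the jar root rather than under the relocated package, which is consistent with the PR title's approach of excluding it from the vendored artifact.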
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423083=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423083 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 21:43 Start Date: 15/Apr/20 21:43 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#discussion_r409150054 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java ## @@ -0,0 +1,597 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.healthcare; + +import com.google.api.services.healthcare.v1beta1.model.Message; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.text.ParseException; +import java.util.Collection; +import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.PInput; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.PValue; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link HL7v2IO} provides an API for reading from and writing to https://cloud.google.com/healthcare/docs/concepts/hl7v2;>Google Cloud Healthcare HL7v2 API. + * + * + * Read + * + * HL7v2 Messages can be fetched from the HL7v2 store in two ways Message Fetching and Message + * Listing. + * + * Message Fetching + * + * Message Fetching with {@link HL7v2IO.Read} supports use cases where you have a ${@link + * PCollection} of message IDS. 
This is appropriate for reading the HL7v2 notifications from a + * Pub/Sub subscription with {@link PubsubIO#readStrings()} or in cases where you have a manually + * prepared list of messages that you need to process (e.g. in a text file read with {@link + * org.apache.beam.sdk.io.TextIO}) . + * + * Fetch Message contents from HL7v2 Store based on the {@link PCollection} of message ID strings + * {@link HL7v2IO.Read.Result} where one can call {@link Read.Result#getMessages()} to retrived a + * {@link PCollection} containing the successfully fetched {@link HL7v2Message}s and/or {@link + * Read.Result#getFailedReads()} to retrieve a {@link PCollection} of {@link HealthcareIOError} + * containing the msgID that could not be fetched and the exception as a {@link HealthcareIOError}, + * this can be used to write to the dead letter storage system of your choosing. This error handling + * is mainly to catch scenarios where the upstream {@link PCollection} contains IDs that are not + * valid or are not reachable due to permissions issues. + * + * Message Listing Message Listing with {@link HL7v2IO.ListHL7v2Messages} supports batch use + * cases where you want to process all the messages in an HL7v2 store or those matching a + * filter @see https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.hl7V2Stores.messages/list#query-parameters + * This paginates through results of a Messages.List call @see https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.hl7V2Stores.messages/list + * and outputs directly to a {@link PCollection} of {@link
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423084=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423084 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 21:43 Start Date: 15/Apr/20 21:43 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#discussion_r409150823 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java ## @@ -0,0 +1,597 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.healthcare; + +import com.google.api.services.healthcare.v1beta1.model.Message; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.text.ParseException; +import java.util.Collection; +import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.PInput; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.PValue; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link HL7v2IO} provides an API for reading from and writing to https://cloud.google.com/healthcare/docs/concepts/hl7v2;>Google Cloud Healthcare HL7v2 API. + * + * + * Read + * + * HL7v2 Messages can be fetched from the HL7v2 store in two ways Message Fetching and Message + * Listing. + * + * Message Fetching + * + * Message Fetching with {@link HL7v2IO.Read} supports use cases where you have a ${@link + * PCollection} of message IDS. 
This is appropriate for reading the HL7v2 notifications from a + * Pub/Sub subscription with {@link PubsubIO#readStrings()} or in cases where you have a manually + * prepared list of messages that you need to process (e.g. in a text file read with {@link + * org.apache.beam.sdk.io.TextIO}) . + * + * Fetch Message contents from HL7v2 Store based on the {@link PCollection} of message ID strings + * {@link HL7v2IO.Read.Result} where one can call {@link Read.Result#getMessages()} to retrived a + * {@link PCollection} containing the successfully fetched {@link HL7v2Message}s and/or {@link + * Read.Result#getFailedReads()} to retrieve a {@link PCollection} of {@link HealthcareIOError} + * containing the msgID that could not be fetched and the exception as a {@link HealthcareIOError}, + * this can be used to write to the dead letter storage system of your choosing. This error handling + * is mainly to catch scenarios where the upstream {@link PCollection} contains IDs that are not + * valid or are not reachable due to permissions issues. + * + * Message Listing Message Listing with {@link HL7v2IO.ListHL7v2Messages} supports batch use + * cases where you want to process all the messages in an HL7v2 store or those matching a + * filter @see https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.hl7V2Stores.messages/list#query-parameters + * This paginates through results of a Messages.List call @see https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.hl7V2Stores.messages/list + * and outputs directly to a {@link PCollection} of {@link
[jira] [Commented] (BEAM-9764) :sdks:java:container:generateThirdPartyLicenses failing
[ https://issues.apache.org/jira/browse/BEAM-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084398#comment-17084398 ] Udi Meiri commented on BEAM-9764: - Or was that intentional? > :sdks:java:container:generateThirdPartyLicenses failing > --- > > Key: BEAM-9764 > URL: https://issues.apache.org/jira/browse/BEAM-9764 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, test-failures >Reporter: Udi Meiri >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > > https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/774/console > The traceback is interspersed with other logs: > {code} > Traceback (most recent call last): > Successfully pulled > java_third_party_licenses/protobuf-java-util-3.11.1.jar/LICENSE from > https://opensource.org/licenses/BSD-3-Clause > Successfully pulled java_third_party_licenses/protoc-3.11.0.jar/LICENSE from > http://www.apache.org/licenses/LICENSE-2.0.txt > File "sdks/java/container/license_scripts/pull_licenses_java.py", line 138, > in > Successfully pulled java_third_party_licenses/protoc-3.11.1.jar/LICENSE from > http://www.apache.org/licenses/LICENSE-2.0.txt > license_url = dep['moduleLicenseUrl'] > Successfully pulled java_third_party_licenses/zetasketch-0.1.0.jar/LICENSE > from http://www.apache.org/licenses/LICENSE-2.0.txt > KeyError: 'moduleLicenseUrl' > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9729) Cleanup bundle registration now that SDKs can pull.
[ https://issues.apache.org/jira/browse/BEAM-9729?focusedWorklogId=423076=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423076 ] ASF GitHub Bot logged work on BEAM-9729: Author: ASF GitHub Bot Created on: 15/Apr/20 21:26 Start Date: 15/Apr/20 21:26 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11358: [BEAM-9729, BEAM-8486] Runner-side bundle registration cleanup. URL: https://github.com/apache/beam/pull/11358 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423076) Time Spent: 1h 50m (was: 1h 40m) > Cleanup bundle registration now that SDKs can pull. > --- > > Key: BEAM-9729 > URL: https://issues.apache.org/jira/browse/BEAM-9729 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Once all runners (in particular dataflow) support pull descriptors, we can > clean things up by removing the push registration code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9764) :sdks:java:container:generateThirdPartyLicenses failing
[ https://issues.apache.org/jira/browse/BEAM-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084392#comment-17084392 ] Udi Meiri commented on BEAM-9764: - I think one issue is that pull_from_url doesn't retry on "Invalid url:" errors (missing raise at the end of the branch). > :sdks:java:container:generateThirdPartyLicenses failing > --- > > Key: BEAM-9764 > URL: https://issues.apache.org/jira/browse/BEAM-9764 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, test-failures >Reporter: Udi Meiri >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > > https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/774/console > The traceback is interspersed with other logs: > {code} > Traceback (most recent call last): > Successfully pulled > java_third_party_licenses/protobuf-java-util-3.11.1.jar/LICENSE from > https://opensource.org/licenses/BSD-3-Clause > Successfully pulled java_third_party_licenses/protoc-3.11.0.jar/LICENSE from > http://www.apache.org/licenses/LICENSE-2.0.txt > File "sdks/java/container/license_scripts/pull_licenses_java.py", line 138, > in > Successfully pulled java_third_party_licenses/protoc-3.11.1.jar/LICENSE from > http://www.apache.org/licenses/LICENSE-2.0.txt > license_url = dep['moduleLicenseUrl'] > Successfully pulled java_third_party_licenses/zetasketch-0.1.0.jar/LICENSE > from http://www.apache.org/licenses/LICENSE-2.0.txt > KeyError: 'moduleLicenseUrl' > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
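Editorial note: the comment above describes a handler that reports an "Invalid url:" error but does not re-raise it, so the surrounding retry logic never sees a failure. The sketch below illustrates that general pattern only. The source of `pull_from_url` is not shown in this thread, so `retry`, `fake_download`, `pull_swallowing` and `pull_reraising` are hypothetical names, and the decorator is an assumption about how the retry might be wired.

```python
import functools


def retry(attempts):
    """Hypothetical decorator: re-run the function while it raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
        return wrapper
    return decorator


# Counts how often each variant is actually invoked.
calls = {"swallowing": 0, "reraising": 0}


def fake_download(url):
    """Stand-in for a network call that always fails."""
    raise ValueError("Invalid url: " + url)


@retry(attempts=3)
def pull_swallowing(url):
    calls["swallowing"] += 1
    try:
        return fake_download(url)
    except ValueError as error:
        print(error)  # reported but not re-raised: the decorator sees success


@retry(attempts=3)
def pull_reraising(url):
    calls["reraising"] += 1
    try:
        return fake_download(url)
    except ValueError as error:
        print(error)
        raise  # lets the decorator see the failure and retry


pull_swallowing("http://example.invalid/LICENSE")
try:
    pull_reraising("http://example.invalid/LICENSE")
except ValueError:
    pass  # expected once all attempts are used up
```

In this sketch the swallowing variant runs once and quietly returns `None`, while the re-raising variant is attempted three times before the error reaches the caller.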
[jira] [Commented] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084385#comment-17084385 ] Valentyn Tymofieiev commented on BEAM-9085: --- No comments - the graphs in the first message show an improvement in metrics, and Py3 Dataflow benchmarks are on par or better than Py2. > Performance regression in np.random.RandomState() skews performance test > results across Python 2/3 on Dataflow > -- > > Key: BEAM-9085 > URL: https://issues.apache.org/jira/browse/BEAM-9085 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 9h 20m > Remaining Estimate: 0h > > Tests show that the performance of core Beam operations in Python 3.x on > Dataflow can be a few times slower than in Python 2.7. We should investigate > the cause of the problem. > Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A > dashboard with runtime results can be found here [2]. > [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py > [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
[jira] [Closed] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev closed BEAM-9085. - Fix Version/s: Not applicable Resolution: Fixed > Performance regression in np.random.RandomState() skews performance test > results across Python 2/3 on Dataflow > -- > > Key: BEAM-9085 > URL: https://issues.apache.org/jira/browse/BEAM-9085 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Fix For: Not applicable > > Time Spent: 9h 20m > Remaining Estimate: 0h > > Tests show that the performance of core Beam operations in Python 3.x on > Dataflow can be a few times slower than in Python 2.7. We should investigate > the cause of the problem. > Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A > dashboard with runtime results can be found here [2]. > [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py > [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=423061=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423061 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 20:56 Start Date: 15/Apr/20 20:56 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#issuecomment-614275628 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 423061) Time Spent: 4.5h (was: 4h 20m) > Abstract bundle execution logic from stage execution logic > -- > > Key: BEAM-9639 > URL: https://issues.apache.org/jira/browse/BEAM-9639 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > The FnApiRunner currently works in a per-stage manner, and does not abstract > single-bundle execution much. This work item is to clearly define the code to > execute a single bundle.
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=423060=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423060 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 15/Apr/20 20:54 Start Date: 15/Apr/20 20:54 Worklog Time Spent: 10m Work Description: soyrice commented on issue #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#issuecomment-614275077 LGTM (modulo one last comment). Thanks again for opening this PR and doing an editorial review. Issue Time Tracking --- Worklog Id: (was: 423060) Time Spent: 7h 10m (was: 7h) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > > Add implementation for slowly changing dimensions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=423058=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423058 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 15/Apr/20 20:53 Start Date: 15/Apr/20 20:53 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#discussion_r409127336 ## File path: website/src/documentation/patterns/side-inputs.md ## @@ -46,29 +54,34 @@ For instance, the following code sample uses a `Map` to create a `DoFn`. The `Ma {% github_sample /apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java tag:SideInputPatternSlowUpdateGlobalWindowSnip1 %} ``` +```py +No sample present. +``` ## Slowly updating side input using windowing -You can read side input pcollection periodically into distinct windows. -Later, when you apply side input to your main input, windows will be matched automatically 1:1. -This way, you can guarantee side input consistency on the duration of the single window. - -To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence -of elements with some real-time period: - -1. Use the PeriodicImpulse transform to generate windowed periodic sequence. - -a. MAX_TIMESTAMP can be replaced with some closer boundary if you want to stop generating elements at some point. +You can read side input data periodically into distinct PCollection windows. +Later, when you apply the side input to your main input, each main input Review comment: I think we should remove "Later" because this is a part of the overall workflow described in the previous sentence, rather than a secondary step. So this should be: "When you apply the side input to your main input..." This is an automated message from the Apache Git Service. 
Issue Time Tracking --- Worklog Id: (was: 423058) Time Spent: 7h (was: 6h 50m) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > Add implementation for slowly changing dimensions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit)
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=423056=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423056 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 15/Apr/20 20:50 Start Date: 15/Apr/20 20:50 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11335: [BEAM-9692]: Make CombineValues portable URL: https://github.com/apache/beam/pull/11335#discussion_r409125734 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py ## @@ -566,6 +566,19 @@ def test_get_default_gcp_region_ignores_error( result = runner.get_default_gcp_region() self.assertIsNone(result) + def test_combine_values_translation(self): +runner = DataflowRunner() + +with beam.Pipeline(runner=runner, + options=PipelineOptions(self.default_properties)) as p: + ( # pylint: disable=expression-not-assigned + p + | beam.Create([('a', [1, 2]), ('b', [3, 4])]) + | beam.CombineValues(lambda v, _: sum(v))) + +job_dict = json.loads(str(runner.job)) +self.assertEqual(job_dict[u'steps'][1][u'kind'], u'CombineValues') Review comment: Done, changed to assertIn This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423056) Time Spent: 1h 40m (was: 1.5h) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
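Editorial note: the reply above says the assertion was changed to `assertIn`. A minimal sketch of why a membership check can be less brittle than indexing a fixed step position follows. The `job_dict` contents and the `StepKindTest` class are hypothetical and are not taken from the PR; only `u'CombineValues'` and the `steps`/`kind` keys appear in the quoted diff.

```python
import unittest


class StepKindTest(unittest.TestCase):
    """Hypothetical test: check that a step kind appears, wherever it lands."""

    def test_combine_values_present(self):
        # Invented job description; real step kinds and ordering may differ.
        job_dict = {
            "steps": [
                {"kind": "SomeRead"},
                {"kind": "SomeOtherStep"},
                {"kind": "CombineValues"},
            ]
        }
        kinds = [step["kind"] for step in job_dict["steps"]]
        # Membership does not depend on the step's position in the list.
        self.assertIn("CombineValues", kinds)
        # An index-based check on steps[1] would fail for this ordering.
        self.assertNotEqual(job_dict["steps"][1]["kind"], "CombineValues")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(StepKindTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Whether the PR applies `assertIn` to a list of kinds in exactly this way is not visible in the thread; the sketch only shows the general idea.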
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=423054=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423054 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 15/Apr/20 20:49 Start Date: 15/Apr/20 20:49 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#discussion_r409124997 ## File path: website/src/documentation/patterns/side-inputs.md ## @@ -46,29 +54,36 @@ For instance, the following code sample uses a `Map` to create a `DoFn`. The `Ma {% github_sample /apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java tag:SideInputPatternSlowUpdateGlobalWindowSnip1 %} ``` +```py +No sample present. +``` ## Slowly updating side input using windowing -You can read side input pcollection periodically into distinct windows. -Later, when you apply side input to your main input, windows will be matched automatically 1:1. -This way, you can guarantee side input consistency on the duration of the single window. +You can read side input data periodically into distinct PCollection windows. +Later, when you apply the side input to your main input, each main input +window is automatically matched to a single side input window. +This guarantees consistency on the duration of the single window, +meaning that each window on the main input will be matched to a single +version of side input data. -To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence -of elements with some real-time period: +Described approach can be implemented using combination of +PeriodicSequence or PeriodicImpulse PTransforms and SDF Read or ReadAll Review comment: I probably just missed the word. Looks good in staging. This is an automated message from the Apache Git Service. 
Issue Time Tracking --- Worklog Id: (was: 423054) Time Spent: 6h 50m (was: 6h 40m) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > Add implementation for slowly changing dimensions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit)
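Editorial note: the documentation text under review says each main input window is matched to a single side input window. The plain-Python sketch below illustrates that matching for fixed windows of equal size. It does not use the Beam API; `window_start`, `WINDOW_SIZE` and the sample data are assumptions made for illustration.

```python
def window_start(timestamp, size):
    """Start of the fixed window of length `size` that contains `timestamp`."""
    return timestamp - (timestamp % size)


WINDOW_SIZE = 60  # seconds; an arbitrary value for illustration

# One side input version per window, keyed by window start.
side_input_by_window = {0: "lookup-v1", 60: "lookup-v2"}

# (timestamp, element) pairs on the main input.
main_input = [(5, "a"), (59, "b"), (61, "c")]

# Each element sees the side input version of the window it falls into.
joined = [
    (element, side_input_by_window[window_start(timestamp, WINDOW_SIZE)])
    for timestamp, element in main_input
]
print(joined)
```

Elements "a" and "b" fall into the window starting at 0 and see `lookup-v1`; element "c" falls into the window starting at 60 and sees `lookup-v2`.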
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=423050=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423050 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 20:43 Start Date: 15/Apr/20 20:43 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#issuecomment-614269899 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 423050) Time Spent: 4h 20m (was: 4h 10m) > Abstract bundle execution logic from stage execution logic > -- > > Key: BEAM-9639 > URL: https://issues.apache.org/jira/browse/BEAM-9639 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > The FnApiRunner currently works in a per-stage manner, and does not abstract > single-bundle execution much. This work item is to clearly define the code to > execute a single bundle.
[jira] [Work started] (BEAM-9764) :sdks:java:container:generateThirdPartyLicenses failing
[ https://issues.apache.org/jira/browse/BEAM-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9764 started by Hannah Jiang. -- > :sdks:java:container:generateThirdPartyLicenses failing > --- > > Key: BEAM-9764 > URL: https://issues.apache.org/jira/browse/BEAM-9764 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, test-failures >Reporter: Udi Meiri >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > > https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/774/console > The traceback is interspersed with other logs: > {code} > Traceback (most recent call last): > Successfully pulled > java_third_party_licenses/protobuf-java-util-3.11.1.jar/LICENSE from > https://opensource.org/licenses/BSD-3-Clause > Successfully pulled java_third_party_licenses/protoc-3.11.0.jar/LICENSE from > http://www.apache.org/licenses/LICENSE-2.0.txt > File "sdks/java/container/license_scripts/pull_licenses_java.py", line 138, > in > Successfully pulled java_third_party_licenses/protoc-3.11.1.jar/LICENSE from > http://www.apache.org/licenses/LICENSE-2.0.txt > license_url = dep['moduleLicenseUrl'] > Successfully pulled java_third_party_licenses/zetasketch-0.1.0.jar/LICENSE > from http://www.apache.org/licenses/LICENSE-2.0.txt > KeyError: 'moduleLicenseUrl' > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
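Editorial note: the traceback quoted above ends in `KeyError: 'moduleLicenseUrl'`, which is what a plain `dep['moduleLicenseUrl']` lookup raises when a dependency record has no such key. A minimal sketch of a tolerant lookup follows. The record layout, the `moduleName` key and the helper name are hypothetical; only the `moduleLicenseUrl` key comes from the traceback, and this is not the fix that was applied to `pull_licenses_java.py`.

```python
def license_url_or_none(dep):
    """Return the license URL if the record has one, else None."""
    return dep.get("moduleLicenseUrl")


# Invented dependency records for illustration.
deps = [
    {
        "moduleName": "with-url",
        "moduleLicenseUrl": "http://www.apache.org/licenses/LICENSE-2.0.txt",
    },
    {"moduleName": "without-url"},
]

# Collect records that would have triggered the KeyError.
missing = [dep["moduleName"] for dep in deps if license_url_or_none(dep) is None]
print(missing)
```

Collecting the records without a URL, rather than failing on the first one, makes it possible to report all of them in a single run.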
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423048=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423048 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 20:41 Start Date: 15/Apr/20 20:41 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614269167 Run Java PostCommit Issue Time Tracking --- Worklog Id: (was: 423048) Time Spent: 33h 10m (was: 33h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 33h 10m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423047=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423047 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 20:41 Start Date: 15/Apr/20 20:41 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614268998 Run Java PreCommit Issue Time Tracking --- Worklog Id: (was: 423047) Time Spent: 33h (was: 32h 50m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 33h > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=423046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423046 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 20:40 Start Date: 15/Apr/20 20:40 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#issuecomment-614268696 The failed test is the streaming wordcount test. Issue Time Tracking --- Worklog Id: (was: 423046) Time Spent: 4h 10m (was: 4h) > Abstract bundle execution logic from stage execution logic > -- > > Key: BEAM-9639 > URL: https://issues.apache.org/jira/browse/BEAM-9639 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > The FnApiRunner currently works in a per-stage manner, and does not abstract > single-bundle execution much. This work item is to clearly define the code to > execute a single bundle.
[jira] [Work logged] (BEAM-9674) "Selected fields list too long" error when calling tables.get in BigQueryStorageTableSource
[ https://issues.apache.org/jira/browse/BEAM-9674?focusedWorklogId=423043=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423043 ] ASF GitHub Bot logged work on BEAM-9674: Author: ASF GitHub Bot Created on: 15/Apr/20 20:38 Start Date: 15/Apr/20 20:38 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #11292: [BEAM-9674] Don't specify selected fields when fetching BigQuery table size URL: https://github.com/apache/beam/pull/11292#discussion_r409117833 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java ## @@ -149,11 +151,14 @@ return ImmutableList.of(); } +Schema sessionSchema = new Schema.Parser().parse(readSession.getAvroSchema().getSchema()); Review comment: With this change, we're no longer specifying the list of selected fields to the tables.get call from which the BigQuery schema is taken; as a result, we get the entire table schema back, so we have to trim it on the client side in the case where the client has specified selected fields. The Avro schema is returned as part of the read session and contains only the selected fields, so we use it as the basis for trimming the BigQuery schema. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423043) Time Spent: 4h 20m (was: 4h 10m) > "Selected fields list too long" error when calling tables.get in > BigQueryStorageTableSource > --- > > Key: BEAM-9674 > URL: https://issues.apache.org/jira/browse/BEAM-9674 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Kenneth Jung >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 4h 20m > Remaining Estimate: 0h > > Customers experience errors similar to the following: > Caused by: > com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad > Request { "code" : 400, "errors" : [ > { "domain" : "global", "message" : "Selected fields too long: must > be less than 16384 characters.", "reason" : "invalid" } > ], "message" : "Selected fields too long: must be less than 16384 > characters.", "status" : "INVALID_ARGUMENT" } > com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321) > com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097) > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) > > 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:938) -- This message was sent by Atlassian Jira (v8.3.4#803005)
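Editorial note: the review comment in this message explains that the full table schema now comes back from `tables.get` and is trimmed on the client side, using the field set of the read session's Avro schema as the guide. The sketch below shows that trimming idea on plain dictionaries. It is written in Python for brevity, handles top-level fields only, and `trim_schema` with its dict layout is hypothetical; it is not the Java code in `BigQueryStorageSourceBase`.

```python
def trim_schema(table_fields, session_field_names):
    """Keep only the table fields that also appear in the session schema.

    Assumes flat, top-level fields; nested records are not handled here.
    """
    wanted = set(session_field_names)
    return [field for field in table_fields if field["name"] in wanted]


# Invented full table schema, as it might come back without a field selection.
table_fields = [
    {"name": "id", "type": "INTEGER"},
    {"name": "name", "type": "STRING"},
    {"name": "payload", "type": "BYTES"},
]

# Field names found in the (already trimmed) session schema.
session_field_names = ["id", "payload"]

trimmed = trim_schema(table_fields, session_field_names)
print([field["name"] for field in trimmed])
```

Filtering the table's own field list keeps the table's field order and type information, with the session schema deciding only which fields survive.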
[jira] [Work logged] (BEAM-9674) "Selected fields list too long" error when calling tables.get in BigQueryStorageTableSource
[ https://issues.apache.org/jira/browse/BEAM-9674?focusedWorklogId=423044=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423044 ] ASF GitHub Bot logged work on BEAM-9674: Author: ASF GitHub Bot Created on: 15/Apr/20 20:38 Start Date: 15/Apr/20 20:38 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #11292: [BEAM-9674] Don't specify selected fields when fetching BigQuery table size URL: https://github.com/apache/beam/pull/11292#discussion_r409118981 ## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTest.java ## @@ -1368,19 +1368,14 @@ public void testStreamSourceSplitAtFractionFailsWhenParentIsPastSplitPoint() thr public void testReadFromBigQueryIO() throws Exception { fakeDatasetService.createDataset("foo.com:project", "dataset", "", "", null); TableReference tableRef = BigQueryHelpers.parseTableSpec("foo.com:project:dataset.table"); - -Table table = -new Table().setTableReference(tableRef).setNumBytes(10L).setSchema(new TableSchema()); - +Table table = new Table().setTableReference(tableRef).setNumBytes(10L).setSchema(TABLE_SCHEMA); fakeDatasetService.createTable(table); CreateReadSessionRequest expectedCreateReadSessionRequest = CreateReadSessionRequest.newBuilder() .setParent("projects/project-id") .setTableReference(BigQueryHelpers.toTableRefProto(tableRef)) .setRequestedStreams(10) -.setReadOptions( Review comment: I'm not sure that I understand your question here. We used to have e2e tests for BigQueryIO with the read API only for the case where the caller specified all fields as selected fields; with this change, now we have two tests -- one which covers the case where no selected fields are covered (which is effectively the same as the case where all fields are specified), and another which covers the case where only a subset of fields are specified. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423044) > "Selected fields list too long" error when calling tables.get in > BigQueryStorageTableSource > --- > > Key: BEAM-9674 > URL: https://issues.apache.org/jira/browse/BEAM-9674 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Kenneth Jung >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 4h 20m > Remaining Estimate: 0h > > Customers experience errors similar to the following: > Caused by: > com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad > Request { "code" : 400, "errors" : [ > { "domain" : "global", "message" : "Selected fields too long: must > be less than 16384 characters.", "reason" : "invalid" } > ], "message" : "Selected fields too long: must be less than 16384 > characters.", "status" : "INVALID_ARGUMENT" } > com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321) > com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097) > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) > > 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:938) -- This message was sent by Atlassian Jira (v8.3.4#803005)
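As a rough illustration of the idea behind the pull request title ("Don't specify selected fields when fetching BigQuery table size"), the sketch below decides whether a selected-fields parameter should be sent at all. Only the 16384-character limit comes from the error message quoted above. The function name, its parameters, and the fallback for an over-long list are hypothetical and are not taken from the Beam code.

```python
# Limit quoted in the error message: the value must be LESS than 16384 characters.
MAX_SELECTED_FIELDS_CHARS = 16384


def selected_fields_param(field_names, only_need_size=False):
    """Return a comma-joined selected-fields value, or None to omit the parameter.

    Hypothetical helper for illustration only; it is not part of Beam or of
    the BigQuery client library.
    """
    # When the caller only wants table metadata such as the size in bytes,
    # there is no reason to send a (possibly very long) field list.
    if only_need_size:
        return None

    joined = ",".join(field_names)
    # An over-long list would be rejected with HTTP 400, so omit it and let
    # the caller trim the returned schema locally instead (an assumption).
    if len(joined) >= MAX_SELECTED_FIELDS_CHARS:
        return None
    return joined
```

With a helper like this, a size lookup never sends the field list, which would sidestep the 400 response regardless of how many columns the user selected.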
[jira] [Resolved] (BEAM-9749) beam_PostCommit_SQL failing (missing region)
[ https://issues.apache.org/jira/browse/BEAM-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-9749. --- Fix Version/s: 2.21.0 Resolution: Fixed > beam_PostCommit_SQL failing (missing region) > > > Key: BEAM-9749 > URL: https://issues.apache.org/jira/browse/BEAM-9749 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.21.0 > > > java.lang.IllegalArgumentException: Class interface > org.apache.beam.sdk.testing.TestPipelineOptions missing a property named > 'region'. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9744) Python performance tests failing
[ https://issues.apache.org/jira/browse/BEAM-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-9744. --- Fix Version/s: 2.21.0 Resolution: Fixed > Python performance tests failing > > > Key: BEAM-9744 > URL: https://issues.apache.org/jira/browse/BEAM-9744 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.21.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > beam_PerformanceTests_WordCountIT_Py* failing because --region is missing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9756) beam_PostCommit_Java_Nexmark (non-Dataflow) failing
[ https://issues.apache.org/jira/browse/BEAM-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-9756. --- Fix Version/s: 2.21.0 Resolution: Fixed > beam_PostCommit_Java_Nexmark (non-Dataflow) failing > --- > > Key: BEAM-9756 > URL: https://issues.apache.org/jira/browse/BEAM-9756 > Project: Beam > Issue Type: Bug > Components: test-failures, testing-nexmark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.21.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > 12:02:24 Exception in thread "main" java.lang.IllegalArgumentException: Class > interface org.apache.beam.sdk.nexmark.NexmarkOptions missing a property named > 'region'. > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1625) > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:115) > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:298) > 12:02:24 at org.apache.beam.sdk.nexmark.Main.runAll(Main.java:98) > 12:02:24 at org.apache.beam.sdk.nexmark.Main.main(Main.java:415) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9756) beam_PostCommit_Java_Nexmark (non-Dataflow) failing
[ https://issues.apache.org/jira/browse/BEAM-9756?focusedWorklogId=423042=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423042 ] ASF GitHub Bot logged work on BEAM-9756: Author: ASF GitHub Bot Created on: 15/Apr/20 20:34 Start Date: 15/Apr/20 20:34 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11426: [BEAM-9756] [cherry-pick] Nexmark: only use --region in Dataflow. URL: https://github.com/apache/beam/pull/11426 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423042) Time Spent: 2h 10m (was: 2h) > beam_PostCommit_Java_Nexmark (non-Dataflow) failing > --- > > Key: BEAM-9756 > URL: https://issues.apache.org/jira/browse/BEAM-9756 > Project: Beam > Issue Type: Bug > Components: test-failures, testing-nexmark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > 12:02:24 Exception in thread "main" java.lang.IllegalArgumentException: Class > interface org.apache.beam.sdk.nexmark.NexmarkOptions missing a property named > 'region'. > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1625) > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:115) > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:298) > 12:02:24 at org.apache.beam.sdk.nexmark.Main.runAll(Main.java:98) > 12:02:24 at org.apache.beam.sdk.nexmark.Main.main(Main.java:415) -- This message was sent by Atlassian Jira (v8.3.4#803005)
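The fix title above ("only use --region in Dataflow") can be pictured with a small sketch. The failure arises because an options interface that does not declare `region` rejects the flag, so the flag is added only when the Dataflow runner is selected. The function name and the argument layout are assumptions for illustration; the real change lives in the Beam build and test configuration, which is not shown in this message.

```python
def pipeline_args(runner, base_args, region="us-central1"):
    """Build a command-line argument list, adding --region only for Dataflow.

    Hypothetical helper: other runners' options interfaces may not define a
    'region' property, and passing it to them is what produced the
    IllegalArgumentException quoted above.
    """
    args = list(base_args)  # copy so the caller's list is not mutated
    args.append("--runner=%s" % runner)
    if runner == "DataflowRunner":
        args.append("--region=%s" % region)
    return args
```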
[jira] [Updated] (BEAM-9767) test_streaming_wordcount flaky timeouts
[ https://issues.apache.org/jira/browse/BEAM-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9767: --- Status: Open (was: Triage Needed) > test_streaming_wordcount flaky timeouts > --- > > Key: BEAM-9767 > URL: https://issues.apache.org/jira/browse/BEAM-9767 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Udi Meiri >Assignee: Sam Rohde >Priority: Major > > Timed out after 600s, typically completes in 2.8s on my workstation. > https://builds.apache.org/job/beam_PreCommit_Python_Commit/12376/ > {code} > self = > testMethod=test_streaming_wordcount> > @unittest.skipIf( > sys.version_info < (3, 5, 3), > 'The tests require at least Python 3.6 to work.') > def test_streaming_wordcount(self): > class WordExtractingDoFn(beam.DoFn): > def process(self, element): > text_line = element.strip() > words = text_line.split() > return words > > # Add the TestStream so that it can be cached. > ib.options.capturable_sources.add(TestStream) > ib.options.capture_duration = timedelta(seconds=5) > > p = beam.Pipeline( > runner=interactive_runner.InteractiveRunner(), > options=StandardOptions(streaming=True)) > > data = ( > p > | TestStream() > .advance_watermark_to(0) > .advance_processing_time(1) > .add_elements(['to', 'be', 'or', 'not', 'to', 'be']) > .advance_watermark_to(20) > .advance_processing_time(1) > .add_elements(['that', 'is', 'the', 'question']) > | beam.WindowInto(beam.window.FixedWindows(10))) # yapf: disable > > counts = ( > data > | 'split' >> beam.ParDo(WordExtractingDoFn()) > | 'pair_with_one' >> beam.Map(lambda x: (x, 1)) > | 'group' >> beam.GroupByKey() > | 'count' >> beam.Map(lambda wordones: (wordones[0], > sum(wordones[1] > > # Watch the local scope for Interactive Beam so that referenced > PCollections > # will be cached. > ib.watch(locals()) > > # This is normally done in the interactive_utils when a transform is > # applied but needs an IPython environment. 
So we manually run this > here. > ie.current_env().track_user_pipelines() > > # Create a fake limiter that cancels the BCJ once the main job receives > the > # expected amount of results. > class FakeLimiter: > def __init__(self, p, pcoll): > self.p = p > self.pcoll = pcoll > > def is_triggered(self): > result = ie.current_env().pipeline_result(self.p) > if result: > try: > results = result.get(self.pcoll) > except ValueError: > return False > return len(results) >= 10 > return False > > # This sets the limiters to stop reading when the test receives 10 > elements > # or after 5 seconds have elapsed (to eliminate the possibility of > hanging). > ie.current_env().options.capture_control.set_limiters_for_test( > [FakeLimiter(p, data), DurationLimiter(timedelta(seconds=5))]) > > # This tests that the data was correctly cached. > pane_info = PaneInfo(True, True, PaneInfoTiming.UNKNOWN, 0, 0) > expected_data_df = pd.DataFrame([ > ('to', 0, [IntervalWindow(0, 10)], pane_info), > ('be', 0, [IntervalWindow(0, 10)], pane_info), > ('or', 0, [IntervalWindow(0, 10)], pane_info), > ('not', 0, [IntervalWindow(0, 10)], pane_info), > ('to', 0, [IntervalWindow(0, 10)], pane_info), > ('be', 0, [IntervalWindow(0, 10)], pane_info), > ('that', 2000, [IntervalWindow(20, 30)], pane_info), > ('is', 2000, [IntervalWindow(20, 30)], pane_info), > ('the', 2000, [IntervalWindow(20, 30)], pane_info), > ('question', 2000, [IntervalWindow(20, 30)], pane_info) > ], columns=[0, 'event_time', 'windows', 'pane_info']) # yapf: disable > > > data_df = ib.collect(data, include_window_info=True) > apache_beam/runners/interactive/interactive_runner_test.py:237: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > apache_beam/runners/interactive/interactive_beam.py:451: in collect > return head(pcoll, n=-1, include_window_info=include_window_info) > apache_beam/runners/interactive/utils.py:204: in run_within_progress_indicator > return func(*args, **kwargs) > 
apache_beam/runners/interactive/interactive_beam.py:515: in head > result.wait_until_finish() >
[jira] [Comment Edited] (BEAM-9764) :sdks:java:container:generateThirdPartyLicenses failing
[ https://issues.apache.org/jira/browse/BEAM-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084338#comment-17084338 ] Hannah Jiang edited comment on BEAM-9764 at 4/15/20, 8:32 PM: -- Log: {code:java} 05:19:17 Invalid url: https://git.tukaani.org/?p=xz-java.git;a=blob_plain;f=COPYING;h=c1d404dc7a6f06a0437bf1055fedaa4a4c89d728;hb=HEAD 05:19:17 Invalid url: https://git.tukaani.org/?p=xz-java.git;a=blob_plain;f=COPYING;h=c1d404dc7a6f06a0437bf1055fedaa4a4c89d728;hb=HEAD {code} Error: {code:java} 05:19:21 Traceback (most recent call last): 05:19:21 File "sdks/java/container/license_scripts/pull_licenses_java.py", line 225, in 05:19:21 error_msg) 05:19:21 RuntimeError: ('1 error(s) occurred.', [' Licenses were not able to be pulled automatically for some dependencies. Please search source code of the dependencies on the internet and add "license" and "notice" (if available) field to sdks/java/container/license_scripts/dep_urls_java.yaml for each missing license. Dependency List: [xz-1.5,xz-1.8]']) {code} The URLs are valid and they worked fine several times. Need to see why they are invalid with this run. was (Author: hannahjiang): Log: {code:java} 05:19:17 Invalid url: https://git.tukaani.org/?p=xz-java.git;a=blob_plain;f=COPYING;h=c1d404dc7a6f06a0437bf1055fedaa4a4c89d728;hb=HEAD 05:19:17 Invalid url: https://git.tukaani.org/?p=xz-java.git;a=blob_plain;f=COPYING;h=c1d404dc7a6f06a0437bf1055fedaa4a4c89d728;hb=HEAD {code} ``` Error: {code:java} 05:19:21 Traceback (most recent call last): 05:19:21 File "sdks/java/container/license_scripts/pull_licenses_java.py", line 225, in 05:19:21 error_msg) 05:19:21 RuntimeError: ('1 error(s) occurred.', [' Licenses were not able to be pulled automatically for some dependencies. Please search source code of the dependencies on the internet and add "license" and "notice" (if available) field to sdks/java/container/license_scripts/dep_urls_java.yaml for each missing license. 
Dependency List: [xz-1.5,xz-1.8]']) {code} The URLs are valid and they worked fine several times. Need to see why they are invalid with this run. > :sdks:java:container:generateThirdPartyLicenses failing > --- > > Key: BEAM-9764 > URL: https://issues.apache.org/jira/browse/BEAM-9764 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, test-failures >Reporter: Udi Meiri >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > > https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/774/console > The traceback is interspersed with other logs: > {code} > Traceback (most recent call last): > Successfully pulled > java_third_party_licenses/protobuf-java-util-3.11.1.jar/LICENSE from > https://opensource.org/licenses/BSD-3-Clause > Successfully pulled java_third_party_licenses/protoc-3.11.0.jar/LICENSE from > http://www.apache.org/licenses/LICENSE-2.0.txt > File "sdks/java/container/license_scripts/pull_licenses_java.py", line 138, > in > Successfully pulled java_third_party_licenses/protoc-3.11.1.jar/LICENSE from > http://www.apache.org/licenses/LICENSE-2.0.txt > license_url = dep['moduleLicenseUrl'] > Successfully pulled java_third_party_licenses/zetasketch-0.1.0.jar/LICENSE > from http://www.apache.org/licenses/LICENSE-2.0.txt > KeyError: 'moduleLicenseUrl' > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
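Both symptoms in this issue (the `KeyError: 'moduleLicenseUrl'` and the intermittent "Invalid url" failures) could be made easier to diagnose along the following lines. This is a minimal sketch, not the code in `pull_licenses_java.py`: the function name, the `moduleName` key, and the injected `fetch` callable are assumptions. Only the `moduleLicenseUrl` key appears in the log above.

```python
import traceback


def pull_license(dep, fetch):
    """Try to fetch one dependency's license text.

    Returns (text, error_message); exactly one of the two is None.
    `dep` is a dict from the dependency report and `fetch` is any callable
    that takes a URL and returns the license text (both hypothetical here).
    """
    name = dep.get("moduleName", "<unknown>")
    # dict.get avoids a KeyError when the report has no license URL.
    license_url = dep.get("moduleLicenseUrl")
    if not license_url:
        return None, "%s: no moduleLicenseUrl in report" % name
    try:
        return fetch(license_url), None
    except Exception:
        # Keep the full traceback so that a transient failure (for example a
        # timeout on an otherwise valid URL) can be told apart from a bad URL.
        return None, "%s: failed to pull %s\n%s" % (
            name, license_url, traceback.format_exc())
```

Collecting error strings rather than raising on the first failure also keeps the final report in one place, instead of interleaving a traceback with "Successfully pulled" lines as in the console output quoted in the issue.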
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423038 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 20:28 Start Date: 15/Apr/20 20:28 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614263256 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423038) Time Spent: 32h 50m (was: 32h 40m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 32h 50m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9674) "Selected fields list too long" error when calling tables.get in BigQueryStorageTableSource
[ https://issues.apache.org/jira/browse/BEAM-9674?focusedWorklogId=423036=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423036 ] ASF GitHub Bot logged work on BEAM-9674: Author: ASF GitHub Bot Created on: 15/Apr/20 20:24 Start Date: 15/Apr/20 20:24 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11292: [BEAM-9674] Don't specify selected fields when fetching BigQuery table size URL: https://github.com/apache/beam/pull/11292#discussion_r409107026 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageSourceBase.java ## @@ -149,11 +151,14 @@ return ImmutableList.of(); } +Schema sessionSchema = new Schema.Parser().parse(readSession.getAvroSchema().getSchema()); Review comment: I'm not sure how this trims the schema. Does readSession.getAvroSchema() somehow have fewer fields than targetTable.getSchema()? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423036) Time Spent: 4h (was: 3h 50m) > "Selected fields list too long" error when calling tables.get in > BigQueryStorageTableSource > --- > > Key: BEAM-9674 > URL: https://issues.apache.org/jira/browse/BEAM-9674 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Kenneth Jung >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 4h > Remaining Estimate: 0h > > Customers experience errors similar to the following: > Caused by: > com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad > Request { "code" : 400, "errors" : [ > { "domain" : "global", "message" : "Selected fields too long: must > be less than 16384 characters.", "reason" : "invalid" } > ], "message" : "Selected fields too long: must be less than 16384 > characters.", "status" : "INVALID_ARGUMENT" } > com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321) > com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097) > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) > > 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:938) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9674) "Selected fields list too long" error when calling tables.get in BigQueryStorageTableSource
[ https://issues.apache.org/jira/browse/BEAM-9674?focusedWorklogId=423037=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423037 ] ASF GitHub Bot logged work on BEAM-9674: Author: ASF GitHub Bot Created on: 15/Apr/20 20:24 Start Date: 15/Apr/20 20:24 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11292: [BEAM-9674] Don't specify selected fields when fetching BigQuery table size URL: https://github.com/apache/beam/pull/11292#discussion_r409111886 ## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTest.java ## @@ -1368,19 +1368,14 @@ public void testStreamSourceSplitAtFractionFailsWhenParentIsPastSplitPoint() thr public void testReadFromBigQueryIO() throws Exception { fakeDatasetService.createDataset("foo.com:project", "dataset", "", "", null); TableReference tableRef = BigQueryHelpers.parseTableSpec("foo.com:project:dataset.table"); - -Table table = -new Table().setTableReference(tableRef).setNumBytes(10L).setSchema(new TableSchema()); - +Table table = new Table().setTableReference(tableRef).setNumBytes(10L).setSchema(TABLE_SCHEMA); fakeDatasetService.createTable(table); CreateReadSessionRequest expectedCreateReadSessionRequest = CreateReadSessionRequest.newBuilder() .setParent("projects/project-id") .setTableReference(BigQueryHelpers.toTableRefProto(tableRef)) .setRequestedStreams(10) -.setReadOptions( Review comment: Is this a feature regression for the storage API based read path ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423037) Time Spent: 4h 10m (was: 4h) > "Selected fields list too long" error when calling tables.get in > BigQueryStorageTableSource > --- > > Key: BEAM-9674 > URL: https://issues.apache.org/jira/browse/BEAM-9674 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Kenneth Jung >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 4h 10m > Remaining Estimate: 0h > > Customers experience errors similar to the following: > Caused by: > com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad > Request { "code" : 400, "errors" : [ > { "domain" : "global", "message" : "Selected fields too long: must > be less than 16384 characters.", "reason" : "invalid" } > ], "message" : "Selected fields too long: must be less than 16384 > characters.", "status" : "INVALID_ARGUMENT" } > com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321) > com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097) > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) > > 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:938) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084359#comment-17084359 ] Kyle Weaver commented on BEAM-8472: --- > Dataflow doesn't currently support the Go SDK so this won't be prioritized > above current work any time soon. Understood. > Just to be clear, the protocol is to check the environment variables, and > then execute the gcloud command? Yes, that's correct. For reference (helpful for testing): https://cloud.google.com/compute/docs/gcloud-compute#set_default_zone_and_region_in_your_local_client > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] > Update 11/12: this is complete for Python and Java, Go remains. -- This message was sent by Atlassian Jira (v8.3.4#803005)
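The lookup order confirmed in the comment above (environment variables first, then the gcloud command, then the existing `us-central1` default) could be sketched as follows. This is an illustration in Python rather than the Go SDK code the issue is about. The function name and the injectable `environ` and `run` parameters are my own, added so the sketch can be exercised without gcloud installed. `CLOUDSDK_COMPUTE_REGION` and `gcloud config get-value compute/region` are the gcloud-side names I am assuming are meant.

```python
import os
import subprocess

DEFAULT_REGION = "us-central1"  # the current default named in the issue


def default_gcp_region(environ=None, run=subprocess.run):
    """Resolve a default region: environment, then gcloud, then fallback."""
    environ = os.environ if environ is None else environ

    # 1. Environment variable that the Cloud SDK itself honours.
    region = environ.get("CLOUDSDK_COMPUTE_REGION")
    if region:
        return region

    # 2. Ask the gcloud CLI for its configured compute/region property.
    try:
        result = run(
            ["gcloud", "config", "get-value", "compute/region"],
            capture_output=True, text=True, check=False)
        region = result.stdout.strip()
        if result.returncode == 0 and region:
            return region
    except OSError:
        pass  # gcloud is not installed or not on PATH

    # 3. Nothing configured: keep the existing hard-coded default.
    return DEFAULT_REGION
```

For testing, setting the property locally with `gcloud config set compute/region <region>` (as described at the URL in the comment) should change what step 2 returns.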
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423034=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423034 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 20:21 Start Date: 15/Apr/20 20:21 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614260458 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423034) Time Spent: 32h 40m (was: 32.5h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 32h 40m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9765) :vendor:calcite-1_20_0:validateVendoring fails
[ https://issues.apache.org/jira/browse/BEAM-9765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084352#comment-17084352 ] Kyle Weaver commented on BEAM-9765: --- Marking this as a release blocker since IIUC this will be a problem when we try to publish artifacts. It looks like there has been some back-and-forth on this line: https://github.com/apache/beam/blob/4a7eb329734131e1ef90419f405986de94a30846/buildSrc/src/main/groovy/org/apache/beam/gradle/VendorJavaPlugin.groovy#L142 Changing it to `exclude "**/module-info.class"` fixes the issue, however I am not sure if we should change this line or if we should somehow prevent module-info.class from entering the jar. > :vendor:calcite-1_20_0:validateVendoring fails > -- > > Key: BEAM-9765 > URL: https://issues.apache.org/jira/browse/BEAM-9765 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kyle Weaver >Priority: Major > Fix For: 2.21.0 > > > This error was reported on Slack: > https://the-asf.slack.com/archives/C9H0YNP3P/p1586911958184200 > "I encountered this error when I built Beam from master branch. Is it a known > issue? It happens in beam 2.21 and master branch, but the build works fine in > 2.20." > --- > * What went wrong: > Execution failed for task ':vendor:calcite-1_20_0:validateVendoring'. > > /home/yangzhan/oss/beam/vendor/calcite-1_20_0/build/libs/beam-vendor-calcite-1_20_0-0.2.jar > > exposed classes outside of org.apache.beam namespace: > > [/home/yangzhan/oss/beam/vendor/calcite-1_20_0/build/tmp/expandedArchives/beam-vendor-calcite-1_20_0-0.2.jar_ee40b0aab4e7709d8d80d205ee8852ba/module-info.class] > * Try: > Run with --stacktrace option to get the stack trace. Run with --info or > --debug option to get more log output. Run with --scan to get full insights. > == -- This message was sent by Atlassian Jira (v8.3.4#803005)
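To make the failure concrete, here is a minimal sketch of the kind of check that `validateVendoring` appears to perform, judging only from the error text quoted in the issue: list the class files in a jar and report any that sit outside the `org/apache/beam` namespace. It is not the Gradle plugin's implementation. The function name and the `ignored` parameter are hypothetical, and `ignored` stands in for only one of the two options weighed in the comment (tolerating `module-info.class` rather than keeping it out of the jar).

```python
import zipfile


def classes_outside_namespace(jar, namespace="org/apache/beam/",
                              ignored=("module-info.class",)):
    """Return .class entries of a jar that are outside `namespace`.

    `jar` may be a path or a binary file-like object. Entries whose base
    name is listed in `ignored` are skipped at any directory depth.
    """
    exposed = []
    with zipfile.ZipFile(jar) as archive:
        for name in archive.namelist():
            if not name.endswith(".class"):
                continue  # resources and directories are not checked here
            if name.rsplit("/", 1)[-1] in ignored:
                continue
            if not name.startswith(namespace):
                exposed.append(name)
    return exposed
```

Calling it with `ignored=()` would flag a top-level `module-info.class`, matching the entry listed in the build output above.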
[jira] [Updated] (BEAM-9765) :vendor:calcite-1_20_0:validateVendoring fails
[ https://issues.apache.org/jira/browse/BEAM-9765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-9765: -- Fix Version/s: 2.21.0 > :vendor:calcite-1_20_0:validateVendoring fails > -- > > Key: BEAM-9765 > URL: https://issues.apache.org/jira/browse/BEAM-9765 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kyle Weaver >Priority: Major > Fix For: 2.21.0 > > > This error was reported on Slack: > https://the-asf.slack.com/archives/C9H0YNP3P/p1586911958184200 > "I encountered this error when I built Beam from master branch. Is it a known > issue? It happens in beam 2.21 and master branch, but the build works fine in > 2.20." > --- > * What went wrong: > Execution failed for task ':vendor:calcite-1_20_0:validateVendoring'. > > /home/yangzhan/oss/beam/vendor/calcite-1_20_0/build/libs/beam-vendor-calcite-1_20_0-0.2.jar > > exposed classes outside of org.apache.beam namespace: > > [/home/yangzhan/oss/beam/vendor/calcite-1_20_0/build/tmp/expandedArchives/beam-vendor-calcite-1_20_0-0.2.jar_ee40b0aab4e7709d8d80d205ee8852ba/module-info.class] > * Try: > Run with --stacktrace option to get the stack trace. Run with --info or > --debug option to get more log output. Run with --scan to get full insights. > == -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9766) Add support for dynamic destinations when writing to Kinesis
[ https://issues.apache.org/jira/browse/BEAM-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084350#comment-17084350 ] Ismaël Mejía edited comment on BEAM-9766 at 4/15/20, 8:05 PM: -- Let's try to follow closely the dynamic destinations pattern for this case, I had forgotten that the 'proper' version for Kafka was still missing. was (Author: iemejia): Let's try to follow closely the dynamic destinations pattern for this case, I have forgotten that the 'proper' version for Kafka was still missing. > Add support for dynamic destinations when writing to Kinesis > > > Key: BEAM-9766 > URL: https://issues.apache.org/jira/browse/BEAM-9766 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Reporter: Dan Ladd >Priority: Major > > KinesisIO is only able to write to a single named stream. > It would be great if we could dynamically write to different Kinesis streams. > > I believe this functionality is available in KafkaIO with writeRecords() > where the topic name is defined in WriteRecord. > https://the-asf.slack.com/archives/C9H0YNP3P/p1586966769192700 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9766) Add support for dynamic destinations when writing to Kinesis
[ https://issues.apache.org/jira/browse/BEAM-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084350#comment-17084350 ] Ismaël Mejía commented on BEAM-9766: Let's try to follow closely the dynamic destinations pattern for this case, I have forgotten that the 'proper' version for Kafka was still missing. > Add support for dynamic destinations when writing to Kinesis > > > Key: BEAM-9766 > URL: https://issues.apache.org/jira/browse/BEAM-9766 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Reporter: Dan Ladd >Priority: Major > > KinesisIO is only able to write to a single named stream. > It would be great if we could dynamically write to different Kinesis streams. > > I believe this functionality is available in KafkaIO with writeRecords() > where the topic name is defined in WriteRecord. > https://the-asf.slack.com/archives/C9H0YNP3P/p1586966769192700 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9766) Add support for dynamic destinations when writing to Kinesis
[ https://issues.apache.org/jira/browse/BEAM-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9766: --- Summary: Add support for dynamic destinations when writing to Kinesis (was: KinesisIO Write to Multiple Streams) > Add support for dynamic destinations when writing to Kinesis > > > Key: BEAM-9766 > URL: https://issues.apache.org/jira/browse/BEAM-9766 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Reporter: Dan Ladd >Priority: Major > > KinesisIO is only able to write to a single named stream. > It would be great if we could dynamically write to different Kinesis streams. > > I believe this functionality is available in KafkaIO with writeRecords() > where the topic name is defined in WriteRecord. > https://the-asf.slack.com/archives/C9H0YNP3P/p1586966769192700 -- This message was sent by Atlassian Jira (v8.3.4#803005)
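The request above amounts to choosing the destination per record instead of once per transform. The sketch below shows only that routing step in plain Python. It does not use the KinesisIO or KafkaIO APIs, and the names `group_by_stream` and `stream_for` are hypothetical.

```python
from collections import defaultdict


def group_by_stream(records, stream_for):
    """Group records by the stream name that `stream_for(record)` returns.

    A writer could then send each group to its own stream, instead of
    sending every record to one stream fixed at construction time.
    """
    batches = defaultdict(list)
    for record in records:
        batches[stream_for(record)].append(record)
    return dict(batches)
```

For example, `group_by_stream(events, lambda e: "events-" + e["tenant"])` would yield one batch per tenant-specific stream name, assuming each event is a dict with a `tenant` key.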
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=423022=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423022 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 15/Apr/20 20:01 Start Date: 15/Apr/20 20:01 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-614251554 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 423022) Time Spent: 32.5h (was: 32h 20m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 32.5h > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9764) :sdks:java:container:generateThirdPartyLicenses failing
[ https://issues.apache.org/jira/browse/BEAM-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084348#comment-17084348 ]

Hannah Jiang commented on BEAM-9764:

The next run pulled from the same URLs successfully.
https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/775/console
{code}
11:46:40 Successfully pulled java_third_party_licenses/xz-1.5.jar/LICENSE from https://git.tukaani.org/?p=xz-java.git;a=blob_plain;f=COPYING;h=c1d404dc7a6f06a0437bf1055fedaa4a4c89d728;hb=HEAD
11:46:40 Successfully pulled java_third_party_licenses/xz-1.8.jar/LICENSE from https://git.tukaani.org/?p=xz-java.git;a=blob_plain;f=COPYING;h=c1d404dc7a6f06a0437bf1055fedaa4a4c89d728;hb=HEAD
{code}
I tried pulling from the URLs locally and it worked more than 20 times. I will add traceback printing to get more detailed error messages.

> :sdks:java:container:generateThirdPartyLicenses failing
> ---
>
>                 Key: BEAM-9764
>                 URL: https://issues.apache.org/jira/browse/BEAM-9764
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core, test-failures
>            Reporter: Udi Meiri
>            Assignee: Hannah Jiang
>            Priority: Major
>             Fix For: 2.21.0
>
> https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/774/console
> The traceback is interspersed with other logs:
> {code}
> Traceback (most recent call last):
> Successfully pulled java_third_party_licenses/protobuf-java-util-3.11.1.jar/LICENSE from https://opensource.org/licenses/BSD-3-Clause
> Successfully pulled java_third_party_licenses/protoc-3.11.0.jar/LICENSE from http://www.apache.org/licenses/LICENSE-2.0.txt
> File "sdks/java/container/license_scripts/pull_licenses_java.py", line 138, in
> Successfully pulled java_third_party_licenses/protoc-3.11.1.jar/LICENSE from http://www.apache.org/licenses/LICENSE-2.0.txt
> license_url = dep['moduleLicenseUrl']
> Successfully pulled java_third_party_licenses/zetasketch-0.1.0.jar/LICENSE from http://www.apache.org/licenses/LICENSE-2.0.txt
> KeyError: 'moduleLicenseUrl'
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
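
Editor's illustration (not part of the original message): the KeyError quoted above comes from the lookup `dep['moduleLicenseUrl']`. A minimal sketch of one way to capture the traceback for such a lookup is shown below. It is not the actual fix made to `pull_licenses_java.py`; the helper name `license_url_or_none` and the `moduleName` key are assumptions made for illustration only.

```python
import traceback


def license_url_or_none(dep):
    """Return dep['moduleLicenseUrl'], or None if the key is missing.

    Hypothetical helper: the name and the 'moduleName' key are assumptions,
    not taken from the Beam script.
    """
    try:
        return dep['moduleLicenseUrl']
    except KeyError:
        # Build the whole report as one string and emit it with a single
        # print call, so it is less likely to be split up by other log lines.
        name = dep.get('moduleName', '<unknown>')
        report = 'Missing license URL for {}:\n{}'.format(
            name, traceback.format_exc())
        print(report)
        return None
```

Returning None instead of raising lets a caller collect every dependency that lacks a license URL in one run, rather than stopping at the first one; whether that suits the real script is a judgment for its maintainers.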