[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=422520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422520 ] ASF GitHub Bot logged work on BEAM-9678: Author: ASF GitHub Bot Created on: 15/Apr/20 05:10 Start Date: 15/Apr/20 05:10 Worklog Time Spent: 10m Work Description: damondouglas commented on pull request #11340: [BEAM-9678] Create Go SDK introduction kata URL: https://github.com/apache/beam/pull/11340#discussion_r408584291 ## File path: learning/katas/go/Introduction/Hello Beam/Hello Beam/pkg/task/task.go ## @@ -0,0 +1,24 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package task + +import ( + "github.com/apache/beam/sdks/go/pkg/beam" +) + +func HelloBeam(s beam.Scope) beam.PCollection { Review comment: May we consider keeping this? When I consolidated everything into the same file, I received the error `flag redefined: runner`. IntelliJ Edu constrains us to having tests in the test folder in order to achieve lesson task validation, so we can't do any task validation with tests in any other folder. Also, while there doesn't seem to be a consensus around go project structure in the community, this repo seems to follows the `./{cmd,pkg}/*` structure. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422520) Time Spent: 3.5h (was: 3h 20m) > Introduction Kata | Go SDK Code Katas > - > > Key: BEAM-9678 > URL: https://issues.apache.org/jira/browse/BEAM-9678 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > An Introduction kata patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] > where the take away is an individual's ability to start an Apache Beam > pipeline using the Golang SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=422476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422476 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 15/Apr/20 01:17 Start Date: 15/Apr/20 01:17 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11423: [BEAM-9642] Fix infinite recursion. URL: https://github.com/apache/beam/pull/11423 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422476) Time Spent: 5h 20m (was: 5h 10m) > Add SDF execution-time runners > -- > > Key: BEAM-9642 > URL: https://issues.apache.org/jira/browse/BEAM-9642 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > Adds execution-time SDF runner units to the exec package, and any unit tests > + helpers required. > This is needed to get the expanded SDF URNs to execute in the runner harness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=422475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422475 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 15/Apr/20 01:16 Start Date: 15/Apr/20 01:16 Worklog Time Spent: 10m Work Description: youngoli commented on issue #11423: [BEAM-9642] Fix infinite recursion. URL: https://github.com/apache/beam/pull/11423#issuecomment-613760696 Agreed. I did what validation I could, but neither lint or gofmt caught this one. I think go vet would've caught it, but it was annoying to use (if I ran it on this file specifically, it would complain about not finding the symbols from other files in the same package. If I ran it on the entire directory, I'd get a million errors about the missing documentation for all the autogenerated code). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422475) Time Spent: 5h 10m (was: 5h) > Add SDF execution-time runners > -- > > Key: BEAM-9642 > URL: https://issues.apache.org/jira/browse/BEAM-9642 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > Adds execution-time SDF runner units to the exec package, and any unit tests > + helpers required. > This is needed to get the expanded SDF URNs to execute in the runner harness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=422474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422474 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 15/Apr/20 01:10 Start Date: 15/Apr/20 01:10 Worklog Time Spent: 10m Work Description: lostluck commented on issue #11423: [BEAM-9642] Fix infinite recursion. URL: https://github.com/apache/beam/pull/11423#issuecomment-613759173 Thanks! If we could get some kind of github static check to have caught that for us that would have been great. Good thing we had a quick import after the merge though. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422474) Time Spent: 5h (was: 4h 50m) > Add SDF execution-time runners > -- > > Key: BEAM-9642 > URL: https://issues.apache.org/jira/browse/BEAM-9642 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > Adds execution-time SDF runner units to the exec package, and any unit tests > + helpers required. > This is needed to get the expanded SDF URNs to execute in the runner harness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=422467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422467 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 15/Apr/20 00:53 Start Date: 15/Apr/20 00:53 Worklog Time Spent: 10m Work Description: youngoli commented on issue #11423: [BEAM-9642] Fix infinite recursion. URL: https://github.com/apache/beam/pull/11423#issuecomment-613754243 R: @lostluck This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422467) Time Spent: 4h 50m (was: 4h 40m) > Add SDF execution-time runners > -- > > Key: BEAM-9642 > URL: https://issues.apache.org/jira/browse/BEAM-9642 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > Adds execution-time SDF runner units to the exec package, and any unit tests > + helpers required. > This is needed to get the expanded SDF URNs to execute in the runner harness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=422466=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422466 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 15/Apr/20 00:53 Start Date: 15/Apr/20 00:53 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11423: [BEAM-9642] Fix infinite recursion. URL: https://github.com/apache/beam/pull/11423 Small fix. I debated bundling this in a larger PR but I was worried I might forget. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422459 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 15/Apr/20 00:33 Start Date: 15/Apr/20 00:33 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422459) Time Spent: 3h 10m (was: 3h) > [Go SDK] Empty side inputs causing spurious zero elements > - > > Key: BEAM-9746 > URL: https://issues.apache.org/jira/browse/BEAM-9746 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > A user discovered that empty side inputs would spuriously provide a single > zero element. > The error was narrowed down to the Go SDK's state manager code copying the > stateGetResponse data wasn't checking that the original data source even had > any bytes in it, leading it in particular to interpret length prefixed data > as having 0 length, which would cause zero value elements to be generated. > Notably, this caused empty strings. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422453 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 15/Apr/20 00:18 Start Date: 15/Apr/20 00:18 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408509101 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i, buflen := range test.buflens { + token := []byte(strconv.Itoa(i)) + var buf []byte + if buflen >= 0 { + buf = bytes.Repeat([]byte{42}, buflen) + } + // On the last request response pair, send no token. + if i+1 == len(test.buflens) { + token = nil + } + + req := <-ch.requests + + if test.noGet { +
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422452 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 15/Apr/20 00:18 Start Date: 15/Apr/20 00:18 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408509078 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i, buflen := range test.buflens { + token := []byte(strconv.Itoa(i)) + var buf []byte + if buflen >= 0 { + buf = bytes.Repeat([]byte{42}, buflen) + } + // On the last request response pair, send no token. + if i+1 == len(test.buflens) { + token = nil + } + + req := <-ch.requests + + if test.noGet { +
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=422451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422451 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 00:17 Start Date: 15/Apr/20 00:17 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#discussion_r408508964 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py ## @@ -75,21 +75,27 @@ IMPULSE_BUFFER = b'impulse' +# SideInputId is identified by a consumer ParDo + tag. +SideInputId = Tuple[str, str] + +DataSideInput = Dict[SideInputId, + Tuple[bytes, beam_runner_api_pb2.FunctionSpec]] + class Stage(object): """A set of Transforms that can be sent to the worker for processing.""" def __init__(self, name, # type: str transforms, # type: List[beam_runner_api_pb2.PTransform] - downstream_side_inputs=None, # type: Optional[FrozenSet[str]] + downstream_side_inputs=None, # type: Optional[Dict[str, SideInputId]] Review comment: Hm so this change breaks that, so the memory requirements would be larger. I would think that they would not be too bad, since most graphs don't have many side inputs going many places. What do you think? I'm willing to find a better solution for this, but I wonder if it's worth the extra time. The reason that this is made into a dict is to contain more information about downstream side inputs. specifically, it contains which transforms will consume the side inputs. this is used to commit the side inputs to state after they are calculated (rather than before they are consumed). This will be necessary for streaming, because side inputs will need to be added to state as they are computed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422451) Time Spent: 3h 40m (was: 3.5h) > Abstract bundle execution logic from stage execution logic > -- > > Key: BEAM-9639 > URL: https://issues.apache.org/jira/browse/BEAM-9639 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > The FnApiRunner currently works on a per-stage manner, and does not abstract > single-bundle execution much. This work item is to clearly define the code to > execute a single bundle. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=422442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422442 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 00:13 Start Date: 15/Apr/20 00:13 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#discussion_r408507669 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py ## @@ -149,15 +155,26 @@ def no_overlap(a, b): return ( not consumer.forced_root and not self in consumer.must_follow and not self.is_runner_urn(context) and -not consumer.is_runner_urn(context) and -no_overlap(self.downstream_side_inputs, consumer.side_inputs())) +not consumer.is_runner_urn(context) and no_overlap( +set(self.downstream_side_inputs.keys()), +{i + for i, _, _ in consumer.side_inputs()})) + + def _fuse_downstream_side_inputs(self, other): +res = dict(self.downstream_side_inputs) +for si, other_si_ids in other.downstream_side_inputs.items(): + if si in res: +res[si] = union(res[si], other_si_ids) + else: +res[si] = other_si_ids +return res def fuse(self, other): # type: (Stage) -> Stage return Stage( "(%s)+(%s)" % (self.name, other.name), self.transforms + other.transforms, -union(self.downstream_side_inputs, other.downstream_side_inputs), +self._fuse_downstream_side_inputs(other), Review comment: renamed to `_get_fused_downstream_side_inputs` thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422442) Time Spent: 2h 40m (was: 2.5h) > Abstract bundle execution logic from stage execution logic > -- > > Key: BEAM-9639 > URL: https://issues.apache.org/jira/browse/BEAM-9639 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > The FnApiRunner currently works on a per-stage manner, and does not abstract > single-bundle execution much. This work item is to clearly define the code to > execute a single bundle. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=422447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422447 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 00:13 Start Date: 15/Apr/20 00:13 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#discussion_r408507914 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py ## @@ -914,14 +898,16 @@ def process_bundle(self, expected_outputs, # type: DataOutput fired_timers, # type: Mapping[Tuple[str, str], PartitionableBuffer] expected_output_timers # type: Dict[str, Dict[str, str]] + dry_run=False Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422447) Time Spent: 3.5h (was: 3h 20m) > Abstract bundle execution logic from stage execution logic > -- > > Key: BEAM-9639 > URL: https://issues.apache.org/jira/browse/BEAM-9639 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > The FnApiRunner currently works on a per-stage manner, and does not abstract > single-bundle execution much. This work item is to clearly define the code to > execute a single bundle. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=422444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422444 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 00:13 Start Date: 15/Apr/20 00:13 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#discussion_r408507730 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py ## @@ -367,6 +413,73 @@ def _build_process_bundle_descriptor(self): state_api_service_descriptor=self.state_api_service_descriptor(), timer_api_service_descriptor=self.data_api_service_descriptor()) + def commit_output_views_to_state(self): +"""Commit bundle outputs to state to be consumed as side inputs later. + +Only the outputs that should be side inputs are committed to state. +""" +data_side_input = {} # type: DataSideInput +for pcoll, si_ids in self.stage.downstream_side_inputs.items(): + for (consumer_transform_name, tag), access_pattern in si_ids.items(): +data_side_input[consumer_transform_name, tag] = ( +translations.create_buffer_id(pcoll), access_pattern) +self.execution_context.commit_side_inputs_to_state(data_side_input) + + def extract_bundle_inputs(self): +# type: (...) -> Tuple[Dict[str, PartitionableBuffer], DataOutput] + +"""Returns maps of transform names to PCollection identifiers. + +Also mutates IO stages to point to the data ApiServiceDescriptor. + +Returns: + A tuple of (data_input, data_output) dictionaries. +`data_input` is a dictionary mapping (transform_name, output_name) to a +PCollection buffer; `data_output` is a dictionary mapping +(transform_name, output_name) to a PCollection ID. +""" +data_input = {} # type: Dict[str, PartitionableBuffer] +data_output = {} # type: DataOutput +# A mapping of {(transform_id, timer_family_id) : buffer_id} +expected_timer_output = {} # type: Dict[Tuple(str, str), str] +for transform in self.stage.transforms: + if transform.spec.urn in (bundle_processor.DATA_INPUT_URN, +bundle_processor.DATA_OUTPUT_URN): +pcoll_id = transform.spec.payload +if transform.spec.urn == bundle_processor.DATA_INPUT_URN: + coder_id = self.execution_context.data_channel_coders[only_element( + transform.outputs.values())] + coder = self.execution_context.pipeline_context.coders[ + self.execution_context.safe_coders.get(coder_id, coder_id)] + if pcoll_id == translations.IMPULSE_BUFFER: +data_input[transform.unique_name] = ListBuffer( +coder_impl=coder.get_impl()) +data_input[transform.unique_name].append(ENCODED_IMPULSE_VALUE) + else: +if pcoll_id not in self.execution_context.pcoll_buffers: + self.execution_context.pcoll_buffers[pcoll_id] = ListBuffer( + coder_impl=coder.get_impl()) +data_input[transform.unique_name] = \ + self.execution_context.pcoll_buffers[pcoll_id] +elif transform.spec.urn == bundle_processor.DATA_OUTPUT_URN: + data_output[transform.unique_name] = pcoll_id + coder_id = self.execution_context.data_channel_coders[only_element( + transform.inputs.values())] +else: + raise NotImplementedError +data_spec = beam_fn_api_pb2.RemoteGrpcPort(coder_id=coder_id) +data_api_service_descriptor = \ + self.data_api_service_descriptor() +if data_api_service_descriptor: + data_spec.api_service_descriptor.url = ( + data_api_service_descriptor.url) +transform.spec.payload = data_spec.SerializeToString() + elif transform.spec.urn in translations.PAR_DO_URNS: +for timer_family_id in payload.timer_family_specs.keys(): + expected_timer_output[(transform.unique_name, timer_family_id)] = ( + create_buffer_id(timer_family_id, 'timers')) +return data_input, data_output, expected_timer_output Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422444) Time Spent: 3h (was: 2h 50m) > Abstract bundle execution logic from stage execution logic > -- > >
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=422445=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422445 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 00:13 Start Date: 15/Apr/20 00:13 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#discussion_r408507780 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py ## @@ -367,6 +413,73 @@ def _build_process_bundle_descriptor(self): state_api_service_descriptor=self.state_api_service_descriptor(), timer_api_service_descriptor=self.data_api_service_descriptor()) + def commit_output_views_to_state(self): +"""Commit bundle outputs to state to be consumed as side inputs later. + +Only the outputs that should be side inputs are committed to state. +""" +data_side_input = {} # type: DataSideInput +for pcoll, si_ids in self.stage.downstream_side_inputs.items(): + for (consumer_transform_name, tag), access_pattern in si_ids.items(): +data_side_input[consumer_transform_name, tag] = ( +translations.create_buffer_id(pcoll), access_pattern) +self.execution_context.commit_side_inputs_to_state(data_side_input) + + def extract_bundle_inputs(self): Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422445) Time Spent: 3h 10m (was: 3h) > Abstract bundle execution logic from stage execution logic > -- > > Key: BEAM-9639 > URL: https://issues.apache.org/jira/browse/BEAM-9639 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > The FnApiRunner currently works on a per-stage manner, and does not abstract > single-bundle execution much. This work item is to clearly define the code to > execute a single bundle. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=422446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422446 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 00:13 Start Date: 15/Apr/20 00:13 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#discussion_r408507854 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py ## @@ -326,8 +327,8 @@ def commit_side_inputs_to_state( data_side_input, # type: DataSideInput ): # type: (...) -> None -for (consuming_transform_id, tag), (buffer_id, func_spec) \ -in data_side_input.items(): +for (consuming_transform_id, tag), (buffer_id, Review comment: it did not : ( hehe This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422446) Time Spent: 3h 20m (was: 3h 10m) > Abstract bundle execution logic from stage execution logic > -- > > Key: BEAM-9639 > URL: https://issues.apache.org/jira/browse/BEAM-9639 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > The FnApiRunner currently works on a per-stage manner, and does not abstract > single-bundle execution much. This work item is to clearly define the code to > execute a single bundle. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=422443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422443 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 00:13 Start Date: 15/Apr/20 00:13 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#discussion_r408507693 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py ## @@ -367,6 +413,73 @@ def _build_process_bundle_descriptor(self): state_api_service_descriptor=self.state_api_service_descriptor(), timer_api_service_descriptor=self.data_api_service_descriptor()) + def commit_output_views_to_state(self): Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422443) Time Spent: 2h 50m (was: 2h 40m) > Abstract bundle execution logic from stage execution logic > -- > > Key: BEAM-9639 > URL: https://issues.apache.org/jira/browse/BEAM-9639 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > The FnApiRunner currently works on a per-stage manner, and does not abstract > single-bundle execution much. This work item is to clearly define the code to > execute a single bundle. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422439 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 15/Apr/20 00:12 Start Date: 15/Apr/20 00:12 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408507507 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int Review comment: Consider: What kind of change to the stateKeyReader implementation would change the expected number of reads, based on how stateKeyReader gets its data? There are two kinds of unit test. What you've described is black box testing, where the testing code has no knowledge of internal details of the implementation. This is valuable when there are accessible API calls to set the unknown internal state of a struct, and in doing so, there are checkable expectations of how other API calls will behave as a result. As a rule, this is preferable, as it's resilient to implementation changes, but with a broad enough API surface, it can be very very expensive since it's simply checking the same code paths over and over and over again on different tests. Black box testing by necessity ends up having alot of redundancy. We do have black box testing in general for Beam Go, since we use the direct runner to validate large portions of the exec package in a "pipeline" context. You can find them via any of the test files that have their package name to include _test. There's also white box testing, where the internal details are known to the test implementer. These tests are an example of the latter. There is no user side API to configure the internal state of stateKeyReader.Read(), and in general for the io.Reader interface, which is how users (AKA other parts of the beam framework), will be using stateKeyReader. Since we can't configure things with a user side API, we must manipulate some other internal state, to set the initial conditions, and then from those conditions, check that we understand the code under test. Further, white box testing is the only reasonable option for such a general API like Read(). It gets passed in a buffer, and it's expected to write up to len(buffer) bytes to it, and tell you what it did. It doesn't say anything about the bytes themselves. The implementations vary dramatically depending on the purpose, and only the implementation can check that. The API in this case doesn't dictate the tests, otherwise there could be a "io.ReaderTester" that we could instead of running these. So, while numReads in general is inscrutable to the user, it's not inscrutable to us, the implementers of this test case, and the code. We have our own expectations for the code In fact, numReads a deterministic number based on the lengths of the backing buffers vs the lengths of the reads. Since it's something we can check, and know based on the initial conditions, and our understanding of the code, it's fine. numReads is a second order expectation based on the initial conditions, but it's also one that's very easy to check. To be very blunt, numReads is checking that the code behaves the way we think it behaves. In this case, had we been testing that the "nil" data buffer case required 1 read to return EOF, we wouldn't have had the bug in question. Your underlying concern about a "change state" test is valid though. Tests that require a specific implementation in order to pass, can be a nuisance. But that doesn't apply so much to second order expectations like numReads. If the number of reads changes when the implementation changes, I certainly would like to know about it, because that could be an *unexpected* side effect of the change being made. Conversely, if a refactorer believes they can change the implementation to reduce the number of calls to Read the test can expect, they can simply adjust the number, and in Test Driven Development style, use that to validate that the changes they're making accomplish the intended goal. I'm on the side of I'd rather have tests fail if a dramatic change in behavior occurs, and the change author need to adjust them since then it verifies that they ran the tests, and that the tests are actually working. Having
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=422441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422441 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 00:12 Start Date: 15/Apr/20 00:12 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#discussion_r408507610 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py ## @@ -75,21 +75,27 @@ IMPULSE_BUFFER = b'impulse' +# SideInputId is identified by a consumer ParDo + tag. +SideInputId = Tuple[str, str] + +DataSideInput = Dict[SideInputId, Review comment: The value is a tuple with the encoded data. I've moved these to translations.py, and updated the comment This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422441) Time Spent: 2.5h (was: 2h 20m) > Abstract bundle execution logic from stage execution logic > -- > > Key: BEAM-9639 > URL: https://issues.apache.org/jira/browse/BEAM-9639 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > The FnApiRunner currently works on a per-stage manner, and does not abstract > single-bundle execution much. This work item is to clearly define the code to > execute a single bundle. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic
[ https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=422440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422440 ] ASF GitHub Bot logged work on BEAM-9639: Author: ASF GitHub Bot Created on: 15/Apr/20 00:12 Start Date: 15/Apr/20 00:12 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11270: [BEAM-9639][BEAM-9608] Improvements for FnApiRunner URL: https://github.com/apache/beam/pull/11270#discussion_r408507558 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py ## @@ -149,15 +155,26 @@ def no_overlap(a, b): return ( not consumer.forced_root and not self in consumer.must_follow and not self.is_runner_urn(context) and -not consumer.is_runner_urn(context) and -no_overlap(self.downstream_side_inputs, consumer.side_inputs())) +not consumer.is_runner_urn(context) and no_overlap( +set(self.downstream_side_inputs.keys()), +{i + for i, _, _ in consumer.side_inputs()})) + + def _fuse_downstream_side_inputs(self, other): +res = dict(self.downstream_side_inputs) +for si, other_si_ids in other.downstream_side_inputs.items(): + if si in res: +res[si] = union(res[si], other_si_ids) Review comment: woah this is a bug. downstream side input is a dictionary mapping to dictionaries. Dict[Output Pcollection, Dict[Side input ID, Access pattern]] Where Side input ID is Tuple[consumer ptransform, input index]. Added appropriate annotations, and fixed the bug. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422440) Time Spent: 2h 20m (was: 2h 10m) > Abstract bundle execution logic from stage execution logic > -- > > Key: BEAM-9639 > URL: https://issues.apache.org/jira/browse/BEAM-9639 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > The FnApiRunner currently works on a per-stage manner, and does not abstract > single-bundle execution much. This work item is to clearly define the code to > execute a single bundle. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=422436=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422436 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 14/Apr/20 23:57 Start Date: 14/Apr/20 23:57 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11339: [BEAM-9468] [WIP] Fhir io URL: https://github.com/apache/beam/pull/11339#issuecomment-611125298 TODOs: - [x] ValueProvider support - [x] Add example usage to javadoc - [x] Unit test for FhirIO dead letter handling - [x] Migrate ITs to parameterized tests to DRY up ITs against different FHIR versions (improves maintainability) - [ ] Add IT for FhirIO.Read - [x] implement scaffolding for test (currently always passes because initial PCollection is empty) - [ ] Needs convenience method to "read all resource IDs from a FHIR store to populate initial PCollection of resource IDs - [ ] Benchmark / load test the FhirIO.Import This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422436) Time Spent: 27h 50m (was: 27h 40m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 27h 50m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=422432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422432 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 14/Apr/20 23:56 Start Date: 14/Apr/20 23:56 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11339: [BEAM-9468] [WIP] Fhir io URL: https://github.com/apache/beam/pull/11339#issuecomment-611125298 TODOs: - [x] ValueProvider support - [x] Add example usage to javadoc - [x] Unit test for FhirIO dead letter handling - [x] Migrate ITs to parameterized tests to DRY up ITs against different FHIR versions (improves maintainability) - [ ] Add IT for FhirIO.Read - [ ] Benchmark / load test the FhirIO.Import This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422432) Time Spent: 27h 20m (was: 27h 10m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 27h 20m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9737) beam_PostCommit_Website_Test failing
[ https://issues.apache.org/jira/browse/BEAM-9737?focusedWorklogId=422435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422435 ] ASF GitHub Bot logged work on BEAM-9737: Author: ASF GitHub Bot Created on: 14/Apr/20 23:56 Start Date: 14/Apr/20 23:56 Worklog Time Spent: 10m Work Description: udim commented on issue #11386: [BEAM-9737] Fix website postcommit URL: https://github.com/apache/beam/pull/11386#issuecomment-613738430 Run Full Website Test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422435) Time Spent: 2h 20m (was: 2h 10m) > beam_PostCommit_Website_Test failing > > > Key: BEAM-9737 > URL: https://issues.apache.org/jira/browse/BEAM-9737 > Project: Beam > Issue Type: Bug > Components: test-failures, website >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > Also failing: beam_PostCommit_Website_Publish (same failure) > {code} > > Task :website:buildLocalWebsite > `/` is not writable. > Bundler will use `/tmp/bundler/home/unknown' as your home directory > temporarily. > Configuration file: /repo/website/_config.yml > Configuration file: /repo/website/_config_test.yml > Configuration file: /tmp/_config_branch_repo.yml > Source: /repo/website/src >Destination: generated-local-content > Incremental build: enabled > Generating... > jekyll 3.6.3 | Error: Permission denied @ dir_s_mkdir - > /repo/build/website/generated-local-content/security > {code} > https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Website_Test/3676/console > Possible culprit: https://github.com/apache/beam/pull/11232/files -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=422433=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422433 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 14/Apr/20 23:56 Start Date: 14/Apr/20 23:56 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11339: [BEAM-9468] [WIP] Fhir io URL: https://github.com/apache/beam/pull/11339#issuecomment-611125298 TODOs: - [x] ValueProvider support - [x] Add example usage to javadoc - [x] Unit test for FhirIO dead letter handling - [x] Migrate ITs to parameterized tests to DRY up ITs against different FHIR versions (improves maintainability) - [x] Add IT for FhirIO.Read - [ ] Benchmark / load test the FhirIO.Import This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422433) Time Spent: 27.5h (was: 27h 20m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 27.5h > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=422434=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422434 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 14/Apr/20 23:56 Start Date: 14/Apr/20 23:56 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11339: [BEAM-9468] [WIP] Fhir io URL: https://github.com/apache/beam/pull/11339#issuecomment-611125298 TODOs: - [x] ValueProvider support - [x] Add example usage to javadoc - [x] Unit test for FhirIO dead letter handling - [x] Migrate ITs to parameterized tests to DRY up ITs against different FHIR versions (improves maintainability) - [ ] Add IT for FhirIO.Read - [ ] Benchmark / load test the FhirIO.Import This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422434) Time Spent: 27h 40m (was: 27.5h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 27h 40m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=422430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422430 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 14/Apr/20 23:54 Start Date: 14/Apr/20 23:54 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11339: [BEAM-9468] [WIP] Fhir io URL: https://github.com/apache/beam/pull/11339#issuecomment-611125298 TODOs: - [x] ValueProvider support - [x] Add example usage to javadoc - [x] Unit test for FhirIO dead letter handling - [ ] Add IT for FhirIO.Read - [ ] Migrate ITs to parameterized tests to DRY up ITs against different FHIR versions (improves maintainability) - [ ] Benchmark / load test the FhirIO.Import This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422430) Time Spent: 27h (was: 26h 50m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 27h > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=422431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422431 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 14/Apr/20 23:54 Start Date: 14/Apr/20 23:54 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11339: [BEAM-9468] [WIP] Fhir io URL: https://github.com/apache/beam/pull/11339#issuecomment-611125298 TODOs: - [x] ValueProvider support - [x] Add example usage to javadoc - [x] Unit test for FhirIO dead letter handling - [ ] Migrate ITs to parameterized tests to DRY up ITs against different FHIR versions (improves maintainability) - [ ] Add IT for FhirIO.Read - [ ] Benchmark / load test the FhirIO.Import This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422431) Time Spent: 27h 10m (was: 27h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 27h 10m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9737) beam_PostCommit_Website_Test failing
[ https://issues.apache.org/jira/browse/BEAM-9737?focusedWorklogId=422429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422429 ] ASF GitHub Bot logged work on BEAM-9737: Author: ASF GitHub Bot Created on: 14/Apr/20 23:53 Start Date: 14/Apr/20 23:53 Worklog Time Spent: 10m Work Description: udim commented on issue #11386: [BEAM-9737] Fix website postcommit URL: https://github.com/apache/beam/pull/11386#issuecomment-613737579 Run Full Website Test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422429) Time Spent: 2h 10m (was: 2h) > beam_PostCommit_Website_Test failing > > > Key: BEAM-9737 > URL: https://issues.apache.org/jira/browse/BEAM-9737 > Project: Beam > Issue Type: Bug > Components: test-failures, website >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Also failing: beam_PostCommit_Website_Publish (same failure) > {code} > > Task :website:buildLocalWebsite > `/` is not writable. > Bundler will use `/tmp/bundler/home/unknown' as your home directory > temporarily. > Configuration file: /repo/website/_config.yml > Configuration file: /repo/website/_config_test.yml > Configuration file: /tmp/_config_branch_repo.yml > Source: /repo/website/src >Destination: generated-local-content > Incremental build: enabled > Generating... > jekyll 3.6.3 | Error: Permission denied @ dir_s_mkdir - > /repo/build/website/generated-local-content/security > {code} > https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Website_Test/3676/console > Possible culprit: https://github.com/apache/beam/pull/11232/files -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9757) Flake in JavaPortabilityApi precommit
[ https://issues.apache.org/jira/browse/BEAM-9757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9757: --- Status: Open (was: Triage Needed) > Flake in JavaPortabilityApi precommit > - > > Key: BEAM-9757 > URL: https://issues.apache.org/jira/browse/BEAM-9757 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Boyuan Zhang >Priority: Major > > org.apache.beam.runners.dataflow.worker.util.MemoryMonitorTest.detectGCThrashing > Error Message > java.lang.AssertionError > Stacktrace > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:87) > at org.junit.Assert.assertTrue(Assert.java:42) > at org.junit.Assert.assertTrue(Assert.java:53) > at > org.apache.beam.runners.dataflow.worker.util.MemoryMonitorTest.detectGCThrashing(MemoryMonitorTest.java:93) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Standard Error > Apr 13, 2020 11:38:54 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor run > INFO: Memory is used/total/max = 189/1511/1820 MB, GC last/max = 0.00/0.00 %, > #pushbacks=0, gc thrashing=false > Apr 13, 2020 11:38:56 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor dumpHeap > WARNING: Heap dumped to > /tmp/junit4868537985750235077/junit1782833439507703448/heap_dump.hprof > Apr 13, 2020 11:38:58 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor run > INFO: Memory is used/total/max = 210/1511/1820 MB, GC last/max = 0.00/0.00 %, > #pushbacks=0, gc thrashing=false > Apr 13, 2020 11:38:58 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor dumpHeap > WARNING: Heap dumped to > /tmp/junit5110608932509920991/junit3762538188841792209/heap_dump.hprof > Apr 13, 2020 11:38:59 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor run > INFO: Memory is used/total/max = 231/1511/1820 MB, GC last/max = 0.00/0.00 %, > #pushbacks=0, gc thrashing=false > Apr 13, 2020 11:38:59 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor dumpHeap > WARNING: Heap dumped to > /tmp/junit419632844612311154/junit923609148290982010/heap_dump.hprof > Apr 13, 2020 11:38:59 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor > tryUploadHeapDumpIfItExists > INFO: Looking for heap dump at > /tmp/junit419632844612311154/junit923609148290982010/heap_dump.hprof > Apr 13, 2020 11:38:59 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor > tryUploadHeapDumpIfItExists > WARNING: Heap dump > /tmp/junit419632844612311154/junit923609148290982010/heap_dump.hprof > detected, attempting to upload to GCS > Apr 13, 2020 11:39:00 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor > tryUploadHeapDumpIfItExists > WARNING: Heap dump > /tmp/junit419632844612311154/junit923609148290982010/heap_dump.hprof uploaded > to > /tmp/junit419632844612311154/junit2761624807817345489/heap_dumpa8934b66-834c-4c26-8b05-47c995150ef8.hprof > Apr 13, 2020 11:39:00 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor > tryUploadHeapDumpIfItExists > INFO: Deleted local heap dump > /tmp/junit419632844612311154/junit923609148290982010/heap_dump.hprof > Apr 13, 2020 11:39:00 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor run > INFO: Memory is used/total/max = 247/1511/1820 MB, GC last/max = > 120.00/120.00 %, #pushbacks=0, gc thrashing=false > Apr 13, 2020 11:39:00 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor waitForResources > INFO: Waiting for resources for Test2. Memory is used/total/max = > 247/1511/1820 MB, GC last/max = 100.00/120.00 %, #pushbacks=1, gc > thrashing=true > Apr 13, 2020 11:39:00 PM > org.apache.beam.runners.dataflow.worker.util.MemoryMonitor waitForResources > INFO: Resources granted
[jira] [Work logged] (BEAM-9756) beam_PostCommit_Java_Nexmark (non-Dataflow) failing
[ https://issues.apache.org/jira/browse/BEAM-9756?focusedWorklogId=422422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422422 ] ASF GitHub Bot logged work on BEAM-9756: Author: ASF GitHub Bot Created on: 14/Apr/20 23:17 Start Date: 14/Apr/20 23:17 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11417: [BEAM-9756] Nexmark: only use --region in Dataflow. URL: https://github.com/apache/beam/pull/11417#issuecomment-613727368 Run Dataflow Runner Nexmark Tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422422) Time Spent: 1h 10m (was: 1h) > beam_PostCommit_Java_Nexmark (non-Dataflow) failing > --- > > Key: BEAM-9756 > URL: https://issues.apache.org/jira/browse/BEAM-9756 > Project: Beam > Issue Type: Bug > Components: test-failures, testing-nexmark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > 12:02:24 Exception in thread "main" java.lang.IllegalArgumentException: Class > interface org.apache.beam.sdk.nexmark.NexmarkOptions missing a property named > 'region'. > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1625) > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:115) > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:298) > 12:02:24 at org.apache.beam.sdk.nexmark.Main.runAll(Main.java:98) > 12:02:24 at org.apache.beam.sdk.nexmark.Main.main(Main.java:415) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch
[ https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=422418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422418 ] ASF GitHub Bot logged work on BEAM-5605: Author: ASF GitHub Bot Created on: 14/Apr/20 23:10 Start Date: 14/Apr/20 23:10 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11414: [BEAM-5605, BEAM-2939] Add support for FnApiDoFnRunner to handle split calls. URL: https://github.com/apache/beam/pull/11414#discussion_r408487984 ## File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java ## @@ -761,72 +795,111 @@ public void processElementForElementAndRestriction( continue; } -// Make sure to get the output watermark before we split to ensure that the lower bound -// applies to both the primary and residual. -KV watermarkAndState = -currentWatermarkEstimator.getWatermarkAndState(); -SplitResult result = currentTracker.trySplit(0); +// Attempt to checkpoint the current restriction. +HandlesSplits.SplitResult splitResult = Review comment: The DoFn returned `ProcessContinuation#resume` and not `ProcessContinuation#stop`. The current restriction within the restriction tracker represents the entire restriction so splitting it provides the remainder and should update the restriction tracker so that the primary is in a done state so that `checkDone` down below can succeed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422418) Time Spent: 18h 10m (was: 18h) > Support Portable SplittableDoFn for batch > - > > Key: BEAM-5605 > URL: https://issues.apache.org/jira/browse/BEAM-5605 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Scott Wegner >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 18h 10m > Remaining Estimate: 0h > > Roll-up item tracking work towards supporting portable SplittableDoFn for > batch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=422417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422417 ] ASF GitHub Bot logged work on BEAM-8603: Author: ASF GitHub Bot Created on: 14/Apr/20 23:01 Start Date: 14/Apr/20 23:01 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10055: [BEAM-8603] Add Python SqlTransform URL: https://github.com/apache/beam/pull/10055#issuecomment-613722634 cc: @Ardagan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422417) Time Spent: 9h 10m (was: 9h) > Add Python SqlTransform MVP > --- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 9h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9748) Add Reshuffle.ForSequentiallyGeneratedInput transform
[ https://issues.apache.org/jira/browse/BEAM-9748?focusedWorklogId=422415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422415 ] ASF GitHub Bot logged work on BEAM-9748: Author: ASF GitHub Bot Created on: 14/Apr/20 23:00 Start Date: 14/Apr/20 23:00 Worklog Time Spent: 10m Work Description: jkff commented on pull request #11406: [BEAM-9748] Move Reparallelize transform to Reshuffle URL: https://github.com/apache/beam/pull/11406#discussion_r408484775 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java ## @@ -65,6 +66,16 @@ private Reshuffle() {} return new ViaRandomKey<>(); } + @Experimental Review comment: Annotation should probably be below the comment. Also suggest rephrasing a bit, to explain when one should use one or the other, and what are the consequences of choosing wrong. I'm not sure that "sequentially generated" is clear enough to a casual user. Maybe something like this: ``` Materializes the input and prepares it to be consumed in a highly parallel fashion. This version is tailored to the case when input was produced in an extremely sequential way - typically by a ParDo that emits millions of outputs _per input element_, e.g., executing a large database query or a large simulation and emitting all of their results. Internally, this version first materializes the input at a moderate cost before reshuffling it internally using viaRandomKey(), making the reshuffling itself significantly cheaper in these extreme cases on some runners. Use this over viaRandomKey() only if your benchmarks show an improvement. ``` And mention this at the class-level documentation of Reshuffle for visibility. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422415) Time Spent: 1h 50m (was: 1h 40m) > Add Reshuffle.ForSequentiallyGeneratedInput transform > - > > Key: BEAM-9748 > URL: https://issues.apache.org/jira/browse/BEAM-9748 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > Some DoFn based IOs like JdbcIO and RedisIO rely on a different approach to > Reparallelize outputs using a combination of a an empty PCollectionView to > force materialization and Reshuffle.viaRandomkey to reparallelize a > PCollection. This issue extracts this transform and expose it as part of the > Reshuffle to avoid repeating the code for transforms (notably IOs) that > produce lots of sequentially generated data where and benefit of this > alternative approach to perform better reparallelization of its output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9623) Add support for configuring TableProviders in Python SqlTransform
[ https://issues.apache.org/jira/browse/BEAM-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-9623: Summary: Add support for configuring TableProviders in Python SqlTransform (was: Add support for TableProviders in Python SqlTransform) > Add support for configuring TableProviders in Python SqlTransform > - > > Key: BEAM-9623 > URL: https://issues.apache.org/jira/browse/BEAM-9623 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Priority: Major > > It should be possible to use e.g. DataCatalogTableProvider and access > BigQuery, PubSub, and GCS in queries. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9699) Add ability to use ZetaSQL in Python SqlTransform
[ https://issues.apache.org/jira/browse/BEAM-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-9699: Status: Open (was: Triage Needed) > Add ability to use ZetaSQL in Python SqlTransform > - > > Key: BEAM-9699 > URL: https://issues.apache.org/jira/browse/BEAM-9699 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Priority: Major > > This may just work when the [plannerName pipeline > option|https://github.com/apache/beam/blob/1e52e4298085eda8e88e1215c7a73d52658b31f1/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L29] > is exposed to Python -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8872) Add support for splitting at fractions > 0 to org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker
[ https://issues.apache.org/jira/browse/BEAM-8872?focusedWorklogId=422414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422414 ] ASF GitHub Bot logged work on BEAM-8872: Author: ASF GitHub Bot Created on: 14/Apr/20 22:57 Start Date: 14/Apr/20 22:57 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11418: [BEAM-8872] Support split at fraction for OffsetRangeTracker URL: https://github.com/apache/beam/pull/11418#discussion_r408483552 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java ## @@ -49,13 +50,22 @@ public OffsetRange currentRestriction() { @Override public SplitResult trySplit(double fractionOfRemainder) { -// TODO(BEAM-8872): Add support for splitting off a fixed amount of work for this restriction -// instead of only supporting checkpointing. - -checkState( -lastClaimedOffset != null, "Can't checkpoint before any offset was successfully claimed"); -OffsetRange res = new OffsetRange(lastClaimedOffset + 1, range.getTo()); -this.range = new OffsetRange(range.getFrom(), lastClaimedOffset + 1); +checkState(lastClaimedOffset != null, "Can't split before any offset was successfully claimed"); +// No more split should be performed if checkpoint has happened. +if (checkpointed) { + return null; +} +Long splitPos = +lastClaimedOffset ++ Math.max(1L, (long) ((range.getTo() - lastClaimedOffset) * fractionOfRemainder)); +if (splitPos >= range.getTo()) { + return null; +} +if (fractionOfRemainder == 0.0) { + checkpointed = true; Review comment: Just return early since we know there is no more split after checkpointing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422414) Time Spent: 1h (was: 50m) > Add support for splitting at fractions > 0 to > org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker > -- > > Key: BEAM-8872 > URL: https://issues.apache.org/jira/browse/BEAM-8872 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker only > supports checkpointing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8872) Add support for splitting at fractions > 0 to org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker
[ https://issues.apache.org/jira/browse/BEAM-8872?focusedWorklogId=422413=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422413 ] ASF GitHub Bot logged work on BEAM-8872: Author: ASF GitHub Bot Created on: 14/Apr/20 22:56 Start Date: 14/Apr/20 22:56 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11418: [BEAM-8872] Support split at fraction for OffsetRangeTracker URL: https://github.com/apache/beam/pull/11418#discussion_r408483381 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java ## @@ -49,13 +50,22 @@ public OffsetRange currentRestriction() { @Override public SplitResult trySplit(double fractionOfRemainder) { -// TODO(BEAM-8872): Add support for splitting off a fixed amount of work for this restriction -// instead of only supporting checkpointing. - -checkState( -lastClaimedOffset != null, "Can't checkpoint before any offset was successfully claimed"); -OffsetRange res = new OffsetRange(lastClaimedOffset + 1, range.getTo()); -this.range = new OffsetRange(range.getFrom(), lastClaimedOffset + 1); +checkState(lastClaimedOffset != null, "Can't split before any offset was successfully claimed"); Review comment: Allowing split before first claiming makes sense to me. Python has already allowed that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422413) Time Spent: 50m (was: 40m) > Add support for splitting at fractions > 0 to > org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker > -- > > Key: BEAM-8872 > URL: https://issues.apache.org/jira/browse/BEAM-8872 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker only > supports checkpointing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422411 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 22:55 Start Date: 14/Apr/20 22:55 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408481881 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int Review comment: Is the number of reads something that needs to be predictable to whatever code uses the reader? To me it seems like one of those implementation details that doesn't need to be unit tested because it's invisible to the caller. The reason this stood out to me is that the `numReads` in all the test cases below seem a bit inscrutable, and it would cause these tests to fail if the implementation of the `stateKeyReader` was changed slightly, which is just an annoyance unless it would actually break code. On the other hand, if the number of reads _is_ a detail that users should know, and correctness does rely on it staying consistent, that seems like it should be explicitly documented on `stateKeyReader.Read`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422411) > [Go SDK] Empty side inputs causing spurious zero elements > - > > Key: BEAM-9746 > URL: https://issues.apache.org/jira/browse/BEAM-9746 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > A user discovered that empty side inputs would spuriously provide a single > zero element. > The error was narrowed down to the Go SDK's state manager code copying the > stateGetResponse data wasn't checking that the original data source even had > any bytes in it, leading it in particular to interpret length prefixed data > as having 0 length, which would cause zero value elements to be generated. > Notably, this caused empty strings. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422410 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 22:55 Start Date: 14/Apr/20 22:55 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408477833 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i, buflen := range test.buflens { + token := []byte(strconv.Itoa(i)) + var buf []byte + if buflen >= 0 { + buf = bytes.Repeat([]byte{42}, buflen) + } + // On the last request response pair, send no token. + if i+1 == len(test.buflens) { + token = nil + } + + req := <-ch.requests + + if test.noGet { +
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422409=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422409 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 22:55 Start Date: 14/Apr/20 22:55 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408477556 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i, buflen := range test.buflens { + token := []byte(strconv.Itoa(i)) + var buf []byte + if buflen >= 0 { + buf = bytes.Repeat([]byte{42}, buflen) + } + // On the last request response pair, send no token. + if i+1 == len(test.buflens) { + token = nil + } + + req := <-ch.requests + + if test.noGet { +
[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch
[ https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=422408=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422408 ] ASF GitHub Bot logged work on BEAM-5605: Author: ASF GitHub Bot Created on: 14/Apr/20 22:53 Start Date: 14/Apr/20 22:53 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11414: [BEAM-5605, BEAM-2939] Add support for FnApiDoFnRunner to handle split calls. URL: https://github.com/apache/beam/pull/11414#discussion_r408482142 ## File path: sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java ## @@ -761,72 +795,111 @@ public void processElementForElementAndRestriction( continue; } -// Make sure to get the output watermark before we split to ensure that the lower bound -// applies to both the primary and residual. -KV watermarkAndState = -currentWatermarkEstimator.getWatermarkAndState(); -SplitResult result = currentTracker.trySplit(0); +// Attempt to checkpoint the current restriction. +HandlesSplits.SplitResult splitResult = Review comment: Why do we want to checkpoint after processElement finishing? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422408) Time Spent: 18h (was: 17h 50m) > Support Portable SplittableDoFn for batch > - > > Key: BEAM-5605 > URL: https://issues.apache.org/jira/browse/BEAM-5605 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Scott Wegner >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 18h > Remaining Estimate: 0h > > Roll-up item tracking work towards supporting portable SplittableDoFn for > batch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=422407=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422407 ] ASF GitHub Bot logged work on BEAM-8603: Author: ASF GitHub Bot Created on: 14/Apr/20 22:53 Start Date: 14/Apr/20 22:53 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10055: [BEAM-8603] Add Python SqlTransform URL: https://github.com/apache/beam/pull/10055#issuecomment-613720045 It looks like PreCommit failures are just flakes or persistent failures at this point. I think I've addressed all of the PR comments, I just need an answer on two things: - @robertwb: Is it acceptable to skip sql_test for the default runner for now, since it doesn't support java_artifacts? - @ihji: Are the groovy changes ok given that they're outside of the `.each` block? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422407) Time Spent: 9h (was: 8h 50m) > Add Python SqlTransform MVP > --- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 9h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422403 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 22:51 Start Date: 14/Apr/20 22:51 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408481474 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +260,166 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + if test.noGet { + req := <-ch.requests + ch.responses[req.Id] <- { + Id: req.Id, + } + return + } + for i, buflen := range test.buflens { + var buf []byte + if buflen >= 0 { + buf = bytes.Repeat([]byte{42}, buflen) + } + token := []byte(fmt.Sprint(i)) + if i+1 == len(test.buflens) { +
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422398=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422398 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 22:48 Start Date: 14/Apr/20 22:48 Worklog Time Spent: 10m Work Description: thetorpedodog commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408478965 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i := 0; i < len(test.buflens); i++ { + token := []byte(strconv.Itoa(i)) + var buf []byte + if test.buflens[i] >= 0 { + buf = bytes.Repeat([]byte{42}, test.buflens[i]) + } + // On the last request response pair, send no token. + if i+1 == len(test.buflens) { + token = nil + } + + req := <-ch.requests + + if test.noGet { +
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422399=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422399 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 22:48 Start Date: 14/Apr/20 22:48 Worklog Time Spent: 10m Work Description: thetorpedodog commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408479609 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +260,166 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + if test.noGet { + req := <-ch.requests + ch.responses[req.Id] <- { + Id: req.Id, + } + return + } + for i, buflen := range test.buflens { + var buf []byte + if buflen >= 0 { + buf = bytes.Repeat([]byte{42}, buflen) + } + token := []byte(fmt.Sprint(i)) + if i+1 == len(test.buflens) { +
[jira] [Updated] (BEAM-9758) [very low priority] Asterisks in nexmark should be escaped
[ https://issues.apache.org/jira/browse/BEAM-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-9758: -- Status: Open (was: Triage Needed) > [very low priority] Asterisks in nexmark should be escaped > -- > > Key: BEAM-9758 > URL: https://issues.apache.org/jira/browse/BEAM-9758 > Project: Beam > Issue Type: Bug > Components: testing-nexmark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Trivial > > Nexmark tests echo asterisks. These don't print as asterisks, but instead are > expanded by the shell to show the files in the current folder, e.g. "echo src > RUN NEXMARK IN BATCH MODE USING DATAFLOW RUNNER src". Which is not really > much of an issue but I thought it was kind of silly :) > https://github.com/apache/beam/blob/f950b71bb37366cf23bab43eb46f77c761e2300b/.test-infra/jenkins/NexmarkBuilder.groovy#L77 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9758) [very low priority] Asterisks in nexmark should be escaped
Kyle Weaver created BEAM-9758: - Summary: [very low priority] Asterisks in nexmark should be escaped Key: BEAM-9758 URL: https://issues.apache.org/jira/browse/BEAM-9758 Project: Beam Issue Type: Bug Components: testing-nexmark Reporter: Kyle Weaver Assignee: Kyle Weaver Nexmark tests echo asterisks. These don't print as asterisks, but instead are expanded by the shell to show the files in the current folder, e.g. "echo src RUN NEXMARK IN BATCH MODE USING DATAFLOW RUNNER src". Which is not really much of an issue but I thought it was kind of silly :) https://github.com/apache/beam/blob/f950b71bb37366cf23bab43eb46f77c761e2300b/.test-infra/jenkins/NexmarkBuilder.groovy#L77 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9756) beam_PostCommit_Java_Nexmark (non-Dataflow) failing
[ https://issues.apache.org/jira/browse/BEAM-9756?focusedWorklogId=422397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422397 ] ASF GitHub Bot logged work on BEAM-9756: Author: ASF GitHub Bot Created on: 14/Apr/20 22:36 Start Date: 14/Apr/20 22:36 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11417: [BEAM-9756] Nexmark: only use --region in Dataflow. URL: https://github.com/apache/beam/pull/11417#issuecomment-613714882 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422397) Time Spent: 1h (was: 50m) > beam_PostCommit_Java_Nexmark (non-Dataflow) failing > --- > > Key: BEAM-9756 > URL: https://issues.apache.org/jira/browse/BEAM-9756 > Project: Beam > Issue Type: Bug > Components: test-failures, testing-nexmark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > 12:02:24 Exception in thread "main" java.lang.IllegalArgumentException: Class > interface org.apache.beam.sdk.nexmark.NexmarkOptions missing a property named > 'region'. > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1625) > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:115) > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:298) > 12:02:24 at org.apache.beam.sdk.nexmark.Main.runAll(Main.java:98) > 12:02:24 at org.apache.beam.sdk.nexmark.Main.main(Main.java:415) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8872) Add support for splitting at fractions > 0 to org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker
[ https://issues.apache.org/jira/browse/BEAM-8872?focusedWorklogId=422396=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422396 ] ASF GitHub Bot logged work on BEAM-8872: Author: ASF GitHub Bot Created on: 14/Apr/20 22:32 Start Date: 14/Apr/20 22:32 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11418: [BEAM-8872] Support split at fraction for OffsetRangeTracker URL: https://github.com/apache/beam/pull/11418#discussion_r408473046 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java ## @@ -49,13 +50,22 @@ public OffsetRange currentRestriction() { @Override public SplitResult trySplit(double fractionOfRemainder) { -// TODO(BEAM-8872): Add support for splitting off a fixed amount of work for this restriction -// instead of only supporting checkpointing. - -checkState( -lastClaimedOffset != null, "Can't checkpoint before any offset was successfully claimed"); -OffsetRange res = new OffsetRange(lastClaimedOffset + 1, range.getTo()); -this.range = new OffsetRange(range.getFrom(), lastClaimedOffset + 1); +checkState(lastClaimedOffset != null, "Can't split before any offset was successfully claimed"); Review comment: We can split before any successfully claimed block by returning `[from, to)` and updating the current range to be `[from, from)` This makes sense in some cases where we want to handoff all the work to someone else for the active element while this bundle finishes other processing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422396) Time Spent: 40m (was: 0.5h) > Add support for splitting at fractions > 0 to > org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker > -- > > Key: BEAM-8872 > URL: https://issues.apache.org/jira/browse/BEAM-8872 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker only > supports checkpointing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8872) Add support for splitting at fractions > 0 to org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker
[ https://issues.apache.org/jira/browse/BEAM-8872?focusedWorklogId=422394=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422394 ] ASF GitHub Bot logged work on BEAM-8872: Author: ASF GitHub Bot Created on: 14/Apr/20 22:32 Start Date: 14/Apr/20 22:32 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11418: [BEAM-8872] Support split at fraction for OffsetRangeTracker URL: https://github.com/apache/beam/pull/11418#discussion_r408473046 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java ## @@ -49,13 +50,22 @@ public OffsetRange currentRestriction() { @Override public SplitResult trySplit(double fractionOfRemainder) { -// TODO(BEAM-8872): Add support for splitting off a fixed amount of work for this restriction -// instead of only supporting checkpointing. - -checkState( -lastClaimedOffset != null, "Can't checkpoint before any offset was successfully claimed"); -OffsetRange res = new OffsetRange(lastClaimedOffset + 1, range.getTo()); -this.range = new OffsetRange(range.getFrom(), lastClaimedOffset + 1); +checkState(lastClaimedOffset != null, "Can't split before any offset was successfully claimed"); Review comment: We can split before any successfully claimed block by returning `[from, to)` and updating the current range to be `[from, from)` This makes sense in some cases where we want to handoff all the work to someone else for the active element while this Bundle finishes other processing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422394) Time Spent: 20m (was: 10m) > Add support for splitting at fractions > 0 to > org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker > -- > > Key: BEAM-8872 > URL: https://issues.apache.org/jira/browse/BEAM-8872 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker only > supports checkpointing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8872) Add support for splitting at fractions > 0 to org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker
[ https://issues.apache.org/jira/browse/BEAM-8872?focusedWorklogId=422395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422395 ] ASF GitHub Bot logged work on BEAM-8872: Author: ASF GitHub Bot Created on: 14/Apr/20 22:32 Start Date: 14/Apr/20 22:32 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11418: [BEAM-8872] Support split at fraction for OffsetRangeTracker URL: https://github.com/apache/beam/pull/11418#discussion_r408473271 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.java ## @@ -49,13 +50,22 @@ public OffsetRange currentRestriction() { @Override public SplitResult trySplit(double fractionOfRemainder) { -// TODO(BEAM-8872): Add support for splitting off a fixed amount of work for this restriction -// instead of only supporting checkpointing. - -checkState( -lastClaimedOffset != null, "Can't checkpoint before any offset was successfully claimed"); -OffsetRange res = new OffsetRange(lastClaimedOffset + 1, range.getTo()); -this.range = new OffsetRange(range.getFrom(), lastClaimedOffset + 1); +checkState(lastClaimedOffset != null, "Can't split before any offset was successfully claimed"); +// No more split should be performed if checkpoint has happened. +if (checkpointed) { + return null; +} +Long splitPos = +lastClaimedOffset ++ Math.max(1L, (long) ((range.getTo() - lastClaimedOffset) * fractionOfRemainder)); +if (splitPos >= range.getTo()) { + return null; +} +if (fractionOfRemainder == 0.0) { + checkpointed = true; Review comment: Why do we need `checkpointed`? Shouldn't the range restriction change so that `to` becomes `lastClaimed` (or `from` if nothing has been claimed)? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422395) Time Spent: 0.5h (was: 20m) > Add support for splitting at fractions > 0 to > org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker > -- > > Key: BEAM-8872 > URL: https://issues.apache.org/jira/browse/BEAM-8872 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker only > supports checkpointing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=422393=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422393 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 14/Apr/20 22:30 Start Date: 14/Apr/20 22:30 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11416: [BEAM-9136] reduce third_party_dependencies size URL: https://github.com/apache/beam/pull/11416 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422393) Time Spent: 22h 10m (was: 22h) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 22h 10m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=422392=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422392 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Apr/20 22:20 Start Date: 14/Apr/20 22:20 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #10857: [BEAM-8889] Upgrade guava to 28.0-jre URL: https://github.com/apache/beam/pull/10857#issuecomment-613709936 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422392) Remaining Estimate: 138h (was: 138h 10m) Time Spent: 30h (was: 29h 50m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 30h > Remaining Estimate: 138h > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422388 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 22:17 Start Date: 14/Apr/20 22:17 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408468636 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i := 0; i < len(test.buflens); i++ { + token := []byte(strconv.Itoa(i)) + var buf []byte + if test.buflens[i] >= 0 { + buf = bytes.Repeat([]byte{42}, test.buflens[i]) + } + // On the last request response pair, send no token. + if i+1 == len(test.buflens) { + token = nil + } + + req := <-ch.requests + + if test.noGet { +
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422387=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422387 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 22:17 Start Date: 14/Apr/20 22:17 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408468620 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i := 0; i < len(test.buflens); i++ { + token := []byte(strconv.Itoa(i)) + var buf []byte + if test.buflens[i] >= 0 { + buf = bytes.Repeat([]byte{42}, test.buflens[i]) + } + // On the last request response pair, send no token. + if i+1 == len(test.buflens) { + token = nil + } Review comment: Great suggestions. Done. This is an automated message from the Apache Git
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422386 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 22:16 Start Date: 14/Apr/20 22:16 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408468603 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i := 0; i < len(test.buflens); i++ { + token := []byte(strconv.Itoa(i)) + var buf []byte + if test.buflens[i] >= 0 { + buf = bytes.Repeat([]byte{42}, test.buflens[i]) + } + // On the last request response pair, send no token. + if i+1 == len(test.buflens) { + token = nil + } + + req := <-ch.requests + + if test.noGet { +
[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.
[ https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=422383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422383 ] ASF GitHub Bot logged work on BEAM-9577: Author: ASF GitHub Bot Created on: 14/Apr/20 22:16 Start Date: 14/Apr/20 22:16 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11420: [BEAM-9577] Fix test to create urls from paths which are compatible with Windows. URL: https://github.com/apache/beam/pull/11420 Tested manually on a Windows VM with Python 3.7 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.
[ https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=422385=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422385 ] ASF GitHub Bot logged work on BEAM-9577: Author: ASF GitHub Bot Created on: 14/Apr/20 22:16 Start Date: 14/Apr/20 22:16 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11420: [BEAM-9577] Fix test to create urls from paths which are compatible with Windows. URL: https://github.com/apache/beam/pull/11420#issuecomment-613708529 R: @tvalentyn @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422385) Time Spent: 14.5h (was: 14h 20m) > Update artifact staging and retrieval protocols to be dependency aware. > --- > > Key: BEAM-9577 > URL: https://issues.apache.org/jira/browse/BEAM-9577 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 14.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=422375=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422375 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 14/Apr/20 22:12 Start Date: 14/Apr/20 22:12 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11339: [BEAM-9468] [WIP] Fhir io URL: https://github.com/apache/beam/pull/11339#issuecomment-611125298 TODOs: - [x] ValueProvider support - [x] Add example usage to javadoc - [x] Unit test for FhirIO dead letter handling - [ ] Add IT for FhirIO.Read - [ ] Migrate ITs to parameterized tests to DRY up ITs against different FHIR versions - [ ] Benchmark / load test the FhirIO.Import This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422375) Time Spent: 26h 50m (was: 26h 40m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 26h 50m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9748) Add Reshuffle.ForSequentiallyGeneratedInput transform
[ https://issues.apache.org/jira/browse/BEAM-9748?focusedWorklogId=422374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422374 ] ASF GitHub Bot logged work on BEAM-9748: Author: ASF GitHub Bot Created on: 14/Apr/20 22:08 Start Date: 14/Apr/20 22:08 Worklog Time Spent: 10m Work Description: iemejia commented on issue #11406: [BEAM-9748] Move Reparallelize transform to Reshuffle URL: https://github.com/apache/beam/pull/11406#issuecomment-613706077 Changes addressed and renamed to the suggested name by Eugene. PTAL again @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422374) Time Spent: 1h 40m (was: 1.5h) > Add Reshuffle.ForSequentiallyGeneratedInput transform > - > > Key: BEAM-9748 > URL: https://issues.apache.org/jira/browse/BEAM-9748 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > Some DoFn based IOs like JdbcIO and RedisIO rely on a different approach to > Reparallelize outputs using a combination of a an empty PCollectionView to > force materialization and Reshuffle.viaRandomkey to reparallelize a > PCollection. This issue extracts this transform and expose it as part of the > Reshuffle to avoid repeating the code for transforms (notably IOs) that > produce lots of sequentially generated data where and benefit of this > alternative approach to perform better reparallelization of its output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9748) Add Reshuffle.ForSequentiallyGeneratedInput transform
[ https://issues.apache.org/jira/browse/BEAM-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9748: --- Description: Some DoFn based IOs like JdbcIO and RedisIO rely on a different approach to Reparallelize outputs using a combination of a an empty PCollectionView to force materialization and Reshuffle.viaRandomkey to reparallelize a PCollection. This issue extracts this transform and expose it as part of the Reshuffle to avoid repeating the code for transforms (notably IOs) that produce lots of sequentially generated data where and benefit of this alternative approach to perform better reparallelization of its output. (was: Some DoFn based IOs like JdbcIO and RedisIO rely on the Reparallelize transform, a combination of a an empty PCollectionView and Reshuffle to force the materialization and reparallelize a PCollection. The idea of this issue is to extract this transform and expose it as part of the internal Reshuffle transform to avoid repeating the code for transforms (notably IOs) that require to reparallelize its output.) > Add Reshuffle.ForSequentiallyGeneratedInput transform > - > > Key: BEAM-9748 > URL: https://issues.apache.org/jira/browse/BEAM-9748 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > Some DoFn based IOs like JdbcIO and RedisIO rely on a different approach to > Reparallelize outputs using a combination of a an empty PCollectionView to > force materialization and Reshuffle.viaRandomkey to reparallelize a > PCollection. This issue extracts this transform and expose it as part of the > Reshuffle to avoid repeating the code for transforms (notably IOs) that > produce lots of sequentially generated data where and benefit of this > alternative approach to perform better reparallelization of its output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2939) Fn API SDF support
[ https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=422372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422372 ] ASF GitHub Bot logged work on BEAM-2939: Author: ASF GitHub Bot Created on: 14/Apr/20 22:01 Start Date: 14/Apr/20 22:01 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11409: [BEAM-2939] Update unbounded source as SDF wrapper to resume successfully. URL: https://github.com/apache/beam/pull/11409#issuecomment-613703212 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422372) Time Spent: 27h 20m (was: 27h 10m) > Fn API SDF support > -- > > Key: BEAM-2939 > URL: https://issues.apache.org/jira/browse/BEAM-2939 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Henning Rohde >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 27h 20m > Remaining Estimate: 0h > > The Fn API should support streaming SDF. Detailed design TBD. > Once design is ready, expand subtasks similarly to BEAM-2822. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9748) Add Reshuffle.ForSequentiallyGeneratedInput transform
[ https://issues.apache.org/jira/browse/BEAM-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9748: --- Summary: Add Reshuffle.ForSequentiallyGeneratedInput transform (was: Add Reshuffle.forSequentiallyGeneratedInput() transform) > Add Reshuffle.ForSequentiallyGeneratedInput transform > - > > Key: BEAM-9748 > URL: https://issues.apache.org/jira/browse/BEAM-9748 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > Some DoFn based IOs like JdbcIO and RedisIO rely on the Reparallelize > transform, > a combination of a an empty PCollectionView and Reshuffle to force the > materialization and reparallelize a PCollection. The idea of this issue is to > extract this transform and expose it as part of the internal Reshuffle > transform to avoid repeating the code for transforms (notably IOs) that > require > to reparallelize its output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9748) Add Reshuffle.forSequentiallyGeneratedInput() transform
[ https://issues.apache.org/jira/browse/BEAM-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9748: --- Summary: Add Reshuffle.forSequentiallyGeneratedInput() transform (was: Move Reparallelize transform to Reshuffle) > Add Reshuffle.forSequentiallyGeneratedInput() transform > --- > > Key: BEAM-9748 > URL: https://issues.apache.org/jira/browse/BEAM-9748 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > Some DoFn based IOs like JdbcIO and RedisIO rely on the Reparallelize > transform, > a combination of a an empty PCollectionView and Reshuffle to force the > materialization and reparallelize a PCollection. The idea of this issue is to > extract this transform and expose it as part of the internal Reshuffle > transform to avoid repeating the code for transforms (notably IOs) that > require > to reparallelize its output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=422371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422371 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 14/Apr/20 22:00 Start Date: 14/Apr/20 22:00 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11335: [BEAM-9692]: Make CombineValues portable URL: https://github.com/apache/beam/pull/11335#discussion_r408461679 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -110,22 +110,27 @@ class DataflowRunner(PipelineRunner): # Imported here to avoid circular dependencies. # TODO: Remove the apache_beam.pipeline dependency in CreatePTransformOverride + from apache_beam.runners.dataflow.ptransform_overrides import CombineValuesPTransformOverride from apache_beam.runners.dataflow.ptransform_overrides import CreatePTransformOverride from apache_beam.runners.dataflow.ptransform_overrides import ReadPTransformOverride from apache_beam.runners.dataflow.ptransform_overrides import JrhReadPTransformOverride - _PTRANSFORM_OVERRIDES = [] # type: List[PTransformOverride] + # Thesse overrides should be applied before the proto representation of the + # graph is created. + _PTRANSFORM_OVERRIDES = [ + CombineValuesPTransformOverride() Review comment: I suppose this is fine; it's just preserving an inconsistency in Dataflow vs. everything else. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422371) Time Spent: 1.5h (was: 1h 20m) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=422370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422370 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 14/Apr/20 22:00 Start Date: 14/Apr/20 22:00 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11335: [BEAM-9692]: Make CombineValues portable URL: https://github.com/apache/beam/pull/11335#discussion_r408461337 ## File path: sdks/python/apache_beam/runners/dataflow/ptransform_overrides.py ## @@ -111,3 +111,38 @@ def expand(self, pbegin): return JrhRead().with_output_types( ptransform.get_type_hints().simple_output_type('Read')) + + +class CombineValuesPTransformOverride(PTransformOverride): + """A ``PTransformOverride`` for ``CombineValues``. + + The DataflowRunner expects that the CombineValues PTransform acts as a + primitive. So this override replaces the CombineValues with a primitive. + """ + def matches(self, applied_ptransform): +# Imported here to avoid circular dependencies. +# pylint: disable=wrong-import-order, wrong-import-position +from apache_beam import CombineValues + +if isinstance(applied_ptransform.transform, CombineValues): + self.transform = applied_ptransform.transform + return True +return False + + def get_replacement_transform(self, ptransform): +# Imported here to avoid circular dependencies. +# pylint: disable=wrong-import-order, wrong-import-position +from apache_beam import PTransform +from apache_beam.pvalue import PCollection + +# The DataflowRunner still needs access to the CombineValues members to Review comment: I was thinking that run_xxx could also be called for composites. That might, however, be a bigger change, so we can go with this approach. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422370) Time Spent: 1.5h (was: 1h 20m) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=422357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422357 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 14/Apr/20 21:42 Start Date: 14/Apr/20 21:42 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#discussion_r408453789 ## File path: website/src/documentation/patterns/side-inputs.md ## @@ -50,25 +50,25 @@ For instance, the following code sample uses a `Map` to create a `DoFn`. The `Ma ## Slowly updating side input using windowing -You can read side input pcollection periodically into distinct windows. -Later, when you apply side input to your main input, windows will be matched automatically 1:1. -This way, you can guarantee side input consistency on the duration of the single window. - -To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence -of elements with some real-time period: +You can read side input data periodically into distinct PCollection windows. +Later, when you apply the side input to your main input, each main input +window is automatically matched to a single side input window. +This guarantees side input consistency on the duration of the single window, +meaning that each window on main input side will be matched to a single +version of side input data. + +To do this, you can utilize a combination of PeriodicSequence/PeriodicImpulse +PTransforms that will generate infinite sequence of elements with some real-time +period and SDF Read operation or similar to read data into side input +periodically: 1. Use the PeriodicImpulse transform to generate windowed periodic sequence. -a. MAX_TIMESTAMP can be replaced with some closer boundary if you want to stop generating elements at some point. - 1. Read data using Read operation triggered by arrival of PCollection element. 1. Apply side input. -```python +```py Review comment: Website sets config for the page to show only python or java snippets. It seem to pick the last language provided as default. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422357) Time Spent: 5h 10m (was: 5h) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2546) Create InfluxDbIO
[ https://issues.apache.org/jira/browse/BEAM-2546?focusedWorklogId=422355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422355 ] ASF GitHub Bot logged work on BEAM-2546: Author: ASF GitHub Bot Created on: 14/Apr/20 21:40 Start Date: 14/Apr/20 21:40 Worklog Time Spent: 10m Work Description: iemejia commented on issue #11028: BEAM-2546 Beam IO for InfluxDB URL: https://github.com/apache/beam/pull/11028#issuecomment-613695575 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422355) Time Spent: 16h 20m (was: 16h 10m) > Create InfluxDbIO > - > > Key: BEAM-2546 > URL: https://issues.apache.org/jira/browse/BEAM-2546 > Project: Beam > Issue Type: New Feature > Components: io-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Bipin Upadhyaya >Priority: Major > Time Spent: 16h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9738) Python Dataflow runner omits capabilities.
[ https://issues.apache.org/jira/browse/BEAM-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083619#comment-17083619 ] Robert Bradshaw commented on BEAM-9738: --- Yes, the cherry-pick has now been merged into the release branch. > Python Dataflow runner omits capabilities. > -- > > Key: BEAM-9738 > URL: https://issues.apache.org/jira/browse/BEAM-9738 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9738) Python Dataflow runner omits capabilities.
[ https://issues.apache.org/jira/browse/BEAM-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Bradshaw resolved BEAM-9738. --- Resolution: Fixed > Python Dataflow runner omits capabilities. > -- > > Key: BEAM-9738 > URL: https://issues.apache.org/jira/browse/BEAM-9738 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=422354=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422354 ] ASF GitHub Bot logged work on BEAM-8603: Author: ASF GitHub Bot Created on: 14/Apr/20 21:38 Start Date: 14/Apr/20 21:38 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10055: [BEAM-8603] Add Python SqlTransform URL: https://github.com/apache/beam/pull/10055#issuecomment-613694546 Run CommunityMetrics PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422354) Time Spent: 8h 50m (was: 8h 40m) > Add Python SqlTransform MVP > --- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=422343=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422343 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 14/Apr/20 21:33 Start Date: 14/Apr/20 21:33 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #11416: [BEAM-9136] reduce third_party_dependencies size URL: https://github.com/apache/beam/pull/11416#issuecomment-613692624 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422343) Time Spent: 22h (was: 21h 50m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 22h > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=422340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422340 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 14/Apr/20 21:32 Start Date: 14/Apr/20 21:32 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#discussion_r408448451 ## File path: website/src/documentation/patterns/side-inputs.md ## @@ -50,25 +50,25 @@ For instance, the following code sample uses a `Map` to create a `DoFn`. The `Ma ## Slowly updating side input using windowing -You can read side input pcollection periodically into distinct windows. -Later, when you apply side input to your main input, windows will be matched automatically 1:1. -This way, you can guarantee side input consistency on the duration of the single window. - -To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence -of elements with some real-time period: +You can read side input data periodically into distinct PCollection windows. +Later, when you apply the side input to your main input, each main input +window is automatically matched to a single side input window. +This guarantees side input consistency on the duration of the single window, +meaning that each window on main input side will be matched to a single +version of side input data. + +To do this, you can utilize a combination of PeriodicSequence/PeriodicImpulse +PTransforms that will generate infinite sequence of elements with some real-time +period and SDF Read operation or similar to read data into side input +periodically: 1. Use the PeriodicImpulse transform to generate windowed periodic sequence. -a. MAX_TIMESTAMP can be replaced with some closer boundary if you want to stop generating elements at some point. - 1. Read data using Read operation triggered by arrival of PCollection element. 1. Apply side input. -```python +```py Review comment: Seems that adding this snippet makes java snippet at L45 disappear from served site despite me not changing anything in code above. Do you know if there's some trick to make both snippets appear? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422340) Time Spent: 5h (was: 4h 50m) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=422331=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422331 ] ASF GitHub Bot logged work on BEAM-8603: Author: ASF GitHub Bot Created on: 14/Apr/20 21:13 Start Date: 14/Apr/20 21:13 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10055: [BEAM-8603] Add Python SqlTransform URL: https://github.com/apache/beam/pull/10055#issuecomment-613684342 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422331) Time Spent: 8h 40m (was: 8.5h) > Add Python SqlTransform MVP > --- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 8h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=422330=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422330 ] ASF GitHub Bot logged work on BEAM-8603: Author: ASF GitHub Bot Created on: 14/Apr/20 21:12 Start Date: 14/Apr/20 21:12 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10055: [BEAM-8603] Add Python SqlTransform URL: https://github.com/apache/beam/pull/10055#issuecomment-613684230 Run CommunityMetrics PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422330) Time Spent: 8.5h (was: 8h 20m) > Add Python SqlTransform MVP > --- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=422329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422329 ] ASF GitHub Bot logged work on BEAM-8603: Author: ASF GitHub Bot Created on: 14/Apr/20 21:12 Start Date: 14/Apr/20 21:12 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10055: [BEAM-8603] Add Python SqlTransform URL: https://github.com/apache/beam/pull/10055#issuecomment-613684149 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422329) Time Spent: 8h 20m (was: 8h 10m) > Add Python SqlTransform MVP > --- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 8h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=422326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422326 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 14/Apr/20 21:05 Start Date: 14/Apr/20 21:05 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#discussion_r408433143 ## File path: website/src/documentation/patterns/side-inputs.md ## @@ -50,25 +50,25 @@ For instance, the following code sample uses a `Map` to create a `DoFn`. The `Ma ## Slowly updating side input using windowing -You can read side input pcollection periodically into distinct windows. -Later, when you apply side input to your main input, windows will be matched automatically 1:1. -This way, you can guarantee side input consistency on the duration of the single window. - -To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence -of elements with some real-time period: +You can read side input data periodically into distinct PCollection windows. +Later, when you apply the side input to your main input, each main input +window is automatically matched to a single side input window. +This guarantees side input consistency on the duration of the single window, +meaning that each window on main input side will be matched to a single +version of side input data. + +To do this, you can utilize a combination of PeriodicSequence/PeriodicImpulse +PTransforms that will generate infinite sequence of elements with some real-time Review comment: Present tense -> "PTransforms that generate" Missing article -> "generate an infinite sequence..." This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422326) Time Spent: 4h 50m (was: 4h 40m) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=422321=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422321 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 14/Apr/20 21:05 Start Date: 14/Apr/20 21:05 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#discussion_r408430834 ## File path: website/src/documentation/patterns/side-inputs.md ## @@ -50,25 +50,25 @@ For instance, the following code sample uses a `Map` to create a `DoFn`. The `Ma ## Slowly updating side input using windowing -You can read side input pcollection periodically into distinct windows. -Later, when you apply side input to your main input, windows will be matched automatically 1:1. -This way, you can guarantee side input consistency on the duration of the single window. - -To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence -of elements with some real-time period: +You can read side input data periodically into distinct PCollection windows. +Later, when you apply the side input to your main input, each main input +window is automatically matched to a single side input window. +This guarantees side input consistency on the duration of the single window, +meaning that each window on main input side will be matched to a single Review comment: What does the term "main input side" mean? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422321) Time Spent: 4h 20m (was: 4h 10m) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=422320=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422320 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 14/Apr/20 21:05 Start Date: 14/Apr/20 21:05 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-613681105 @lukecwik ptal so @jaketf won't have to rebase if changes look fine This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422320) Time Spent: 26h 40m (was: 26.5h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 26h 40m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=422323=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422323 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 14/Apr/20 21:05 Start Date: 14/Apr/20 21:05 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#discussion_r408431325 ## File path: website/src/documentation/patterns/side-inputs.md ## @@ -50,25 +50,25 @@ For instance, the following code sample uses a `Map` to create a `DoFn`. The `Ma ## Slowly updating side input using windowing -You can read side input pcollection periodically into distinct windows. -Later, when you apply side input to your main input, windows will be matched automatically 1:1. -This way, you can guarantee side input consistency on the duration of the single window. - -To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence -of elements with some real-time period: +You can read side input data periodically into distinct PCollection windows. +Later, when you apply the side input to your main input, each main input +window is automatically matched to a single side input window. +This guarantees side input consistency on the duration of the single window, +meaning that each window on main input side will be matched to a single +version of side input data. + +To do this, you can utilize a combination of PeriodicSequence/PeriodicImpulse Review comment: It's probably best to replace "To do this" with "To do ABC" - it'll help the reader figure out what "this" refers to. Otherwise, if the reader is just skimming the page and starts at this paragraph, they have to read the previous paragraph to figure out what the pronoun refers to. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422323) Time Spent: 4.5h (was: 4h 20m) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=422325=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422325 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 14/Apr/20 21:05 Start Date: 14/Apr/20 21:05 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#discussion_r408433930 ## File path: website/src/documentation/patterns/side-inputs.md ## @@ -50,25 +50,25 @@ For instance, the following code sample uses a `Map` to create a `DoFn`. The `Ma ## Slowly updating side input using windowing -You can read side input pcollection periodically into distinct windows. -Later, when you apply side input to your main input, windows will be matched automatically 1:1. -This way, you can guarantee side input consistency on the duration of the single window. - -To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence -of elements with some real-time period: +You can read side input data periodically into distinct PCollection windows. +Later, when you apply the side input to your main input, each main input +window is automatically matched to a single side input window. +This guarantees side input consistency on the duration of the single window, +meaning that each window on main input side will be matched to a single +version of side input data. + +To do this, you can utilize a combination of PeriodicSequence/PeriodicImpulse +PTransforms that will generate infinite sequence of elements with some real-time +period and SDF Read operation or similar to read data into side input Review comment: I think we should split this sentence into two - it's hard to follow. A good place to start a new sentence is at "SDF Read operation" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422325) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=422327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422327 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 14/Apr/20 21:05 Start Date: 14/Apr/20 21:05 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#discussion_r408434733 ## File path: website/src/documentation/patterns/side-inputs.md ## @@ -50,25 +50,25 @@ For instance, the following code sample uses a `Map` to create a `DoFn`. The `Ma ## Slowly updating side input using windowing -You can read side input pcollection periodically into distinct windows. -Later, when you apply side input to your main input, windows will be matched automatically 1:1. -This way, you can guarantee side input consistency on the duration of the single window. - -To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence -of elements with some real-time period: +You can read side input data periodically into distinct PCollection windows. +Later, when you apply the side input to your main input, each main input +window is automatically matched to a single side input window. +This guarantees side input consistency on the duration of the single window, +meaning that each window on main input side will be matched to a single +version of side input data. + +To do this, you can utilize a combination of PeriodicSequence/PeriodicImpulse +PTransforms that will generate infinite sequence of elements with some real-time +period and SDF Read operation or similar to read data into side input +periodically: 1. Use the PeriodicImpulse transform to generate windowed periodic sequence. -a. MAX_TIMESTAMP can be replaced with some closer boundary if you want to stop generating elements at some point. - 1. Read data using Read operation triggered by arrival of PCollection element. Review comment: On a second look, I think it's redundant to say "Read data using a Read operation" - maybe we can say something like "Read data when a PCollection element arrives" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422327) Time Spent: 4h 50m (was: 4h 40m) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=422322=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422322 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 14/Apr/20 21:05 Start Date: 14/Apr/20 21:05 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#discussion_r408430781 ## File path: website/src/documentation/patterns/side-inputs.md ## @@ -50,25 +50,25 @@ For instance, the following code sample uses a `Map` to create a `DoFn`. The `Ma ## Slowly updating side input using windowing -You can read side input pcollection periodically into distinct windows. -Later, when you apply side input to your main input, windows will be matched automatically 1:1. -This way, you can guarantee side input consistency on the duration of the single window. - -To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence -of elements with some real-time period: +You can read side input data periodically into distinct PCollection windows. +Later, when you apply the side input to your main input, each main input +window is automatically matched to a single side input window. +This guarantees side input consistency on the duration of the single window, +meaning that each window on main input side will be matched to a single Review comment: Missing "the" -> "on the main input" Present tense -> "is matched to a single..." This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422322) Time Spent: 4h 20m (was: 4h 10m) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9756) beam_PostCommit_Java_Nexmark (non-Dataflow) failing
[ https://issues.apache.org/jira/browse/BEAM-9756?focusedWorklogId=422319=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422319 ] ASF GitHub Bot logged work on BEAM-9756: Author: ASF GitHub Bot Created on: 14/Apr/20 21:05 Start Date: 14/Apr/20 21:05 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11417: [BEAM-9756] Nexmark: only use --region in Dataflow. URL: https://github.com/apache/beam/pull/11417#issuecomment-613681012 Run Dataflow Runner Nexmark Tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422319) Time Spent: 50m (was: 40m) > beam_PostCommit_Java_Nexmark (non-Dataflow) failing > --- > > Key: BEAM-9756 > URL: https://issues.apache.org/jira/browse/BEAM-9756 > Project: Beam > Issue Type: Bug > Components: test-failures, testing-nexmark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > 12:02:24 Exception in thread "main" java.lang.IllegalArgumentException: Class > interface org.apache.beam.sdk.nexmark.NexmarkOptions missing a property named > 'region'. > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1625) > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:115) > 12:02:24 at > org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:298) > 12:02:24 at org.apache.beam.sdk.nexmark.Main.runAll(Main.java:98) > 12:02:24 at org.apache.beam.sdk.nexmark.Main.main(Main.java:415) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=422324=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422324 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 14/Apr/20 21:05 Start Date: 14/Apr/20 21:05 Worklog Time Spent: 10m Work Description: soyrice commented on pull request #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#discussion_r408432853 ## File path: website/src/documentation/patterns/side-inputs.md ## @@ -50,25 +50,25 @@ For instance, the following code sample uses a `Map` to create a `DoFn`. The `Ma ## Slowly updating side input using windowing -You can read side input pcollection periodically into distinct windows. -Later, when you apply side input to your main input, windows will be matched automatically 1:1. -This way, you can guarantee side input consistency on the duration of the single window. - -To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence -of elements with some real-time period: +You can read side input data periodically into distinct PCollection windows. +Later, when you apply the side input to your main input, each main input +window is automatically matched to a single side input window. +This guarantees side input consistency on the duration of the single window, +meaning that each window on main input side will be matched to a single +version of side input data. + +To do this, you can utilize a combination of PeriodicSequence/PeriodicImpulse Review comment: For formatting, it might be clearer to say "a combination of PeriodicSequence and PeriodicImpulse PTransforms" instead of "PeriodicSequence/PeriodicImpulse PTransforms" - just to make sure there's no ambiguity in regards to these being two separate transforms. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422324) Time Spent: 4h 40m (was: 4.5h) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF
[ https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=422316=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422316 ] ASF GitHub Bot logged work on BEAM-9363: Author: ASF GitHub Bot Created on: 14/Apr/20 21:01 Start Date: 14/Apr/20 21:01 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #10946: [BEAM-9363] TUMBLE as TVF URL: https://github.com/apache/beam/pull/10946#issuecomment-613679461 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422316) Time Spent: 1h 10m (was: 1h) > BeamSQL windowing as TVF > > > Key: BEAM-9363 > URL: https://issues.apache.org/jira/browse/BEAM-9363 > Project: Beam > Issue Type: New Feature > Components: dsl-sql, dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This Jira tracks the implementation for > https://s.apache.org/streaming-beam-sql > TVF is table-valued function, which is a SQL feature that produce a table as > function's output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9561) Run pandas tests with Beam Dataframe API
[ https://issues.apache.org/jira/browse/BEAM-9561?focusedWorklogId=422308=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422308 ] ASF GitHub Bot logged work on BEAM-9561: Author: ASF GitHub Bot Created on: 14/Apr/20 20:50 Start Date: 14/Apr/20 20:50 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11419: [BEAM-9561] Add a framework for running pandas doctests with beam dataframes. URL: https://github.com/apache/beam/pull/11419 R: @TheNeuralBit Depends on https://github.com/apache/beam/pull/11264 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-9737) beam_PostCommit_Website_Test failing
[ https://issues.apache.org/jira/browse/BEAM-9737?focusedWorklogId=422295=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422295 ] ASF GitHub Bot logged work on BEAM-9737: Author: ASF GitHub Bot Created on: 14/Apr/20 20:39 Start Date: 14/Apr/20 20:39 Worklog Time Spent: 10m Work Description: udim commented on issue #11386: [BEAM-9737] Fix website postcommit URL: https://github.com/apache/beam/pull/11386#issuecomment-613669573 Run Full Website Test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422295) Time Spent: 2h (was: 1h 50m) > beam_PostCommit_Website_Test failing > > > Key: BEAM-9737 > URL: https://issues.apache.org/jira/browse/BEAM-9737 > Project: Beam > Issue Type: Bug > Components: test-failures, website >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Also failing: beam_PostCommit_Website_Publish (same failure) > {code} > > Task :website:buildLocalWebsite > `/` is not writable. > Bundler will use `/tmp/bundler/home/unknown' as your home directory > temporarily. > Configuration file: /repo/website/_config.yml > Configuration file: /repo/website/_config_test.yml > Configuration file: /tmp/_config_branch_repo.yml > Source: /repo/website/src >Destination: generated-local-content > Incremental build: enabled > Generating... > jekyll 3.6.3 | Error: Permission denied @ dir_s_mkdir - > /repo/build/website/generated-local-content/security > {code} > https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Website_Test/3676/console > Possible culprit: https://github.com/apache/beam/pull/11232/files -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=422291=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422291 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 14/Apr/20 20:27 Start Date: 14/Apr/20 20:27 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #11415: [BEAM-9650] Cleanup documentation on side inputs patterns URL: https://github.com/apache/beam/pull/11415#issuecomment-613597087 R: @soyrice @rosetn This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422291) Time Spent: 4h 10m (was: 4h) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8872) Add support for splitting at fractions > 0 to org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker
[ https://issues.apache.org/jira/browse/BEAM-8872?focusedWorklogId=422290=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422290 ] ASF GitHub Bot logged work on BEAM-8872: Author: ASF GitHub Bot Created on: 14/Apr/20 20:26 Start Date: 14/Apr/20 20:26 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11418: [BEAM-8872] Support split at fraction for OffsetRangeTracker URL: https://github.com/apache/beam/pull/11418 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[jira] [Commented] (BEAM-6860) WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
[ https://issues.apache.org/jira/browse/BEAM-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083583#comment-17083583 ] Chamikara Madhusanka Jayalath commented on BEAM-6860: - Please file a new Jira and assign to the author if you believe [https://github.com/apache/beam/pull/7170] broke this. > WriteToText crash with "GlobalWindow -> ._IntervalWindowBase" > - > > Key: BEAM-6860 > URL: https://issues.apache.org/jira/browse/BEAM-6860 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 > Environment: macOS, DirectRunner, python 2.7.15 via > pyenv/pyenv-virtualenv >Reporter: Henrik >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Labels: newbie > Fix For: 2.16.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Main error: > > Cannot convert GlobalWindow to > > apache_beam.utils.windowed_value._IntervalWindowBase > This is very hard for me to debug. Doing a DoPar call before, printing the > input, gives me just what I want; so the lines of data to serialise are > "alright"; just JSON strings, in fact. > Stacktrace: > {code:java} > Traceback (most recent call last): > File "./okr_end_ride.py", line 254, in > run() > File "./okr_end_ride.py", line 250, in run > run_pipeline(pipeline_options, known_args) > File "./okr_end_ride.py", line 198, in run_pipeline > | 'write_all' >> WriteToText(known_args.output, > file_name_suffix=".txt") > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 426, in __exit__ > self.run().wait_until_finish() > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 406, in run > self._options).run(False) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 419, in run > return self.runner.run_pipeline(self, self._options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py", > line 132, in run_pipeline > return runner.run_pipeline(pipeline, options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 275, in run_pipeline > default_environment=self._default_environment)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 278, in run_via_runner_api > return self.run_stages(*self.create_stages(pipeline_proto)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 354, in run_stages > stage_context.safe_coders) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 509, in run_stage > data_input, data_output) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 1206, in process_bundle > result_future = self._controller.control_handler.push(process_bundle) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 821, in push > response = self.worker.do_instruction(request) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 265, in do_instruction > request.instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 281, in process_bundle > delayed_applications = bundle_processor.process_bundle(instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 552, in process_bundle > op.finish() > File "apache_beam/runners/worker/operations.py", line 549, in > apache_beam.runners.worker.operations.DoOperation.finish > File "apache_beam/runners/worker/operations.py", line 550, in > apache_beam.runners.worker.operations.DoOperation.finish > File "apache_beam/runners/worker/operations.py", line 551, in > apache_beam.runners.worker.operations.DoOperation.finish > File "apache_beam/runners/common.py", line 758, in > apache_beam.runners.common.DoFnRunner.finish > File "apache_beam/runners/common.py", line 752,
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422288 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 20:24 Start Date: 14/Apr/20 20:24 Worklog Time Spent: 10m Work Description: thetorpedodog commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408412639 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i := 0; i < len(test.buflens); i++ { + token := []byte(strconv.Itoa(i)) + var buf []byte + if test.buflens[i] >= 0 { + buf = bytes.Repeat([]byte{42}, test.buflens[i]) + } + // On the last request response pair, send no token. + if i+1 == len(test.buflens) { + token = nil + } + + req := <-ch.requests + + if test.noGet { +
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422287=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422287 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 20:24 Start Date: 14/Apr/20 20:24 Worklog Time Spent: 10m Work Description: thetorpedodog commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408403890 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i := 0; i < len(test.buflens); i++ { + token := []byte(strconv.Itoa(i)) + var buf []byte + if test.buflens[i] >= 0 { + buf = bytes.Repeat([]byte{42}, test.buflens[i]) + } + // On the last request response pair, send no token. + if i+1 == len(test.buflens) { + token = nil + } Review comment: rearranging this so that all the initialization for this variable is together might make it easier to follow: ```go var buf
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422286=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422286 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 20:24 Start Date: 14/Apr/20 20:24 Worklog Time Spent: 10m Work Description: thetorpedodog commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408401030 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i := 0; i < len(test.buflens); i++ { + token := []byte(strconv.Itoa(i)) + var buf []byte + if test.buflens[i] >= 0 { + buf = bytes.Repeat([]byte{42}, test.buflens[i]) + } + // On the last request response pair, send no token. + if i+1 == len(test.buflens) { + token = nil + } + + req := <-ch.requests + + if test.noGet { +
[jira] [Work logged] (BEAM-2939) Fn API SDF support
[ https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=422279=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422279 ] ASF GitHub Bot logged work on BEAM-2939: Author: ASF GitHub Bot Created on: 14/Apr/20 20:10 Start Date: 14/Apr/20 20:10 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11409: [BEAM-2939] Update unbounded source as SDF wrapper to resume successfully. URL: https://github.com/apache/beam/pull/11409#issuecomment-613656085 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422279) Time Spent: 27h 10m (was: 27h) > Fn API SDF support > -- > > Key: BEAM-2939 > URL: https://issues.apache.org/jira/browse/BEAM-2939 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Henning Rohde >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 27h 10m > Remaining Estimate: 0h > > The Fn API should support streaming SDF. Detailed design TBD. > Once design is ready, expand subtasks similarly to BEAM-2822. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=422278=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422278 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Apr/20 20:07 Start Date: 14/Apr/20 20:07 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11381: [BEAM-8889] add gRPC suport in GCS connector (behind an experimental-flag) URL: https://github.com/apache/beam/pull/11381#issuecomment-613655133 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422278) Remaining Estimate: 138h 10m (was: 138h 20m) Time Spent: 29h 50m (was: 29h 40m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 29h 50m > Remaining Estimate: 138h 10m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422277=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422277 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 20:07 Start Date: 14/Apr/20 20:07 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408403428 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i := 0; i < len(test.buflens); i++ { Review comment: Good catch! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422277) Time Spent: 1h 10m (was: 1h) > [Go SDK] Empty side inputs causing spurious zero elements > - > > Key: BEAM-9746 > URL: https://issues.apache.org/jira/browse/BEAM-9746 > Project: Beam > Issue Type: Improvement >
[jira] [Work logged] (BEAM-9737) beam_PostCommit_Website_Test failing
[ https://issues.apache.org/jira/browse/BEAM-9737?focusedWorklogId=422270=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422270 ] ASF GitHub Bot logged work on BEAM-9737: Author: ASF GitHub Bot Created on: 14/Apr/20 20:02 Start Date: 14/Apr/20 20:02 Worklog Time Spent: 10m Work Description: udim commented on issue #11386: [BEAM-9737] Fix website postcommit URL: https://github.com/apache/beam/pull/11386#issuecomment-613652725 Run Full Website Test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422270) Time Spent: 1h 50m (was: 1h 40m) > beam_PostCommit_Website_Test failing > > > Key: BEAM-9737 > URL: https://issues.apache.org/jira/browse/BEAM-9737 > Project: Beam > Issue Type: Bug > Components: test-failures, website >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Also failing: beam_PostCommit_Website_Publish (same failure) > {code} > > Task :website:buildLocalWebsite > `/` is not writable. > Bundler will use `/tmp/bundler/home/unknown' as your home directory > temporarily. > Configuration file: /repo/website/_config.yml > Configuration file: /repo/website/_config_test.yml > Configuration file: /tmp/_config_branch_repo.yml > Source: /repo/website/src >Destination: generated-local-content > Incremental build: enabled > Generating... > jekyll 3.6.3 | Error: Permission denied @ dir_s_mkdir - > /repo/build/website/generated-local-content/security > {code} > https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Website_Test/3676/console > Possible culprit: https://github.com/apache/beam/pull/11232/files -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9746) [Go SDK] Empty side inputs causing spurious zero elements
[ https://issues.apache.org/jira/browse/BEAM-9746?focusedWorklogId=422267=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422267 ] ASF GitHub Bot logged work on BEAM-9746: Author: ASF GitHub Bot Created on: 14/Apr/20 19:57 Start Date: 14/Apr/20 19:57 Worklog Time Spent: 10m Work Description: thetorpedodog commented on pull request #11413: [BEAM-9746] check for 0 length copies from state URL: https://github.com/apache/beam/pull/11413#discussion_r408397803 ## File path: sdks/go/pkg/beam/core/runtime/harness/statemgr_test.go ## @@ -258,6 +261,167 @@ func TestStateChannel(t *testing.T) { } } +// TestStateKeyReader validates ordinary Read cases +func TestStateKeyReader(t *testing.T) { + const readLen = 4 + tests := []struct { + name string + buflens []int // sizes of the buffers received on the state channel. + numReads int + closed bool // tries to read from closed reader + noGetbool // tries to read from nil get response reader + }{ + { + name: "emptyData", + buflens: []int{-1}, + numReads: 1, + }, { + name: "singleBufferSingleRead", + buflens: []int{readLen}, + numReads: 2, + }, { + name: "singleBufferMultipleReads", + buflens: []int{2 * readLen}, + numReads: 3, + }, { + name: "singleBufferShortRead", + buflens: []int{readLen - 1}, + numReads: 2, + }, { + name: "multiBuffer", + buflens: []int{readLen, readLen}, + numReads: 3, + }, { + name: "multiBuffer-short-reads", + buflens: []int{readLen - 1, readLen - 1, readLen - 2}, + numReads: 4, + }, { + name: "emptyDataFirst", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{-1, readLen, readLen}, + numReads: 4, + }, { + name: "emptyDataMid", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1, readLen}, + numReads: 5, + }, { + name: "emptyDataLast", // Shouldn't happen, but not unreasonable to handle. + buflens: []int{readLen, readLen, -1}, + numReads: 3, + }, { + name: "emptyDataLast-short", + buflens: []int{3*readLen - 2, -1}, + numReads: 4, + }, { + name: "closed", + buflens: []int{-1, -1}, + numReads: 1, + closed: true, + }, { + name: "noGet", + buflens: []int{-1}, + numReads: 1, + noGet:true, + }, + } + for _, test := range tests { + t.Run(test.name, func(t *testing.T) { + ctx, cancelFn := context.WithCancel(context.Background()) + ch := { + id:"test", + requests: make(chan *fnpb.StateRequest), + responses: make(map[string]chan<- *fnpb.StateResponse), + cancelFn: cancelFn, + DoneCh:ctx.Done(), + } + + // Handle the channel behavior asynchronously. + go func() { + for i := 0; i < len(test.buflens); i++ { Review comment: i, buflen := range test.buflens ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422267) Time Spent: 1h (was: 50m) > [Go SDK] Empty side inputs causing spurious zero elements > - > > Key: BEAM-9746 > URL: https://issues.apache.org/jira/browse/BEAM-9746 > Project: Beam > Issue
[jira] [Work logged] (BEAM-9737) beam_PostCommit_Website_Test failing
[ https://issues.apache.org/jira/browse/BEAM-9737?focusedWorklogId=422256=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422256 ] ASF GitHub Bot logged work on BEAM-9737: Author: ASF GitHub Bot Created on: 14/Apr/20 19:29 Start Date: 14/Apr/20 19:29 Worklog Time Spent: 10m Work Description: udim commented on issue #11386: [BEAM-9737] Fix website postcommit URL: https://github.com/apache/beam/pull/11386#issuecomment-613637918 Run Full Website Test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422256) Time Spent: 1h 40m (was: 1.5h) > beam_PostCommit_Website_Test failing > > > Key: BEAM-9737 > URL: https://issues.apache.org/jira/browse/BEAM-9737 > Project: Beam > Issue Type: Bug > Components: test-failures, website >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Also failing: beam_PostCommit_Website_Publish (same failure) > {code} > > Task :website:buildLocalWebsite > `/` is not writable. > Bundler will use `/tmp/bundler/home/unknown' as your home directory > temporarily. > Configuration file: /repo/website/_config.yml > Configuration file: /repo/website/_config_test.yml > Configuration file: /tmp/_config_branch_repo.yml > Source: /repo/website/src >Destination: generated-local-content > Incremental build: enabled > Generating... > jekyll 3.6.3 | Error: Permission denied @ dir_s_mkdir - > /repo/build/website/generated-local-content/security > {code} > https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Website_Test/3676/console > Possible culprit: https://github.com/apache/beam/pull/11232/files -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9737) beam_PostCommit_Website_Test failing
[ https://issues.apache.org/jira/browse/BEAM-9737?focusedWorklogId=422248=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-422248 ] ASF GitHub Bot logged work on BEAM-9737: Author: ASF GitHub Bot Created on: 14/Apr/20 19:22 Start Date: 14/Apr/20 19:22 Worklog Time Spent: 10m Work Description: udim commented on issue #11386: [BEAM-9737] Fix website postcommit URL: https://github.com/apache/beam/pull/11386#issuecomment-613634979 Run Full Website Test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 422248) Time Spent: 1.5h (was: 1h 20m) > beam_PostCommit_Website_Test failing > > > Key: BEAM-9737 > URL: https://issues.apache.org/jira/browse/BEAM-9737 > Project: Beam > Issue Type: Bug > Components: test-failures, website >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Also failing: beam_PostCommit_Website_Publish (same failure) > {code} > > Task :website:buildLocalWebsite > `/` is not writable. > Bundler will use `/tmp/bundler/home/unknown' as your home directory > temporarily. > Configuration file: /repo/website/_config.yml > Configuration file: /repo/website/_config_test.yml > Configuration file: /tmp/_config_branch_repo.yml > Source: /repo/website/src >Destination: generated-local-content > Incremental build: enabled > Generating... > jekyll 3.6.3 | Error: Permission denied @ dir_s_mkdir - > /repo/build/website/generated-local-content/security > {code} > https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Website_Test/3676/console > Possible culprit: https://github.com/apache/beam/pull/11232/files -- This message was sent by Atlassian Jira (v8.3.4#803005)