[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=421864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421864 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 14/Apr/20 05:44 Start Date: 14/Apr/20 05:44 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #11182: [BEAM-9650] Add PeriodicImpulse Transform and slowly changing side input documentation URL: https://github.com/apache/beam/pull/11182#issuecomment-613237708 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421864) Time Spent: 2h 20m (was: 2h 10m) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > Add implementation for slowly changing dimensions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=421815&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421815 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 14/Apr/20 03:21 Start Date: 14/Apr/20 03:21 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11327: [BEAM-9642] Add SDF execution units. URL: https://github.com/apache/beam/pull/11327#discussion_r407842880 ## File path: sdks/go/pkg/beam/core/runtime/exec/sdf_test.go ## @@ -0,0 +1,408 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package exec + +import ( + "context" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/google/go-cmp/cmp" + "testing" +) + +// testTimestamp is a constant used to check that timestamps are retained. 
+const testTimestamp = 15 + +// testWindow is a constant used to check that windows are retained +var testWindows = []typex.Window{window.IntervalWindow{Start: 10, End: 20}} + +// TestSdfNodes verifies that the various SDF nodes fulfill each of their +// described contracts, that they each successfully invoke any SDF methods +// needed, and that they preserve timestamps and windows correctly. +func TestSdfNodes(t *testing.T) { + // Setup. The DoFns created below are defined in sdf_invokers_test.go and + // have testable behavior to confirm that they got correctly invoked. + // Without knowing the expected behavior of these DoFns, the desired outputs + // in the unit tests below will not make much sense. + dfn, err := graph.NewDoFn(&Sdf{}, graph.NumMainInputs(graph.MainSingle)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } + kvdfn, err := graph.NewDoFn(&KvSdf{}, graph.NumMainInputs(graph.MainKv)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } + + // Validate PairWithRestriction matches its contract and properly invokes + // SDF method CreateInitialRestriction. + t.Run("PairWithRestriction", func(t *testing.T) { + tests := []struct { + name string + fn *graph.DoFn + in *FullValue + want *FullValue + }{ + { + name: "SingleElem", + fn: dfn, + in: &FullValue{ + Elm: 5, + Elm2: nil, + Timestamp: testTimestamp, + Windows: testWindows, + }, + want: &FullValue{ + Elm: &FullValue{ + Elm: 5, + Elm2: nil, + Timestamp: testTimestamp, + Windows: testWindows, + }, + Elm2: Restriction{5}, + Timestamp: testTimestamp, + Windows: testWindows, + }, + }, + { + name: "KvElem", + fn: kvdfn, + in: &FullValue{ + Elm: 5, + Elm2: 2, + Timestamp: testTimestamp, + Windows: testWindows, + }, + want: &FullValue{ +
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=421820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421820 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 14/Apr/20 03:22 Start Date: 14/Apr/20 03:22 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11327: [BEAM-9642] Add SDF execution units. URL: https://github.com/apache/beam/pull/11327#discussion_r407843164 ## File path: sdks/go/pkg/beam/core/runtime/exec/sdf_test.go ## @@ -0,0 +1,408 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package exec + +import ( + "context" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/google/go-cmp/cmp" + "testing" +) + +// testTimestamp is a constant used to check that timestamps are retained. 
+const testTimestamp = 15 + +// testWindow is a constant used to check that windows are retained +var testWindows = []typex.Window{window.IntervalWindow{Start: 10, End: 20}} + +// TestSdfNodes verifies that the various SDF nodes fulfill each of their +// described contracts, that they each successfully invoke any SDF methods +// needed, and that they preserve timestamps and windows correctly. +func TestSdfNodes(t *testing.T) { + // Setup. The DoFns created below are defined in sdf_invokers_test.go and + // have testable behavior to confirm that they got correctly invoked. + // Without knowing the expected behavior of these DoFns, the desired outputs + // in the unit tests below will not make much sense. + dfn, err := graph.NewDoFn(&Sdf{}, graph.NumMainInputs(graph.MainSingle)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } + kvdfn, err := graph.NewDoFn(&KvSdf{}, graph.NumMainInputs(graph.MainKv)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } + + // Validate PairWithRestriction matches its contract and properly invokes + // SDF method CreateInitialRestriction. + t.Run("PairWithRestriction", func(t *testing.T) { + tests := []struct { + name string + fn *graph.DoFn + in *FullValue + want *FullValue + }{ + { + name: "SingleElem", + fn: dfn, + in: &FullValue{ + Elm: 5, + Elm2: nil, + Timestamp: testTimestamp, + Windows: testWindows, + }, + want: &FullValue{ + Elm: &FullValue{ + Elm: 5, + Elm2: nil, + Timestamp: testTimestamp, + Windows: testWindows, + }, + Elm2: Restriction{5}, + Timestamp: testTimestamp, + Windows: testWindows, + }, + }, + { + name: "KvElem", + fn: kvdfn, + in: &FullValue{ + Elm: 5, + Elm2: 2, + Timestamp: testTimestamp, + Windows: testWindows, + }, + want: &FullValue{ +
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=421814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421814 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 14/Apr/20 03:21 Start Date: 14/Apr/20 03:21 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11327: [BEAM-9642] Add SDF execution units. URL: https://github.com/apache/beam/pull/11327#discussion_r407829916 ## File path: sdks/go/pkg/beam/core/runtime/exec/sdf_test.go ## @@ -0,0 +1,408 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package exec + +import ( + "context" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/google/go-cmp/cmp" + "testing" +) + +// testTimestamp is a constant used to check that timestamps are retained. 
+const testTimestamp = 15 + +// testWindow is a constant used to check that windows are retained +var testWindows = []typex.Window{window.IntervalWindow{Start: 10, End: 20}} + +// TestSdfNodes verifies that the various SDF nodes fulfill each of their +// described contracts, that they each successfully invoke any SDF methods +// needed, and that they preserve timestamps and windows correctly. +func TestSdfNodes(t *testing.T) { + // Setup. The DoFns created below are defined in sdf_invokers_test.go and + // have testable behavior to confirm that they got correctly invoked. + // Without knowing the expected behavior of these DoFns, the desired outputs + // in the unit tests below will not make much sense. + dfn, err := graph.NewDoFn(&Sdf{}, graph.NumMainInputs(graph.MainSingle)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } + kvdfn, err := graph.NewDoFn(&KvSdf{}, graph.NumMainInputs(graph.MainKv)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } + + // Validate PairWithRestriction matches its contract and properly invokes + // SDF method CreateInitialRestriction. + t.Run("PairWithRestriction", func(t *testing.T) { + tests := []struct { + name string + fn *graph.DoFn + in *FullValue + want *FullValue + }{ + { + name: "SingleElem", + fn: dfn, + in: &FullValue{ + Elm: 5, + Elm2: nil, + Timestamp: testTimestamp, + Windows: testWindows, + }, + want: &FullValue{ + Elm: &FullValue{ + Elm: 5, + Elm2: nil, + Timestamp: testTimestamp, + Windows: testWindows, + }, + Elm2: Restriction{5}, + Timestamp: testTimestamp, + Windows: testWindows, + }, + }, + { + name: "KvElem", + fn: kvdfn, + in: &FullValue{ + Elm: 5, + Elm2: 2, + Timestamp: testTimestamp, + Windows: testWindows, + }, + want: &FullValue{ +
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=421816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421816 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 14/Apr/20 03:21 Start Date: 14/Apr/20 03:21 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11327: [BEAM-9642] Add SDF execution units. URL: https://github.com/apache/beam/pull/11327#discussion_r407827372 ## File path: sdks/go/pkg/beam/core/runtime/exec/pardo.go ## @@ -120,11 +120,17 @@ func (n *ParDo) ProcessElement(ctx context.Context, elm *FullValue, values ...Re if n.status != Active { return errors.Errorf("invalid status for pardo %v: %v, want Active", n.UID, n.status) } + + return n.ProcessMainInput(&MainInput{Key: *elm, Values: values}) +} + +func (n *ParDo) ProcessMainInput(mainIn *MainInput) error { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421816) Time Spent: 4h (was: 3h 50m) > Add SDF execution-time runners > -- > > Key: BEAM-9642 > URL: https://issues.apache.org/jira/browse/BEAM-9642 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > Adds execution-time SDF runner units to the exec package, and any unit tests > + helpers required. > This is needed to get the expanded SDF URNs to execute in the runner harness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
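The pardo.go hunk above has ProcessElement keep its status check and then hand the element to a new ProcessMainInput method. A minimal sketch of that delegation, assuming pared-down stand-ins for the exec types: the `active` flag, the `seen` slice, and the simplified `FullValue`, `ReStream` and `MainInput` definitions are hypothetical and are not the Beam implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// FullValue, ReStream and MainInput are simplified stand-ins for the exec
// package types named in the diff; their real definitions are richer.
type FullValue struct{ Elm interface{} }

type ReStream interface{}

type MainInput struct {
	Key    FullValue
	Values []ReStream
}

// ParDo is a hypothetical, stripped-down unit. The active flag replaces the
// status enum from the diff, and seen records what was processed.
type ParDo struct {
	active bool
	seen   []interface{}
}

// ProcessElement validates the unit's status, then wraps the element and its
// side streams in a MainInput and delegates, as the diff does.
func (n *ParDo) ProcessElement(elm *FullValue, values ...ReStream) error {
	if !n.active {
		return errors.New("invalid status for pardo: want Active")
	}
	return n.ProcessMainInput(&MainInput{Key: *elm, Values: values})
}

// ProcessMainInput does the per-element work. Other callers that already
// hold a MainInput can enter here directly.
func (n *ParDo) ProcessMainInput(mainIn *MainInput) error {
	n.seen = append(n.seen, mainIn.Key.Elm)
	return nil
}

func main() {
	p := &ParDo{active: true}
	err := p.ProcessElement(&FullValue{Elm: 5})
	fmt.Println(err, p.seen)
}
```

Splitting the method this way lets a caller that has already built a MainInput skip the wrapping step; whether the real SDF nodes call it that way is not stated in this message.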
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=421818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421818 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 14/Apr/20 03:21 Start Date: 14/Apr/20 03:21 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11327: [BEAM-9642] Add SDF execution units. URL: https://github.com/apache/beam/pull/11327#discussion_r407828960 ## File path: sdks/go/pkg/beam/core/runtime/exec/sdf_invokers_test.go ## @@ -273,7 +287,9 @@ func (fn *KvSdf) CreateTracker(rest Restriction) *RTracker { return &RTracker{rest, 2} } -// ProcessElement is a no-op, it's only included to pass validation. -func (fn *KvSdf) ProcessElement(*RTracker, int, int) int { - return 0 +// ProcessElement emits two ints. The first is the first input (key) + +// RTracker.Rest.Val. The second is the second input (value) + RTracker.Val. +func (fn *KvSdf) ProcessElement(rt *RTracker, i1 int, i2 int, emit func(int, int)) { + emit(i1+rt.Rest.Val, i2+rt.Val) + return Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421818) Time Spent: 4h 10m (was: 4h) > Add SDF execution-time runners > -- > > Key: BEAM-9642 > URL: https://issues.apache.org/jira/browse/BEAM-9642 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > Adds execution-time SDF runner units to the exec package, and any unit tests > + helpers required. > This is needed to get the expanded SDF URNs to execute in the runner harness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
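The revised KvSdf.ProcessElement in the diff above can be exercised on its own. The sketch below assumes that Restriction has a single int field Val and that RTracker has fields Rest and Val, which is what the expressions `rt.Rest.Val`, `rt.Val` and `&RTracker{rest, 2}` in the diff suggest; the `runKvSdf` helper and the chosen input values are hypothetical.

```go
package main

import "fmt"

// Restriction and RTracker are stand-ins for the test helpers in
// sdf_invokers_test.go; the field layout is inferred from the diff.
type Restriction struct {
	Val int
}

type RTracker struct {
	Rest Restriction
	Val  int
}

type KvSdf struct{}

// CreateTracker wraps the restriction with a fixed tracker value of 2,
// as shown in the diff context.
func (fn *KvSdf) CreateTracker(rest Restriction) *RTracker {
	return &RTracker{rest, 2}
}

// ProcessElement emits two ints: key + RTracker.Rest.Val and
// value + RTracker.Val.
func (fn *KvSdf) ProcessElement(rt *RTracker, i1 int, i2 int, emit func(int, int)) {
	emit(i1+rt.Rest.Val, i2+rt.Val)
}

// runKvSdf is a hypothetical helper that calls ProcessElement once and
// returns the emitted pair.
func runKvSdf(key, value int, rest Restriction) (int, int) {
	fn := &KvSdf{}
	var gotKey, gotValue int
	fn.ProcessElement(fn.CreateTracker(rest), key, value, func(k, v int) {
		gotKey, gotValue = k, v
	})
	return gotKey, gotValue
}

func main() {
	k, v := runKvSdf(5, 2, Restriction{Val: 5})
	fmt.Println(k, v)
}
```

Because the emitted values depend on both the restriction and the tracker, a test can tell from the output alone whether the tracker was created from the expected restriction.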
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=421817&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421817 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 14/Apr/20 03:21 Start Date: 14/Apr/20 03:21 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11327: [BEAM-9642] Add SDF execution units. URL: https://github.com/apache/beam/pull/11327#discussion_r407842781 ## File path: sdks/go/pkg/beam/core/runtime/exec/sdf_test.go ## @@ -0,0 +1,408 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package exec + +import ( + "context" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window" + "github.com/apache/beam/sdks/go/pkg/beam/core/typex" + "github.com/google/go-cmp/cmp" + "testing" +) + +// testTimestamp is a constant used to check that timestamps are retained. 
+const testTimestamp = 15 + +// testWindow is a constant used to check that windows are retained +var testWindows = []typex.Window{window.IntervalWindow{Start: 10, End: 20}} + +// TestSdfNodes verifies that the various SDF nodes fulfill each of their +// described contracts, that they each successfully invoke any SDF methods +// needed, and that they preserve timestamps and windows correctly. +func TestSdfNodes(t *testing.T) { + // Setup. The DoFns created below are defined in sdf_invokers_test.go and + // have testable behavior to confirm that they got correctly invoked. + // Without knowing the expected behavior of these DoFns, the desired outputs + // in the unit tests below will not make much sense. + dfn, err := graph.NewDoFn(&Sdf{}, graph.NumMainInputs(graph.MainSingle)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } + kvdfn, err := graph.NewDoFn(&KvSdf{}, graph.NumMainInputs(graph.MainKv)) + if err != nil { + t.Fatalf("invalid function: %v", err) + } + + // Validate PairWithRestriction matches its contract and properly invokes + // SDF method CreateInitialRestriction. + t.Run("PairWithRestriction", func(t *testing.T) { + tests := []struct { + name string + fn *graph.DoFn + in *FullValue + want *FullValue + }{ + { + name: "SingleElem", + fn: dfn, + in: &FullValue{ + Elm: 5, + Elm2: nil, + Timestamp: testTimestamp, + Windows: testWindows, + }, + want: &FullValue{ + Elm: &FullValue{ + Elm: 5, + Elm2: nil, + Timestamp: testTimestamp, + Windows: testWindows, + }, + Elm2: Restriction{5}, + Timestamp: testTimestamp, + Windows: testWindows, + }, + }, + { + name: "KvElem", + fn: kvdfn, + in: &FullValue{ + Elm: 5, + Elm2: 2, + Timestamp: testTimestamp, + Windows: testWindows, + }, + want: &FullValue{ +
[jira] [Work logged] (BEAM-9642) Add SDF execution-time runners
[ https://issues.apache.org/jira/browse/BEAM-9642?focusedWorklogId=421813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421813 ] ASF GitHub Bot logged work on BEAM-9642: Author: ASF GitHub Bot Created on: 14/Apr/20 03:21 Start Date: 14/Apr/20 03:21 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11327: [BEAM-9642] Add SDF execution units. URL: https://github.com/apache/beam/pull/11327#discussion_r407827873 ## File path: sdks/go/pkg/beam/core/runtime/exec/sdf.go ## @@ -0,0 +1,297 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package exec + +import ( + "context" + "fmt" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph" + "path" + + "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" +) + +// PairWithRestriction is an executor for the expanded SDF step of the same +// name. This is the first step of an expanded SDF. It pairs each main input +// element with a restriction via the SDF's associated sdf.RestrictionProvider. +// This step is followed by SplitAndSizeRestrictions. +type PairWithRestriction struct { + UID UnitID + Fn *graph.DoFn + Out []Node + + inv *cirInvoker +} + +// ID returns the UnitID for this unit. 
+func (n *PairWithRestriction) ID() UnitID { + return n.UID +} + +// Up performs one-time setup for this executor. +func (n *PairWithRestriction) Up(ctx context.Context) error { + fn := (*graph.SplittableDoFn)(n.Fn).CreateInitialRestrictionFn() + var err error + if n.inv, err = newCreateInitialRestrictionInvoker(fn); err != nil { + return errors.WithContextf(err, "PairWithRestriction transform with UID %v", n.ID()) + } + return nil +} + +// StartBundle currently does nothing. +func (n *PairWithRestriction) StartBundle(ctx context.Context, id string, data DataContext) error { + return n.Out[0].StartBundle(ctx, id, data) +} + +// ProcessElement expects elm to be the main input to the ParDo. See +// exec.FullValue for more details on the expected input. +// +// ProcessElement creates an initial restriction representing the entire input. +// The output is in the structure <elem, restriction>, where elem is the main +// input originally passed in (i.e. the parameter elm). Windows and Timestamp +// are copied to the outer *FullValue. They still remain within the original +// element as well, but will no longer be used. +// +// Output Diagram: +// +// *FullValue { +// Elm: *FullValue (original input) +// Elm2: Restriction +// Windows +// Timestamps +// } +func (n *PairWithRestriction) ProcessElement(ctx context.Context, elm *FullValue, values ...ReStream) error { + rest := n.inv.Invoke(elm) + output := FullValue{Elm: elm, Elm2: rest, Timestamp: elm.Timestamp, Windows: elm.Windows} + + return n.Out[0].ProcessElement(ctx, &output, values...) +} + +// FinishBundle does some teardown for the end of the bundle. +func (n *PairWithRestriction) FinishBundle(ctx context.Context) error { + n.inv.Reset() + return n.Out[0].FinishBundle(ctx) +} + +// Down currently does nothing. +func (n *PairWithRestriction) Down(ctx context.Context) error { + return nil +} + +// String outputs a human-readable description of this transform. 
+func (n *PairWithRestriction) String() string { + return fmt.Sprintf("SDF.PairWithRestriction[%v] Out:%v", path.Base(n.Fn.Name()), IDs(n.Out...)) +} + +// SplitAndSizeRestrictions is an executor for the expanded SDF step of the +// same name. It is the second step of the expanded SDF, occurring after +// CreateInitialRestriction. It performs initial splits on the initial restrictions +// and adds sizing information, producing one or more output elements per input +// element. This step is followed by ProcessSizedElementsAndRestrictions. +type SplitAndSizeRestrictions struct { + UID UnitID + Fn *graph.DoFn + Out []Node Review comment: This was me copying from the old implementation, which had all these nodes wrapping ParDos. No, these shouldn't be outputting to mo
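The output structure described in the PairWithRestriction.ProcessElement comment (original element in Elm, restriction in Elm2, timestamp and windows copied to the outer value) can be sketched in isolation. This assumes a pared-down FullValue with string windows and a hypothetical `pairWithRestriction` helper that takes the restriction-creating function as a parameter; the real executor obtains it through an invoker built in Up.

```go
package main

import "fmt"

// FullValue is a simplified stand-in for exec.FullValue; the real type uses
// typex.EventTime and typex.Window instead of int64 and string.
type FullValue struct {
	Elm       interface{}
	Elm2      interface{}
	Timestamp int64
	Windows   []string
}

// Restriction is a hypothetical single-field restriction.
type Restriction struct{ Val int }

// pairWithRestriction is a hypothetical helper mirroring the comment in the
// diff: the input becomes Elm, the initial restriction becomes Elm2, and the
// timestamp and windows are copied to the outer value.
func pairWithRestriction(elm *FullValue, createInitialRestriction func(*FullValue) Restriction) *FullValue {
	return &FullValue{
		Elm:       elm,
		Elm2:      createInitialRestriction(elm),
		Timestamp: elm.Timestamp,
		Windows:   elm.Windows,
	}
}

func main() {
	in := &FullValue{Elm: 5, Timestamp: 15, Windows: []string{"[10,20)"}}
	// Assumption: the restriction is derived directly from the element value.
	out := pairWithRestriction(in, func(e *FullValue) Restriction {
		return Restriction{Val: e.Elm.(int)}
	})
	fmt.Printf("%v %v %d %v\n", out.Elm.(*FullValue).Elm, out.Elm2, out.Timestamp, out.Windows)
}
```

Keeping the original element as a pointer inside the outer value means later steps can unwrap it without copying; the inner timestamp and windows stay in place but, per the source comment, are no longer used.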
[jira] [Work logged] (BEAM-9751) Upgrade ZetaSQL to 2020.04.1
[ https://issues.apache.org/jira/browse/BEAM-9751?focusedWorklogId=421807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421807 ] ASF GitHub Bot logged work on BEAM-9751: Author: ASF GitHub Bot Created on: 14/Apr/20 02:50 Start Date: 14/Apr/20 02:50 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11410: [BEAM-9751] Upgrade ZetaSQL java 2020.04.1 URL: https://github.com/apache/beam/pull/11410 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421807) Time Spent: 1h 20m (was: 1h 10m) > Upgrade ZetaSQL to 2020.04.1 > > > Key: BEAM-9751 > URL: https://issues.apache.org/jira/browse/BEAM-9751 > Project: Beam > Issue Type: Task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9751) Upgrade ZetaSQL to 2020.04.1
[ https://issues.apache.org/jira/browse/BEAM-9751?focusedWorklogId=421804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421804 ] ASF GitHub Bot logged work on BEAM-9751: Author: ASF GitHub Bot Created on: 14/Apr/20 02:42 Start Date: 14/Apr/20 02:42 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11410: [BEAM-9751] Upgrade ZetaSQL java 2020.04.1 URL: https://github.com/apache/beam/pull/11410#issuecomment-613195548 Run JavaBeamZetaSQL PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421804) Time Spent: 1h (was: 50m) > Upgrade ZetaSQL to 2020.04.1 > > > Key: BEAM-9751 > URL: https://issues.apache.org/jira/browse/BEAM-9751 > Project: Beam > Issue Type: Task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9751) Upgrade ZetaSQL to 2020.04.1
[ https://issues.apache.org/jira/browse/BEAM-9751?focusedWorklogId=421805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421805 ] ASF GitHub Bot logged work on BEAM-9751: Author: ASF GitHub Bot Created on: 14/Apr/20 02:42 Start Date: 14/Apr/20 02:42 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11410: [BEAM-9751] Upgrade ZetaSQL java 2020.04.1 URL: https://github.com/apache/beam/pull/11410#issuecomment-613195581 Run SQL PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421805) Time Spent: 1h 10m (was: 1h) > Upgrade ZetaSQL to 2020.04.1 > > > Key: BEAM-9751 > URL: https://issues.apache.org/jira/browse/BEAM-9751 > Project: Beam > Issue Type: Task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9753) Use cmp in fullvalue_test.go
Daniel Oliveira created BEAM-9753: - Summary: Use cmp in fullvalue_test.go Key: BEAM-9753 URL: https://issues.apache.org/jira/browse/BEAM-9753 Project: Beam Issue Type: Improvement Components: sdk-go Reporter: Daniel Oliveira Assignee: Daniel Oliveira We could probably update the comparison helpers in [fullvalue_test.go|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/fullvalue_test.go] to use cmp options and Transformers instead which would make things much clearer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9753) Use cmp in fullvalue_test.go
[ https://issues.apache.org/jira/browse/BEAM-9753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-9753: -- Status: Open (was: Triage Needed) > Use cmp in fullvalue_test.go > > > Key: BEAM-9753 > URL: https://issues.apache.org/jira/browse/BEAM-9753 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Minor > > We could probably update the comparison helpers in > [fullvalue_test.go|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/fullvalue_test.go] > to use cmp options and Transformers instead which would make things much > clearer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9250) Improve beam release script based on 2.19.0 release experience
[ https://issues.apache.org/jira/browse/BEAM-9250?focusedWorklogId=421789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421789 ] ASF GitHub Bot logged work on BEAM-9250: Author: ASF GitHub Bot Created on: 14/Apr/20 01:46 Start Date: 14/Apr/20 01:46 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #10776: [BEAM-9250] Update building java_doc and py_doc URL: https://github.com/apache/beam/pull/10776#issuecomment-613181124 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421789) Time Spent: 3h (was: 2h 50m) > Improve beam release script based on 2.19.0 release experience > -- > > Key: BEAM-9250 > URL: https://issues.apache.org/jira/browse/BEAM-9250 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: Not applicable > > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9250) Improve beam release script based on 2.19.0 release experience
[ https://issues.apache.org/jira/browse/BEAM-9250?focusedWorklogId=421790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421790 ] ASF GitHub Bot logged work on BEAM-9250: Author: ASF GitHub Bot Created on: 14/Apr/20 01:46 Start Date: 14/Apr/20 01:46 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #10772: [BEAM-9250] Re-structure python release candidate target. URL: https://github.com/apache/beam/pull/10772#issuecomment-613181131 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421790) Time Spent: 3h 10m (was: 3h) > Improve beam release script based on 2.19.0 release experience > -- > > Key: BEAM-9250 > URL: https://issues.apache.org/jira/browse/BEAM-9250 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: Not applicable > > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=421777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421777 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Apr/20 01:08 Start Date: 14/Apr/20 01:08 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #11381: [BEAM-8889] add gRPC suport in GCS connector (behind an experimental-flag) URL: https://github.com/apache/beam/pull/11381#issuecomment-613171185 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421777) Remaining Estimate: 138h 20m (was: 138.5h) Time Spent: 29h 40m (was: 29.5h) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 29h 40m > Remaining Estimate: 138h 20m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. 
This request is about updating GcsUtil to use GoogleCloudStorage to create read and write channels, which is expected to be more flexible because it can easily pick up new changes, e.g. a new channel implementation using a new protocol, without code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=421776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421776 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Apr/20 01:07 Start Date: 14/Apr/20 01:07 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #11381: [BEAM-8889] add gRPC suport in GCS connector (behind an experimental-flag) URL: https://github.com/apache/beam/pull/11381#issuecomment-613171121 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421776) Remaining Estimate: 138.5h (was: 138h 40m) Time Spent: 29.5h (was: 29h 20m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 29.5h > Remaining Estimate: 138.5h > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. 
This request is about updating GcsUtil to use GoogleCloudStorage to create read and write channels, which is expected to be more flexible because it can easily pick up new changes, e.g. a new channel implementation using a new protocol, without code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9751) Upgrade ZetaSQL to 2020.04.1
[ https://issues.apache.org/jira/browse/BEAM-9751?focusedWorklogId=421775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421775 ] ASF GitHub Bot logged work on BEAM-9751: Author: ASF GitHub Bot Created on: 14/Apr/20 01:07 Start Date: 14/Apr/20 01:07 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11410: [BEAM-9751] Upgrade ZetaSQL java 2020.04.1 URL: https://github.com/apache/beam/pull/11410#issuecomment-613170968 Run SQL PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421775) Time Spent: 50m (was: 40m) > Upgrade ZetaSQL to 2020.04.1 > > > Key: BEAM-9751 > URL: https://issues.apache.org/jira/browse/BEAM-9751 > Project: Beam > Issue Type: Task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9738) Python Dataflow runner omits capabilities.
[ https://issues.apache.org/jira/browse/BEAM-9738?focusedWorklogId=421771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421771 ] ASF GitHub Bot logged work on BEAM-9738: Author: ASF GitHub Bot Created on: 14/Apr/20 00:57 Start Date: 14/Apr/20 00:57 Worklog Time Spent: 10m Work Description: robertwb commented on issue #11390: [BEAM-9738] Update dataflow to setup correct docker environment options. URL: https://github.com/apache/beam/pull/11390#issuecomment-613168347 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421771) Time Spent: 2h (was: 1h 50m) > Python Dataflow runner omits capabilities. > -- > > Key: BEAM-9738 > URL: https://issues.apache.org/jira/browse/BEAM-9738 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9752) Too many shards in GCS
Ankur Goenka created BEAM-9752: -- Summary: Too many shards in GCS Key: BEAM-9752 URL: https://issues.apache.org/jira/browse/BEAM-9752 Project: Beam Issue Type: Bug Components: sdk-py-core, sdk-py-harness Reporter: Ankur Goenka We have observed a case where the data was spread very thinly over an automatically computed number of shards. Each shard then waited for its buffer to fill before sending data to GCS, and the upload timed out because no data was uploaded while waiting. Setting an explicit number of shards (1000 in my case) solved the problem, potentially because every shard then had enough data to fill its write buffer and so avoided the timeout. We can improve the sharding logic so that we don't create too many shards. Alternatively, we can improve connection handling so that the connection does not time out. -- This message was sent by Atlassian Jira (v8.3.4#803005)
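One way to read the "don't create too many shards" suggestion in BEAM-9752 is to cap the shard count so that an even split still gives every shard at least one full write buffer. The sketch below is only an illustration of that arithmetic. The function name `cap_shard_count`, the 8 MiB buffer size, and the assumption of an even split are all hypothetical and are not taken from Beam's sharding code.

```python
def cap_shard_count(total_bytes: int, requested_shards: int, buffer_bytes: int) -> int:
    """Cap a shard count so an even split fills at least one buffer per shard.

    Hypothetical helper for illustration; not part of Apache Beam.
    """
    if buffer_bytes <= 0:
        raise ValueError("buffer_bytes must be positive")
    if requested_shards < 1:
        raise ValueError("requested_shards must be at least 1")
    # Largest shard count for which total_bytes / shards >= buffer_bytes.
    max_useful_shards = max(1, total_bytes // buffer_bytes)
    return min(requested_shards, max_useful_shards)


if __name__ == "__main__":
    ten_gib = 10 * 1024**3
    assumed_buffer = 8 * 1024**2  # assumed buffer size, not a Beam constant
    print(cap_shard_count(ten_gib, 50_000, assumed_buffer))
    print(cap_shard_count(ten_gib, 1_000, assumed_buffer))
```

With these assumed numbers, a request for 50,000 shards would be cut down while a request for 1,000 shards would pass through unchanged, which is consistent with the workaround described in the report.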
[jira] [Work logged] (BEAM-9751) Upgrade ZetaSQL to 2020.04.1
[ https://issues.apache.org/jira/browse/BEAM-9751?focusedWorklogId=421757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421757 ] ASF GitHub Bot logged work on BEAM-9751: Author: ASF GitHub Bot Created on: 14/Apr/20 00:07 Start Date: 14/Apr/20 00:07 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #11410: [BEAM-9751] Upgrade ZetaSQL java 2020.04.1 URL: https://github.com/apache/beam/pull/11410#discussion_r407789393 ## File path: sdks/java/extensions/sql/zetasql/build.gradle ## @@ -20,12 +20,18 @@ plugins { id 'org.apache.beam.module' } +repositories { Review comment: Can't merge with this block. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421757) Time Spent: 40m (was: 0.5h) > Upgrade ZetaSQL to 2020.04.1 > > > Key: BEAM-9751 > URL: https://issues.apache.org/jira/browse/BEAM-9751 > Project: Beam > Issue Type: Task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9743) TFRecordCodec not attempt to fully read/write
[ https://issues.apache.org/jira/browse/BEAM-9743?focusedWorklogId=421754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421754 ] ASF GitHub Bot logged work on BEAM-9743: Author: ASF GitHub Bot Created on: 13/Apr/20 23:54 Start Date: 13/Apr/20 23:54 Worklog Time Spent: 10m Work Description: lukemin89 commented on issue #11397: [BEAM-9743] Fix TFRecordCodec to try harder to read/write URL: https://github.com/apache/beam/pull/11397#issuecomment-613152729 R: @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421754) Time Spent: 2h (was: 1h 50m) > TFRecordCodec not attempt to fully read/write > - > > Key: BEAM-9743 > URL: https://issues.apache.org/jira/browse/BEAM-9743 > Project: Beam > Issue Type: Bug > Components: io-java-tfrecord, sdk-java-core >Reporter: Kyoungha Min >Assignee: Kyoungha Min >Priority: Critical > Time Spent: 2h > Remaining Estimate: 0h > > The same issue has been pointed out and the issues were marked resolved. But > they were still remaining parts > https://issues.apache.org/jira/browse/BEAM-5412?jql=text%20~%20%22tfrecord%22 > > Issue # 1: TFRecordCodec only tries once to read the header/footer. This is > likely to fail around the end of channel buffer. > Issue # 2: (minor) TFRecordCodec currently does not checks how much it > writes. > > Seems like it only happens with Zstd compression (or any other picky input > stream that refuse to read fully). ZstdInputStream seems very picky at giving > out data. 
> The affected code is at [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672] > and [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699] > > The write path is less of a problem within Beam itself, since all (or most) of the WritableByteChannels in beam-java-sdk-core are backed by some OutputStream, but it still does not follow the WritableByteChannel specification: > [https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727] > > The ReadableByteChannel/WritableByteChannel Javadoc specifies that channels are not required to read or write fully, and may transfer fewer bytes than requested from time to time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
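The partial-read behaviour described in BEAM-9743 can be illustrated with a read-until-full loop. This is a Python sketch of the general technique only, not the Java change made in TFRecordIO. `TrickleReader` and `read_fully` are hypothetical names, and `TrickleReader` merely stands in for a "picky" stream that hands out a few bytes per call.

```python
import io


class TrickleReader(io.RawIOBase):
    """Hypothetical stream that returns at most `chunk` bytes per read call."""

    def __init__(self, data: bytes, chunk: int):
        super().__init__()
        self._source = io.BytesIO(data)
        self._chunk = chunk

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Deliberately return fewer bytes than the caller asked for.
        data = self._source.read(min(self._chunk, len(buffer)))
        buffer[: len(data)] = data
        return len(data)


def read_fully(stream, size: int) -> bytes:
    """Keep reading until `size` bytes arrive or the stream stops yielding data."""
    out = bytearray(size)
    view = memoryview(out)
    filled = 0
    while filled < size:
        count = stream.readinto(view[filled:])
        if not count:  # 0 means end of stream; None means no data available
            break
        filled += count
    return bytes(out[:filled])


if __name__ == "__main__":
    header = b"0123456789ab"
    # A single read returns only part of the header.
    print(TrickleReader(header, 5).readinto(bytearray(len(header))))
    # The loop collects the whole header.
    print(read_fully(TrickleReader(header, 5), len(header)))
```

A caller that issues one read and assumes the buffer is full would see a truncated header here. The loop returns a short result only when the stream is genuinely exhausted, so the caller can tell a clean end of input from a truncated record.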
[jira] [Comment Edited] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082720#comment-17082720 ] Andrew Pilloud edited comment on BEAM-9709 at 4/13/20, 11:49 PM: - That test appears to be asserting the default timezone is not UTC (it has the wrong "right" answer for Beam, but right answer for ZetaSQL): https://github.com/google/zetasql/blob/7d983d3632702f200c8340933160c02f1d94e5a7/javatests/com/google/zetasql/PreparedExpressionTest.java#L370 was (Author: apilloud): That test appears to be asserting the default timezone is not UTC (it has the wrong "right" answer): https://github.com/google/zetasql/blob/7d983d3632702f200c8340933160c02f1d94e5a7/javatests/com/google/zetasql/PreparedExpressionTest.java#L370 > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082745#comment-17082745 ] Yueyang Qiu edited comment on BEAM-9709 at 4/13/20, 11:47 PM: -- OK. I agree this is a bug in ZetaSQL. I run {code:java} String expr = "cast(timestamp('2015-04-01') as string)"; try (PreparedExpression exp = new PreparedExpression(expr)) { AnalyzerOptions options = new AnalyzerOptions(); options.setDefaultTimezone("Asia/Shanghai"); exp.prepare(options); Value value = exp.execute(); System.out.println(value.getStringValue()); } {code} and the result is {code:java} 2015-04-01 00:00:00-07 {code} which means setDefaultTimezone() does something wrong while running the timestamp() constructor, but it works fine while running cast() (see the second part of the test: [https://github.com/google/zetasql/blob/7d983d3632702f200c8340933160c02f1d94e5a7/javatests/com/google/zetasql/PreparedExpressionTest.java#L375], the first part seems to be useless because the default time zone can be anything according to ZetaSQL spec; it does not have to be UTC) was (Author: robinyqiu): OK. I agree this is a bug in ZetaSQL. 
I run {code:java} String expr = "cast(timestamp('2015-04-01') as string)"; try (PreparedExpression exp = new PreparedExpression(expr)) { AnalyzerOptions options = new AnalyzerOptions(); options.setDefaultTimezone("Asia/Shanghai"); exp.prepare(options); Value value = exp.execute(); System.out.println(value.getStringValue()); } {code} and the result is {code:java} 2015-04-01 00:00:00-07 {code} which means setDefaultTimezone() does something wrong while running the timestamp() constructor, but it works fine while running cast() (see the second part of the test: [https://github.com/google/zetasql/blob/7d983d3632702f200c8340933160c02f1d94e5a7/javatests/com/google/zetasql/PreparedExpressionTest.java#L375]) > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
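The 8-hour difference between the expected and actual values in BEAM-9709 is what one would see if a date were converted to a timestamp at midnight in a UTC-8 zone rather than in UTC. The sketch below reproduces only that arithmetic. It assumes a fixed UTC-8 offset as the default zone, which is a guess and not something the thread confirms, and `timestamp_from_date` is a hypothetical helper, not a ZetaSQL or Beam function.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption for illustration: the engine's default zone is a fixed UTC-8 offset.
ASSUMED_DEFAULT_ZONE = timezone(timedelta(hours=-8))


def timestamp_from_date(day: date, default_zone: timezone) -> datetime:
    """Interpret midnight of `day` in `default_zone` and express it in UTC."""
    local_midnight = datetime.combine(day, time.min, tzinfo=default_zone)
    return local_midnight.astimezone(timezone.utc)


if __name__ == "__main__":
    day = date(2014, 1, 31)
    print(timestamp_from_date(day, timezone.utc).isoformat())
    print(timestamp_from_date(day, ASSUMED_DEFAULT_ZONE).isoformat())
```

Under that assumption, the first call corresponds to the expected value in the issue and the second to the actual value, eight hours later in UTC.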
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=421749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421749 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 13/Apr/20 23:47 Start Date: 13/Apr/20 23:47 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11335: [BEAM-9692]: Make CombineValues portable URL: https://github.com/apache/beam/pull/11335#discussion_r407783064 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -110,22 +110,27 @@ class DataflowRunner(PipelineRunner): # Imported here to avoid circular dependencies. # TODO: Remove the apache_beam.pipeline dependency in CreatePTransformOverride + from apache_beam.runners.dataflow.ptransform_overrides import CombineValuesPTransformOverride from apache_beam.runners.dataflow.ptransform_overrides import CreatePTransformOverride from apache_beam.runners.dataflow.ptransform_overrides import ReadPTransformOverride from apache_beam.runners.dataflow.ptransform_overrides import JrhReadPTransformOverride - _PTRANSFORM_OVERRIDES = [] # type: List[PTransformOverride] + # Thesse overrides should be applied before the proto representation of the + # graph is created. + _PTRANSFORM_OVERRIDES = [ + CombineValuesPTransformOverride() Review comment: This override should place the pipeline object into the same state as if the runner had defined an apply_CombineValues, what am I missing? Looking at the code, is it because other overrides might also use a CombineValues transform so it might needed to be replaced again? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421749) Time Spent: 1h 10m (was: 1h) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082745#comment-17082745 ] Yueyang Qiu edited comment on BEAM-9709 at 4/13/20, 11:46 PM: -- OK. I agree this is a bug in ZetaSQL. I run {code:java} String expr = "cast(timestamp('2015-04-01') as string)"; try (PreparedExpression exp = new PreparedExpression(expr)) { AnalyzerOptions options = new AnalyzerOptions(); options.setDefaultTimezone("Asia/Shanghai"); exp.prepare(options); Value value = exp.execute(); System.out.println(value.getStringValue()); } {code} and the result is {code:java} 2015-04-01 00:00:00-07 {code} which means setDefaultTimezone() does something wrong while running the timestamp() constructor, but it works fine while running cast() (see the second part of the test: [https://github.com/google/zetasql/blob/7d983d3632702f200c8340933160c02f1d94e5a7/javatests/com/google/zetasql/PreparedExpressionTest.java#L375]) was (Author: robinyqiu): OK. This seems to be a bug in ZetaSQL. 
I run {code:java} String expr = "cast(timestamp('2015-04-01') as string)"; try (PreparedExpression exp = new PreparedExpression(expr)) { AnalyzerOptions options = new AnalyzerOptions(); options.setDefaultTimezone("Asia/Shanghai"); exp.prepare(options); Value value = exp.execute(); System.out.println(value.getStringValue()); } {code} and the result is {code:java} 2015-04-01 00:00:00-07 {code} which means setDefaultTimezone() does something wrong while running the timestamp() constructor, but it works fine while running cast() (see the second part of the test: https://github.com/google/zetasql/blob/7d983d3632702f200c8340933160c02f1d94e5a7/javatests/com/google/zetasql/PreparedExpressionTest.java#L375) > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=421746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421746 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 13/Apr/20 23:44 Start Date: 13/Apr/20 23:44 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-613150119 this looks fine to me as long as the dependency changes look fine to @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421746) Time Spent: 26.5h (was: 26h 20m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 26.5h > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7923) Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=421745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421745 ] ASF GitHub Bot logged work on BEAM-7923: Author: ASF GitHub Bot Created on: 13/Apr/20 23:43 Start Date: 13/Apr/20 23:43 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11338: [BEAM-7923] Screendiff Integration Tests URL: https://github.com/apache/beam/pull/11338#issuecomment-613149907 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421745) Time Spent: 19h 40m (was: 19.5h) > Interactive Beam > > > Key: BEAM-7923 > URL: https://issues.apache.org/jira/browse/BEAM-7923 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 19h 40m > Remaining Estimate: 0h > > This is the top level ticket for all efforts leveraging [interactive > Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]] > As the development goes, blocking tickets will be added to this one. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082704#comment-17082704 ] Yueyang Qiu edited comment on BEAM-9709 at 4/13/20, 11:43 PM: -- (Previous comments are wrong. Writing the right version.) Thank you Andrew for linking to the other bug. I took that as well. I think that one is easier to fix after this one is fixed. was (Author: robinyqiu): (Previous comments are wrong. Writing the right version.) Thank you Andrew for linking to the other bug. I took that as well. I think these 2 issues could be fixed together. > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082745#comment-17082745 ] Yueyang Qiu commented on BEAM-9709: --- OK. This seems to be a bug in ZetaSQL. I run {code:java} String expr = "cast(timestamp('2015-04-01') as string)"; try (PreparedExpression exp = new PreparedExpression(expr)) { AnalyzerOptions options = new AnalyzerOptions(); options.setDefaultTimezone("Asia/Shanghai"); exp.prepare(options); Value value = exp.execute(); System.out.println(value.getStringValue()); } {code} and the result is {code:java} 2015-04-01 00:00:00-07 {code} which means setDefaultTimezone() does something wrong while running the timestamp() constructor, but it works fine while running cast() (see the second part of the test: https://github.com/google/zetasql/blob/7d983d3632702f200c8340933160c02f1d94e5a7/javatests/com/google/zetasql/PreparedExpressionTest.java#L375) > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9738) Python Dataflow runner omits capabilities.
[ https://issues.apache.org/jira/browse/BEAM-9738?focusedWorklogId=421742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421742 ] ASF GitHub Bot logged work on BEAM-9738: Author: ASF GitHub Bot Created on: 13/Apr/20 23:40 Start Date: 13/Apr/20 23:40 Worklog Time Spent: 10m Work Description: robertwb commented on issue #11390: [BEAM-9738] Update dataflow to setup correct docker environment options. URL: https://github.com/apache/beam/pull/11390#issuecomment-613149239 This now contains the fixes from the cherry-pick. Letting tests run again. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421742) Time Spent: 1h 50m (was: 1h 40m) > Python Dataflow runner omits capabilities. > -- > > Key: BEAM-9738 > URL: https://issues.apache.org/jira/browse/BEAM-9738 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9748) Move Reparallelize transform to Reshuffle
[ https://issues.apache.org/jira/browse/BEAM-9748?focusedWorklogId=421739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421739 ] ASF GitHub Bot logged work on BEAM-9748: Author: ASF GitHub Bot Created on: 13/Apr/20 23:32 Start Date: 13/Apr/20 23:32 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #11406: [BEAM-9748] Move Reparallelize transform to Reshuffle URL: https://github.com/apache/beam/pull/11406#discussion_r407778247 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java ## @@ -65,6 +66,11 @@ private Reshuffle() {} return new ViaRandomKey<>(); } + @Experimental + public static Reparallelize reparallelize() { Review comment: Yes I will improve the javadoc, maybe move some of the details in the impl. comment below. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421739) Time Spent: 1h 20m (was: 1h 10m) > Move Reparallelize transform to Reshuffle > - > > Key: BEAM-9748 > URL: https://issues.apache.org/jira/browse/BEAM-9748 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Some DoFn based IOs like JdbcIO and RedisIO rely on the Reparallelize > transform, > a combination of a an empty PCollectionView and Reshuffle to force the > materialization and reparallelize a PCollection. The idea of this issue is to > extract this transform and expose it as part of the internal Reshuffle > transform to avoid repeating the code for transforms (notably IOs) that > require > to reparallelize its output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9748) Move Reparallelize transform to Reshuffle
[ https://issues.apache.org/jira/browse/BEAM-9748?focusedWorklogId=421740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421740 ] ASF GitHub Bot logged work on BEAM-9748: Author: ASF GitHub Bot Created on: 13/Apr/20 23:32 Start Date: 13/Apr/20 23:32 Worklog Time Spent: 10m Work Description: jkff commented on issue #11406: [BEAM-9748] Move Reparallelize transform to Reshuffle URL: https://github.com/apache/beam/pull/11406#issuecomment-613147215 The comments inside Reparallelize explain how this transform differs from Reshuffle.viaRandomKey(): it performs dramatically better on Dataflow in case the input PCollection is generated highly sequentially, as in the case of reading several GB of JDBC results. It almost certainly performs somewhat worse if the input PCollection is generated in a well-parallelized way, but I haven't measured that; I haven't measured the former case for non-Dataflow runners either. I think it's reasonable to move this to Reshuffle, but rename it to something more clear: maybe Reshuffle.forSequentiallyGeneratedInput()? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421740) Time Spent: 1.5h (was: 1h 20m) > Move Reparallelize transform to Reshuffle > - > > Key: BEAM-9748 > URL: https://issues.apache.org/jira/browse/BEAM-9748 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > Some DoFn based IOs like JdbcIO and RedisIO rely on the Reparallelize > transform, > a combination of a an empty PCollectionView and Reshuffle to force the > materialization and reparallelize a PCollection. 
The idea of this issue is to extract this transform and expose it as part of the internal Reshuffle transform, to avoid repeating the code in transforms (notably IOs) that need to reparallelize their output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
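For readers unfamiliar with the random-key idea behind Reshuffle.viaRandomKey(), which the comment above compares against, here is a conceptual sketch in plain Python. It is not Beam code and does not model Reparallelize or its side-input materialization. The function name `reshuffle_via_random_key` and the `num_keys` and `seed` parameters are hypothetical.

```python
import random
from collections import defaultdict


def reshuffle_via_random_key(elements, num_keys, seed=None):
    """Pair each element with a random key, group by key, and drop the keys.

    Conceptual illustration only: a sequentially produced input ends up
    spread across up to `num_keys` independent groups.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for element in elements:
        groups[rng.randrange(num_keys)].append(element)
    # Each returned list stands in for work that could be processed in parallel.
    return [groups[key] for key in sorted(groups)]


if __name__ == "__main__":
    bundles = reshuffle_via_random_key(range(20), num_keys=4, seed=0)
    for bundle in bundles:
        print(bundle)
```

Every input element appears in exactly one output group, so the data is unchanged and only its grouping differs, which is the property a reparallelizing step relies on.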
[jira] [Work logged] (BEAM-9748) Move Reparallelize transform to Reshuffle
[ https://issues.apache.org/jira/browse/BEAM-9748?focusedWorklogId=421738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421738 ]

ASF GitHub Bot logged work on BEAM-9748:

Author: ASF GitHub Bot
Created on: 13/Apr/20 23:31
Start Date: 13/Apr/20 23:31
Worklog Time Spent: 10m
Work Description: iemejia commented on pull request #11406: [BEAM-9748] Move Reparallelize transform to Reshuffle
URL: https://github.com/apache/beam/pull/11406#discussion_r407778080

## File path: sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java ##
@@ -309,7 +305,7 @@ public ReadAll withOutputParallelization(boolean outputParallelization) {
           .apply(ParDo.of(new ReadFn(connectionConfiguration(), batchSize())))
           .setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()));
       if (outputParallelization()) {
-        output = output.apply(new Reparallelize());
+        output = (PCollection<KV<String, String>>) output.apply(Reshuffle.reparallelize());

Review comment:
Not necessary, I will remove it.

Issue Time Tracking
---
Worklog Id: (was: 421738)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-9748) Move Reparallelize transform to Reshuffle
[ https://issues.apache.org/jira/browse/BEAM-9748?focusedWorklogId=421737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421737 ]

ASF GitHub Bot logged work on BEAM-9748:

Author: ASF GitHub Bot
Created on: 13/Apr/20 23:30
Start Date: 13/Apr/20 23:30
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #11406: [BEAM-9748] Move Reparallelize transform to Reshuffle
URL: https://github.com/apache/beam/pull/11406#issuecomment-613144797

> Why do we want to do this over Reshuffle.viaRandomKey(), which should get us the output parallelization we want?

I think that's the case; maybe @jkff, who created that code, can confirm. There seem to be more details on the implementation choice in the original ticket [BEAM-2803](https://issues.apache.org/jira/browse/BEAM-2803)

Issue Time Tracking
---
Worklog Id: (was: 421737)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-9748) Move Reparallelize transform to Reshuffle
[ https://issues.apache.org/jira/browse/BEAM-9748?focusedWorklogId=421734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421734 ]

ASF GitHub Bot logged work on BEAM-9748:

Author: ASF GitHub Bot
Created on: 13/Apr/20 23:24
Start Date: 13/Apr/20 23:24
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #11406: [BEAM-9748] Move Reparallelize transform to Reshuffle
URL: https://github.com/apache/beam/pull/11406#issuecomment-613144797

> Why do we want to do this over Reshuffle.viaRandomKey(), which should get us the output parallelization we want?

I think that's the case; maybe @jkff, who created that code, can confirm.

Issue Time Tracking
---
Worklog Id: (was: 421734)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-9748) Move Reparallelize transform to Reshuffle
[ https://issues.apache.org/jira/browse/BEAM-9748?focusedWorklogId=421733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421733 ]

ASF GitHub Bot logged work on BEAM-9748:

Author: ASF GitHub Bot
Created on: 13/Apr/20 23:24
Start Date: 13/Apr/20 23:24
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #11406: [BEAM-9748] Move Reparallelize transform to Reshuffle
URL: https://github.com/apache/beam/pull/11406#issuecomment-613144797

> Why do we want to do this over Reshuffle.viaRandomKey(), which should get us the output parallelization we want?

I think that's the case; maybe @jkff, who created that code, can confirm.

Issue Time Tracking
---
Worklog Id: (was: 421733)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-9747) Deprecate RedisIO.readAll() and add RedisIO.readKeyPatterns() as a replacement
[ https://issues.apache.org/jira/browse/BEAM-9747?focusedWorklogId=421732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421732 ]

ASF GitHub Bot logged work on BEAM-9747:

Author: ASF GitHub Bot
Created on: 13/Apr/20 23:22
Start Date: 13/Apr/20 23:22
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #11405: [BEAM-9747] Deprecate RedisIO.readAll() and add RedisIO.readKeyPatterns as a replacement
URL: https://github.com/apache/beam/pull/11405#issuecomment-613144141

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 421732)
Time Spent: 20m (was: 10m)

> Deprecate RedisIO.readAll() and add RedisIO.readKeyPatterns() as a replacement
> --
>
> Key: BEAM-9747
> URL: https://issues.apache.org/jira/browse/BEAM-9747
> Project: Beam
> Issue Type: Improvement
> Components: io-java-redis
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Minor
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The current RedisIO.ReadAll transform does not follow the ReadAll pattern introduced by other IOs like HBaseIO, SolrIO and soon CassandraIO. We should change the current ReadAll into ReadKeyPatterns to avoid confusion and to be able to provide a consistent ReadAll transform (BEAM-9403).
[jira] [Updated] (BEAM-9751) Upgrade ZetaSQL to 2020.04.1
[ https://issues.apache.org/jira/browse/BEAM-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-9751:
---
Status: Open (was: Triage Needed)

> Upgrade ZetaSQL to 2020.04.1
>
> Key: BEAM-9751
> URL: https://issues.apache.org/jira/browse/BEAM-9751
> Project: Beam
> Issue Type: Task
> Components: dsl-sql-zetasql
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
[jira] [Updated] (BEAM-9750) Streaming Word Count Example Documents is out of date (Python)
[ https://issues.apache.org/jira/browse/BEAM-9750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-9750:
---
Status: Open (was: Triage Needed)

> Streaming Word Count Example Documents is out of date (Python)
> --
>
> Key: BEAM-9750
> URL: https://issues.apache.org/jira/browse/BEAM-9750
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core, website
> Reporter: Ahmet Altay
> Assignee: Rose Nguyen
> Priority: Major
>
> Flink runners are listed as "This runner is not yet available for the Python SDK." This is not accurate; the Flink runner supports streaming with Python.
> Link: https://beam.apache.org/get-started/wordcount-example/#streamingwordcount-example
> /cc [~ibzib]
[jira] [Work logged] (BEAM-9738) Python Dataflow runner omits capabilities.
[ https://issues.apache.org/jira/browse/BEAM-9738?focusedWorklogId=421731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421731 ]

ASF GitHub Bot logged work on BEAM-9738:

Author: ASF GitHub Bot
Created on: 13/Apr/20 23:14
Start Date: 13/Apr/20 23:14
Worklog Time Spent: 10m
Work Description: ibzib commented on issue #11390: [BEAM-9738] Update dataflow to setup correct docker environment options.
URL: https://github.com/apache/beam/pull/11390#issuecomment-613141201

@robertwb looks like some test failures are related, could you please take a look?

Issue Time Tracking
---
Worklog Id: (was: 421731)
Time Spent: 1h 40m (was: 1.5h)

> Python Dataflow runner omits capabilities.
> --
>
> Key: BEAM-9738
> URL: https://issues.apache.org/jira/browse/BEAM-9738
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Robert Bradshaw
> Assignee: Robert Bradshaw
> Priority: Blocker
> Fix For: 2.21.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-9738) Python Dataflow runner omits capabilities.
[ https://issues.apache.org/jira/browse/BEAM-9738?focusedWorklogId=421730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421730 ]

ASF GitHub Bot logged work on BEAM-9738:

Author: ASF GitHub Bot
Created on: 13/Apr/20 23:12
Start Date: 13/Apr/20 23:12
Worklog Time Spent: 10m
Work Description: ibzib commented on issue #11390: [BEAM-9738] Update dataflow to setup correct docker environment options.
URL: https://github.com/apache/beam/pull/11390#issuecomment-613141201

@robertwb looks like the test failure is related, could you please take a look?

Issue Time Tracking
---
Worklog Id: (was: 421730)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Assigned] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.
[ https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Oliveira reassigned BEAM-9745:
-
Assignee: Kyle Weaver (was: Daniel Oliveira)

Handing it to Kyle who is likely going to be rebuilding the worker in the next few days. Feel free to hand it back to me if that doesn't fix it.

> [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.
> -
>
> Key: BEAM-9745
> URL: https://issues.apache.org/jira/browse/BEAM-9745
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp, java-fn-execution, sdk-java-harness, test-failures
> Reporter: Daniel Oliveira
> Assignee: Kyle Weaver
> Priority: Blocker
> Labels: currently-failing
> Fix For: 2.21.0
>
> _Use this form to file an issue for test failure:_
> * [Jenkins Job|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4657/]
> * [Gradle Build Scan|https://scans.gradle.com/s/c3izncsa4u24k/tests/by-project]
>
> Initial investigation:
> The bug appears to be popping up on BigQuery tests mostly, but also a BigTable and a Datastore test.
> Here's an example stacktrace of the two errors, showing _only_ the error messages themselves. Source: [https://scans.gradle.com/s/c3izncsa4u24k/tests/efn4wciuamvqq-ccxt3jvofvqbe]
> {noformat}
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -191: java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> ...
> Caused by: java.lang.RuntimeException: Error received from SDK harness for instruction -191: java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -206: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: java.lang.ClassNotFoundException: org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder
> ...
> Caused by: java.lang.RuntimeException: Error received from SDK harness for instruction -206: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: java.lang.ClassNotFoundException: org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder
> ...
> {noformat}
> Update: Looks like this has been failing as far back as [Apr 4|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4566/] after a long period where the test was consistently timing out since [Mar 31|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4546/]. So it's hard to narrow down what commit may have caused this. Plus, the test was failing due to a completely different BigQuery failure before anyway, so it seems like this test will need to be completely fixed from scratch, instead of tracking down a specific breaking change.
>
> _After you've filled out the above details, please [assign the issue to an individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. Assignee should [treat test failures as high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], helping to fix the issue or find a more appropriate owner. See [Apache Beam Post-Commit Policies|https://beam.apache.org/contribute/postc
[jira] [Resolved] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle Weaver resolved BEAM-9562.
---
Resolution: Fixed

> Remove timer from PCollection and treat timers as Elements
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-harness, sdk-py-harness
> Reporter: Boyuan Zhang
> Assignee: Boyuan Zhang
> Priority: Major
> Fix For: 2.21.0
>
> Time Spent: 24h 20m
> Remaining Estimate: 0h
[jira] [Commented] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082727#comment-17082727 ]

Kyle Weaver commented on BEAM-9562:
---
Thanks for the fix Luke. We cherry-picked it, so I am marking this issue as resolved again.
[jira] [Closed] (BEAM-9725) Performance regression in reshuffle
[ https://issues.apache.org/jira/browse/BEAM-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle Weaver closed BEAM-9725.
-
Assignee: Ankur Goenka
Resolution: Fixed

> Performance regression in reshuffle
> ---
>
> Key: BEAM-9725
> URL: https://issues.apache.org/jira/browse/BEAM-9725
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, sdk-py-core, sdk-py-harness
> Affects Versions: 2.20.0
> Reporter: Ankur Goenka
> Assignee: Ankur Goenka
> Priority: Major
> Fix For: 2.21.0
>
> PR [https://github.com/apache/beam/pull/11066] is causing a performance regression for the reshuffle transform.
>
> cc: [~amaliujia] [~altay]
[jira] [Work logged] (BEAM-8466) Python typehints: pep 484 warn and strict modes
[ https://issues.apache.org/jira/browse/BEAM-8466?focusedWorklogId=421728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421728 ]

ASF GitHub Bot logged work on BEAM-8466:

Author: ASF GitHub Bot
Created on: 13/Apr/20 23:06
Start Date: 13/Apr/20 23:06
Worklog Time Spent: 10m
Work Description: udim commented on issue #11240: [BEAM-8466] Make strip_iterable more strict
URL: https://github.com/apache/beam/pull/11240#issuecomment-613139531

This is now ready to be internally tested

Issue Time Tracking
---
Worklog Id: (was: 421728)
Time Spent: 2h (was: 1h 50m)

> Python typehints: pep 484 warn and strict modes
> ---
>
> Key: BEAM-8466
> URL: https://issues.apache.org/jira/browse/BEAM-8466
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Udi Meiri
> Assignee: Udi Meiri
> Priority: Major
> Time Spent: 2h
> Remaining Estimate: 0h
>
> Allow type checking to use PEP 484 type hints, but only warn if there are errors, and in another mode to raise exceptions on errors.
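The two modes described in the issue can be pictured with a small standalone sketch. This is not Beam's type-checking code: `check_call`, its `mode` argument, and the `label` function are hypothetical names made up for the illustration, and only plain classes such as `int` or `str` are checked.

```python
import warnings
from typing import get_type_hints


def check_call(func, mode="warn", **kwargs):
    """Check keyword arguments against func's PEP 484 hints, then call func.

    Hypothetical helper: `mode` is "warn" or "strict". Hints that are not
    plain classes (for example generics) are skipped in this sketch.
    """
    hints = get_type_hints(func)
    for name, value in kwargs.items():
        expected = hints.get(name)
        if isinstance(expected, type) and not isinstance(value, expected):
            message = f"{func.__name__}: {name}={value!r} is not {expected.__name__}"
            if mode == "strict":
                # Strict mode: a hint violation is an error.
                raise TypeError(message)
            # Warn mode: report the violation and carry on.
            warnings.warn(message)
    return func(**kwargs)


def label(name: str, count: int) -> str:
    return f"{name}:{count}"
```

Calling `check_call(label, mode="warn", name="x", count="3")` emits a warning and still returns a result, while the same call with `mode="strict"` raises `TypeError` before `label` runs.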
[jira] [Resolved] (BEAM-9735) Performance regression in Python Batch pipeline in Reshuffle
[ https://issues.apache.org/jira/browse/BEAM-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle Weaver resolved BEAM-9735.
---
Resolution: Fixed

> Performance regression in Python Batch pipeline in Reshuffle
>
> Key: BEAM-9735
> URL: https://issues.apache.org/jira/browse/BEAM-9735
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Ankur Goenka
> Assignee: Ankur Goenka
> Priority: Blocker
> Fix For: 2.21.0
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
[jira] [Commented] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.
[ https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082722#comment-17082722 ]

Daniel Oliveira commented on BEAM-9745:
---
Did some asking around. Looks like this may be related to https://github.com/apache/beam/commit/0cd2fb6633a9d3d9183fc0532336501c3a56406c#diff-ecb570b49f9b4854404be5fbd74b0f22

And it will probably be fixed once the worker is rebuilt.
[jira] [Commented] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082720#comment-17082720 ]

Andrew Pilloud commented on BEAM-9709:
--
That test appears to be asserting the default timezone is not UTC (it has the wrong "right" answer): https://github.com/google/zetasql/blob/7d983d3632702f200c8340933160c02f1d94e5a7/javatests/com/google/zetasql/PreparedExpressionTest.java#L370

> timezone off by 8 hours
> ---
>
> Key: BEAM-9709
> URL: https://issues.apache.org/jira/browse/BEAM-9709
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql-zetasql
> Reporter: Andrew Pilloud
> Assignee: Yueyang Qiu
> Priority: Trivial
> Labels: zetasql-compliance
>
> two failures in shard 13, one failure in shard 19
> {code}
> Expected: ARRAY>[{2014-01-31 00:00:00+00}]
> Actual: ARRAY>[{2014-01-31 08:00:00+00}],
> {code}
> {code}
> select timestamp(date '2014-01-31')
> {code}
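One way a shift of exactly 8 hours can arise when a date is converted to a timestamp is that midnight is interpreted in a default timezone eight hours behind UTC rather than in UTC. The sketch below reproduces the two values from the report under that assumption. The messages above do not say which default timezone is involved, so the fixed UTC-8 offset and the helper name `timestamp_from_date` are illustrative only.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for a non-UTC default timezone.
ASSUMED_DEFAULT_TZ = timezone(timedelta(hours=-8))


def timestamp_from_date(d, tz):
    """Interpret midnight of `d` in `tz` and return that instant in UTC."""
    local_midnight = datetime.combine(d, time.min, tzinfo=tz)
    return local_midnight.astimezone(timezone.utc)


d = date(2014, 1, 31)
print(timestamp_from_date(d, timezone.utc))        # 2014-01-31 00:00:00+00:00
print(timestamp_from_date(d, ASSUMED_DEFAULT_TZ))  # 2014-01-31 08:00:00+00:00
```

The first line matches the "Expected" value and the second matches the "Actual" value, which is consistent with the comment that the test's expectation depends on the default timezone.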
[jira] [Updated] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.
[ https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle Weaver updated BEAM-9745:
--
Fix Version/s: 2.21.0
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=421713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421713 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 13/Apr/20 22:37 Start Date: 13/Apr/20 22:37 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11335: [BEAM-9692]: Make CombineValues portable URL: https://github.com/apache/beam/pull/11335#discussion_r407758508 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py ## @@ -566,6 +566,19 @@ def test_get_default_gcp_region_ignores_error( result = runner.get_default_gcp_region() self.assertIsNone(result) + def test_combine_values_translation(self): +runner = DataflowRunner() + +with beam.Pipeline(runner=runner, + options=PipelineOptions(self.default_properties)) as p: + ( # pylint: disable=expression-not-assigned + p + | beam.Create([('a', [1, 2]), ('b', [3, 4])]) + | beam.CombineValues(lambda v, _: sum(v))) + +job_dict = json.loads(str(runner.job)) +self.assertEqual(job_dict[u'steps'][1][u'kind'], u'CombineValues') Review comment: Asserting that it's the first step seems brittle, maybe just assert that there is some step that has this kind? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421713) Time Spent: 50m (was: 40m) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
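The review comment above suggests asserting that some step has the expected kind rather than checking a fixed index. A minimal sketch of that idea follows; `has_step_of_kind` is a hypothetical helper (not part of Beam), and the `job_dict` literal is a made-up stand-in for the result of `json.loads(str(runner.job))` in the test under review.

```python
def has_step_of_kind(job_dict, kind):
    """Return True if any step in the job description has the given kind."""
    return any(step.get('kind') == kind for step in job_dict.get('steps', []))


# Illustrative stand-in for json.loads(str(runner.job)); the real job
# description has more fields and its step order is not guaranteed.
job_dict = {
    'steps': [
        {'kind': 'ParallelRead', 'name': 's1'},
        {'kind': 'CombineValues', 'name': 's2'},
    ]
}

# Passes regardless of where the CombineValues step sits in the list.
assert has_step_of_kind(job_dict, 'CombineValues')
```

In the test itself this would presumably replace the indexed lookup `job_dict[u'steps'][1][u'kind']`, so that reordering or adding steps does not break the assertion.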
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=421714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421714 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 13/Apr/20 22:37 Start Date: 13/Apr/20 22:37 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11335: [BEAM-9692]: Make CombineValues portable URL: https://github.com/apache/beam/pull/11335#discussion_r407760055 ## File path: sdks/python/apache_beam/runners/dataflow/ptransform_overrides.py ## @@ -111,3 +111,38 @@ def expand(self, pbegin): return JrhRead().with_output_types( ptransform.get_type_hints().simple_output_type('Read')) + + +class CombineValuesPTransformOverride(PTransformOverride): + """A ``PTransformOverride`` for ``CombineValues``. + + The DataflowRunner expects that the CombineValues PTransform acts as a + primitive. So this override replaces the CombineValues with a primitive. + """ + def matches(self, applied_ptransform): +# Imported here to avoid circular dependencies. +# pylint: disable=wrong-import-order, wrong-import-position +from apache_beam import CombineValues + +if isinstance(applied_ptransform.transform, CombineValues): + self.transform = applied_ptransform.transform + return True +return False + + def get_replacement_transform(self, ptransform): +# Imported here to avoid circular dependencies. +# pylint: disable=wrong-import-order, wrong-import-position +from apache_beam import PTransform +from apache_beam.pvalue import PCollection + +# The DataflowRunner still needs access to the CombineValues members to Review comment: It would be preferable to simply let try and find methods for composites as well, rather than using PTransformOverrides. This would likely help with the GBK one too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421714) Time Spent: 1h (was: 50m) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=421712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421712 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 13/Apr/20 22:37 Start Date: 13/Apr/20 22:37 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11335: [BEAM-9692]: Make CombineValues portable URL: https://github.com/apache/beam/pull/11335#discussion_r407757918 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -110,22 +110,27 @@ class DataflowRunner(PipelineRunner): # Imported here to avoid circular dependencies. # TODO: Remove the apache_beam.pipeline dependency in CreatePTransformOverride + from apache_beam.runners.dataflow.ptransform_overrides import CombineValuesPTransformOverride from apache_beam.runners.dataflow.ptransform_overrides import CreatePTransformOverride from apache_beam.runners.dataflow.ptransform_overrides import ReadPTransformOverride from apache_beam.runners.dataflow.ptransform_overrides import JrhReadPTransformOverride - _PTRANSFORM_OVERRIDES = [] # type: List[PTransformOverride] + # Thesse overrides should be applied before the proto representation of the + # graph is created. + _PTRANSFORM_OVERRIDES = [ + CombineValuesPTransformOverride() Review comment: Seems this one should happen after too... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421712) Time Spent: 40m (was: 0.5h) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082714#comment-17082714 ] Yueyang Qiu commented on BEAM-9709: --- It seems weird to me because this ZetaSQL unit test is working: [https://github.com/google/zetasql/blob/master/javatests/com/google/zetasql/PreparedExpressionTest.java#L364] > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082704#comment-17082704 ] Yueyang Qiu edited comment on BEAM-9709 at 4/13/20, 10:33 PM: -- (Previous comments are wrong. Writing the right version.) Thank you Andrew for linking to the other bug. I took that as well. I think these 2 issues could be fixed together. was (Author: robinyqiu): OK, I dug into this a bit more and found the actual problem. Currently the Beam ZetaSQL engine does not set a default time zone. So in BeamZetaSqlCalcRel, when the ZetaSQL evaluator evaluates an expression that depends on the default time zone, it chooses its own: "America/Los_Angeles" ([https://github.com/google/zetasql/blob/master/zetasql/reference_impl/evaluation.cc#L122]). The fix should be a one-line change in BeamZetaSqlCalcRel, "options.setDefaultTimezone("UTC");" (i.e. Beam ZetaSQL defines its own default time zone to be UTC). Thank you Andrew for linking to the other bug. I took that as well. I think these 2 issues could be fixed together. > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082713#comment-17082713 ] Yueyang Qiu commented on BEAM-9709: --- Yes you are right. I deleted the previous wrong comment. > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082712#comment-17082712 ] Andrew Pilloud commented on BEAM-9709: -- BeamZetaSqlCalcRel uses the analyzer options from Beam ZetaSQL, which already sets the default: https://github.com/apache/beam/blob/473790ac4f98405ef20fe9186a5b75b1e0ad5657/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java#L105 That must not be getting plumbed through to evaluation.cc somewhere. > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082711#comment-17082711 ] Robert Burke edited comment on BEAM-8472 at 4/13/20, 10:26 PM: --- Just to be clear, the protocol is to check the environment variables, and then execute the gcloud command? Which would be to use [os.Getenv|https://godoc.org/pkg/os#Getenv] with "CLOUDSDK_COMPUTE_REGION" and then use the [os/exec package|https://godoc.org/pkg/os/exec] to call the gcloud executable? was (Author: lostluck): Just to be clear, the protocol is to check the environment variables, and then execute the gcloud command? Which would be to use [os.Getenv|https://godoc.corp.google.com/pkg/os#Getenv] with "CLOUDSDK_COMPUTE_REGION" and then use the [os/exec package|https://godoc.corp.google.com/pkg/os/exec] to call the gcloud executable? > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] > Update 11/12: this is complete for Python and Java, Go remains. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082711#comment-17082711 ] Robert Burke commented on BEAM-8472: Just to be clear, the protocol is to check the environment variables, and then execute the gcloud command? Which would be to use [os.Getenv|https://godoc.corp.google.com/pkg/os#Getenv] with "CLOUDSDK_COMPUTE_REGION" and then use the [os/exec package|https://godoc.corp.google.com/pkg/os/exec] to call the gcloud executable? > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] > Update 11/12: this is complete for Python and Java, Go remains. -- This message was sent by Atlassian Jira (v8.3.4#803005)
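The protocol asked about above (check the environment variable first, then fall back to the gcloud executable) can be sketched as follows. This is only an illustration in Python, not the Go implementation being discussed; the function name and the injectable `env` and `run` parameters are hypothetical, and the exact command `gcloud config get-value compute/region` is an assumption, since the thread does not name it.

```python
import os
import subprocess


def get_default_gcp_region(env=None, run=subprocess.check_output):
    """Look up a default region: environment variable first, then gcloud.

    Returns None when neither source yields a value, so the caller can
    decide on its own fallback.
    """
    env = os.environ if env is None else env

    # Step 1: the environment variable named in the comment above.
    region = env.get('CLOUDSDK_COMPUTE_REGION')
    if region:
        return region

    # Step 2: ask the gcloud executable (assumed command, see lead-in).
    try:
        out = run(
            ['gcloud', 'config', 'get-value', 'compute/region'],
            stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        # gcloud is not installed, or it exited with an error.
        return None

    region = out.decode('utf-8').strip()
    return region or None
```

Passing `env` and `run` explicitly keeps the lookup testable without a gcloud installation; in Go the same two steps would map onto os.Getenv and the os/exec package mentioned in the comment.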
[jira] [Work logged] (BEAM-9751) Upgrade ZetaSQL to 2020.04.1
[ https://issues.apache.org/jira/browse/BEAM-9751?focusedWorklogId=421708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421708 ] ASF GitHub Bot logged work on BEAM-9751: Author: ASF GitHub Bot Created on: 13/Apr/20 22:23 Start Date: 13/Apr/20 22:23 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11410: [BEAM-9751] Upgrade ZetaSQL java 2020 04 1 URL: https://github.com/apache/beam/pull/11410#issuecomment-613126620 Run SQL PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421708) Time Spent: 0.5h (was: 20m) > Upgrade ZetaSQL to 2020.04.1 > > > Key: BEAM-9751 > URL: https://issues.apache.org/jira/browse/BEAM-9751 > Project: Beam > Issue Type: Task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9751) Upgrade ZetaSQL to 2020.04.1
[ https://issues.apache.org/jira/browse/BEAM-9751?focusedWorklogId=421707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421707 ] ASF GitHub Bot logged work on BEAM-9751: Author: ASF GitHub Bot Created on: 13/Apr/20 22:23 Start Date: 13/Apr/20 22:23 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11410: [BEAM-9751] Upgrade ZetaSQL java 2020 04 1 URL: https://github.com/apache/beam/pull/11410#discussion_r407754950 ## File path: sdks/java/extensions/sql/zetasql/build.gradle ## @@ -20,12 +20,18 @@ plugins { id 'org.apache.beam.module' } +repositories { Review comment: Will remove this when PR is ready to merge. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421707) Time Spent: 20m (was: 10m) > Upgrade ZetaSQL to 2020.04.1 > > > Key: BEAM-9751 > URL: https://issues.apache.org/jira/browse/BEAM-9751 > Project: Beam > Issue Type: Task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9751) Upgrade ZetaSQL to 2020.04.1
[ https://issues.apache.org/jira/browse/BEAM-9751?focusedWorklogId=421706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421706 ] ASF GitHub Bot logged work on BEAM-9751: Author: ASF GitHub Bot Created on: 13/Apr/20 22:22 Start Date: 13/Apr/20 22:22 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11410: [BEAM-9751] Upgrade ZetaSQL java 2020 04 1 URL: https://github.com/apache/beam/pull/11410 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Commented] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082707#comment-17082707 ] Robert Burke commented on BEAM-8472: Eventually. Dataflow doesn't currently support the Go SDK so this won't be prioritized above current work any time soon. > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] > Update 11/12: this is complete for Python and Java, Go remains. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke reassigned BEAM-8472: -- Assignee: (was: Kyle Weaver) > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] > Update 11/12: this is complete for Python and Java, Go remains. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9751) Upgrade ZetaSQL to 2020.04.1
Rui Wang created BEAM-9751: -- Summary: Upgrade ZetaSQL to 2020.04.1 Key: BEAM-9751 URL: https://issues.apache.org/jira/browse/BEAM-9751 Project: Beam Issue Type: Task Components: dsl-sql-zetasql Reporter: Rui Wang Assignee: Rui Wang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082704#comment-17082704 ] Yueyang Qiu commented on BEAM-9709: --- OK, I dug into this a bit more and found the actual problem. Currently the Beam ZetaSQL engine does not set a default time zone. So in BeamZetaSqlCalcRel, when the ZetaSQL evaluator evaluates an expression that depends on the default time zone, it chooses its own: "America/Los_Angeles" ([https://github.com/google/zetasql/blob/master/zetasql/reference_impl/evaluation.cc#L122]). The fix should be a one-line change in BeamZetaSqlCalcRel, "options.setDefaultTimezone("UTC");" (i.e. Beam ZetaSQL defines its own default time zone to be UTC). Thank you Andrew for linking to the other bug. I took that as well. I think these 2 issues could be fixed together. > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
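Wherever the default ends up being lost, the 8-hour difference in the issue is what one would expect if midnight on 2014-01-31 is interpreted in "America/Los_Angeles" instead of UTC. The sketch below only illustrates that arithmetic with Python's standard library (Python 3.9+ with system time zone data); `date_to_utc_timestamp` is a hypothetical helper and says nothing about how ZetaSQL or Beam implement the conversion.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def date_to_utc_timestamp(year, month, day, default_tz):
    """Interpret midnight of the given date in default_tz, expressed in UTC."""
    local_midnight = datetime(year, month, day, tzinfo=ZoneInfo(default_tz))
    return local_midnight.astimezone(timezone.utc)


# Same query input as in the issue: timestamp(date '2014-01-31').
as_utc = date_to_utc_timestamp(2014, 1, 31, 'UTC')
as_la = date_to_utc_timestamp(2014, 1, 31, 'America/Los_Angeles')

print(as_utc.isoformat())  # 2014-01-31T00:00:00+00:00 (the expected result)
print(as_la.isoformat())   # 2014-01-31T08:00:00+00:00 (the actual result)
```

Los Angeles is on Pacific Standard Time (UTC-8) in January, which accounts for the offset between the expected and actual values quoted in the issue.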
[jira] [Work logged] (BEAM-9496) Add a Dataframe API for Python
[ https://issues.apache.org/jira/browse/BEAM-9496?focusedWorklogId=421690&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421690 ] ASF GitHub Bot logged work on BEAM-9496: Author: ASF GitHub Bot Created on: 13/Apr/20 22:07 Start Date: 13/Apr/20 22:07 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11264: [BEAM-9496] Add to_dataframe and to_pcollection APIs. URL: https://github.com/apache/beam/pull/11264#discussion_r407749163 ## File path: sdks/python/apache_beam/dataframe/convert.py ## @@ -0,0 +1,71 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import + +import inspect + +from apache_beam import pvalue +from apache_beam.dataframe import expressions +from apache_beam.dataframe import frame_base +from apache_beam.dataframe import transforms + + +def to_dataframe(pc): + pass + + +# TODO: Or should this be called as_dataframe? 
Review comment: So far we haven't added any methods to PCollection, but I'm open to the idea (though it'd be a big change to the API that should be done holistically, see https://lists.apache.org/thread.html/fcb422d61437a634662b24100d4e2d46a940ee766848b699023081d9%40%3Cdev.beam.apache.org%3E ). For now, at least, it seems a bit much to make dataframe-methods on PCollection itself. Short of that, can you think of any fluent styles for ``` from apache_beam import dataframe as ??? ... pcol = p | "Read from Source" >> beam.io.SomeSchemaSource(foo) df = ??? ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421690) Time Spent: 1h 20m (was: 1h 10m) > Add a Dataframe API for Python > -- > > Key: BEAM-9496 > URL: https://issues.apache.org/jira/browse/BEAM-9496 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > This is an umbrella bug for the dataframes work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9744) Python performance tests failing
[ https://issues.apache.org/jira/browse/BEAM-9744?focusedWorklogId=421687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421687 ] ASF GitHub Bot logged work on BEAM-9744: Author: ASF GitHub Bot Created on: 13/Apr/20 22:05 Start Date: 13/Apr/20 22:05 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11408: [BEAM-9744] Remove --region option from SQL tests. URL: https://github.com/apache/beam/pull/11408 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421687) Time Spent: 1h 40m (was: 1.5h) > Python performance tests failing > > > Key: BEAM-9744 > URL: https://issues.apache.org/jira/browse/BEAM-9744 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > beam_PerformanceTests_WordCountIT_Py* failing because --region is missing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=421686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421686 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 13/Apr/20 22:04 Start Date: 13/Apr/20 22:04 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-613120178 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421686) Time Spent: 26h 20m (was: 26h 10m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 26h 20m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=421685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421685 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 13/Apr/20 22:04 Start Date: 13/Apr/20 22:04 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-613120004 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421685) Time Spent: 26h 10m (was: 26h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 26h 10m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9712) setting default timezone doesn't work
[ https://issues.apache.org/jira/browse/BEAM-9712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu reassigned BEAM-9712: - Assignee: Yueyang Qiu > setting default timezone doesn't work > - > > Key: BEAM-9712 > URL: https://issues.apache.org/jira/browse/BEAM-9712 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > several failures in shard 14 > (note: fixing the internal tests requires plumbing through the timezone > config.) > {code} > [name=timestamp_to_string_1] > select [cast(timestamp "2015-01-28" as string), > cast(timestamp "2015-01-28 00:00:00" as string), > cast(timestamp "2015-01-28 00:00:00.0" as string), > cast(timestamp "2015-01-28 00:00:00.00" as string), > cast(timestamp "2015-01-28 00:00:00.000" as string), > cast(timestamp "2015-01-28 00:00:00." as string), > cast(timestamp "2015-01-28 00:00:00.0" as string), > cast(timestamp "2015-01-28 00:00:00.00" as string)] > -- > ARRAY>>[ > {ARRAY[ > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45" >]} > ] > {code} > {code} > [default_time_zone=Pacific/Chatham] > [name=timestamp_to_string_1] > select [cast(timestamp "2015-01-28" as string), > cast(timestamp "2015-01-28 00:00:00" as string), > cast(timestamp "2015-01-28 00:00:00.0" as string), > cast(timestamp "2015-01-28 00:00:00.00" as string), > cast(timestamp "2015-01-28 00:00:00.000" as string), > cast(timestamp "2015-01-28 00:00:00." 
as string), > cast(timestamp "2015-01-28 00:00:00.0" as string), > cast(timestamp "2015-01-28 00:00:00.00" as string)] > -- > ARRAY>>[ > {ARRAY[ > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45", > "2015-01-28 00:00:00+13:45" >]} > ] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
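As a cross-check on the expected strings in the test case that sets default_time_zone=Pacific/Chatham, the sketch below renders a midnight civil time at a fixed +13:45 offset, the offset shown in the expected output. The fixed offset and the helper name are assumptions for illustration only; this is not the ZetaSQL or Beam implementation.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +13:45 offset stands in for Pacific/Chatham on
# 2015-01-28, matching the offset in the expected output of the ticket.
CHATHAM_OFFSET = timezone(timedelta(hours=13, minutes=45))


def timestamp_to_string(civil_time: str, tz: timezone) -> str:
    """Interpret a civil time in tz and render it with its UTC offset."""
    local = datetime.fromisoformat(civil_time).replace(tzinfo=tz)
    return local.isoformat(sep=" ")
```

With a UTC default the same input would carry a +00:00 offset instead, which is why the test only passes once the configured default time zone reaches the engine.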
[jira] [Work logged] (BEAM-9735) Performance regression in Python Batch pipeline in Reshuffle
[ https://issues.apache.org/jira/browse/BEAM-9735?focusedWorklogId=421683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421683 ] ASF GitHub Bot logged work on BEAM-9735: Author: ASF GitHub Bot Created on: 13/Apr/20 22:03 Start Date: 13/Apr/20 22:03 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11395: [Cherrypick 11365] [BEAM-9735] Adding Always trigger and using it in Reshuffle URL: https://github.com/apache/beam/pull/11395 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421683) Time Spent: 1.5h (was: 1h 20m) > Performance regression in Python Batch pipeline in Reshuffle > > > Key: BEAM-9735 > URL: https://issues.apache.org/jira/browse/BEAM-9735 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=421682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421682 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 13/Apr/20 22:02 Start Date: 13/Apr/20 22:02 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11407: [BEAM-9562] Cherry-pick: Fix output timestamp to be inferred from scheduled time w… URL: https://github.com/apache/beam/pull/11407 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421682) Time Spent: 24h 20m (was: 24h 10m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 24h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9744) Python performance tests failing
[ https://issues.apache.org/jira/browse/BEAM-9744?focusedWorklogId=421680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421680 ] ASF GitHub Bot logged work on BEAM-9744: Author: ASF GitHub Bot Created on: 13/Apr/20 22:00 Start Date: 13/Apr/20 22:00 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11408: [BEAM-9744] Remove --region option from SQL tests. URL: https://github.com/apache/beam/pull/11408#discussion_r407746387 ## File path: sdks/java/extensions/sql/build.gradle ## @@ -149,15 +149,13 @@ task runPojoExample(type: JavaExec) { task integrationTest(type: Test) { group = "Verification" def gcpProject = project.findProperty('gcpProject') ?: 'apache-beam-testing' - def gcpRegion = project.findProperty('gcpRegion') ?: 'us-central1' def gcsTempRoot = project.findProperty('gcsTempRoot') ?: 'gs://temp-storage-for-end-to-end-tests/' // Disable Gradle cache (it should not be used because the IT's won't run). outputs.upToDateWhen { false } def pipelineOptions = [ "--project=${gcpProject}", Review comment: These tests use other GCP resources like BigQuery and Pub/Sub. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421680) Time Spent: 1.5h (was: 1h 20m) > Python performance tests failing > > > Key: BEAM-9744 > URL: https://issues.apache.org/jira/browse/BEAM-9744 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > beam_PerformanceTests_WordCountIT_Py* failing because --region is missing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9744) Python performance tests failing
[ https://issues.apache.org/jira/browse/BEAM-9744?focusedWorklogId=421679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421679 ] ASF GitHub Bot logged work on BEAM-9744: Author: ASF GitHub Bot Created on: 13/Apr/20 21:59 Start Date: 13/Apr/20 21:59 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11408: [BEAM-9744] Remove --region option from SQL tests. URL: https://github.com/apache/beam/pull/11408#discussion_r407745861 ## File path: sdks/java/extensions/sql/build.gradle ## @@ -149,15 +149,13 @@ task runPojoExample(type: JavaExec) { task integrationTest(type: Test) { group = "Verification" def gcpProject = project.findProperty('gcpProject') ?: 'apache-beam-testing' - def gcpRegion = project.findProperty('gcpRegion') ?: 'us-central1' def gcsTempRoot = project.findProperty('gcsTempRoot') ?: 'gs://temp-storage-for-end-to-end-tests/' // Disable Gradle cache (it should not be used because the IT's won't run). outputs.upToDateWhen { false } def pipelineOptions = [ "--project=${gcpProject}", Review comment: If they don't run on Dataflow, what are project and the other GCP flags used for? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421679) Time Spent: 1h 20m (was: 1h 10m) > Python performance tests failing > > > Key: BEAM-9744 > URL: https://issues.apache.org/jira/browse/BEAM-9744 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > beam_PerformanceTests_WordCountIT_Py* failing because --region is missing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9496) Add a Dataframe API for Python
[ https://issues.apache.org/jira/browse/BEAM-9496?focusedWorklogId=421673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421673 ] ASF GitHub Bot logged work on BEAM-9496: Author: ASF GitHub Bot Created on: 13/Apr/20 21:57 Start Date: 13/Apr/20 21:57 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11264: [BEAM-9496] Add to_dataframe and to_pcollection APIs. URL: https://github.com/apache/beam/pull/11264#discussion_r407745027 ## File path: sdks/python/apache_beam/dataframe/transforms.py ## @@ -0,0 +1,255 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import + +import pandas as pd + +import apache_beam as beam +from apache_beam import transforms +from apache_beam.dataframe import expressions +from apache_beam.dataframe import frame_base +from apache_beam.dataframe import frames # pylint: disable=unused-import + + +class DataframeTransform(transforms.PTransform): + """A PTransform for applying function that takes and returns dataframes + to one or more PCollections. + + For example, if pcoll is a PCollection of dataframes, one could write:: + + pcoll | DataframeTransform(lambda df: df.group_by('key').sum(), proxy=...) 
+ + To pass multiple PCollections, pass a tuple of PCollections which will be + passed to the callable as positional arguments, or a dictionary of + PCollections, in which case they will be passed as keyword arguments. + """ + def __init__(self, func, proxy): +self._func = func +self._proxy = proxy + + def expand(self, input_pcolls): +def wrap_as_dict(values): + if isinstance(values, dict): +return values + elif isinstance(values, tuple): +return dict(enumerate(values)) + else: +return {None: values} + +# TODO: Infer the proxy from the input schema. +def proxy(key): + if key is None: +return self._proxy + else: +return self._proxy[key] + +# The input can be a dictionary, tuple, or plain PCollection. +# Wrap as a dict for homogeneity. +# TODO: Possibly inject batching here. +input_dict = wrap_as_dict(input_pcolls) +placeholders = { +key: frame_base.DeferredFrame.wrap( +expressions.PlaceholderExpression(proxy(key))) +for key in input_dict.keys() +} + +# The calling convention of the user-supplied func varies according to the +# type of the input. +if isinstance(input_pcolls, dict): + result_frames = self._func(**placeholders) +elif isinstance(input_pcolls, tuple): + result_frames = self._func( + *(value for _, value in sorted(placeholders.items()))) +else: + result_frames = self._func(placeholders[None]) + +# Likewise the output may be a dict, tuple, or raw (deferred) Dataframe. +result_dict = wrap_as_dict(result_frames) + +result_pcolls = { +placeholders[key]._expr: pcoll +for key, pcoll in input_dict.items() +} | 'Eval' >> DataframeExpressionsTransform( +{key: df._expr + for key, df in result_dict.items()}) + +# Convert the result back into a set of PCollections. +if isinstance(result_frames, dict): + return result_pcolls +elif isinstance(result_frames, tuple): + return tuple((value for _, value in sorted(result_pcolls.items()))) +else: + return result_pcolls[None] Review comment: Yes. Done. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421673) Time Spent: 1h 10m (was: 1h) > Add a Dataframe API for Python > -- > > Key: BEAM-9496 > URL: https://issues.apache.org/jira/browse/BEAM-9496 > Project: Beam > Issue Type: New Feature > Compo
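The calling convention in the quoted `expand` method can be tried without Beam. The sketch below reuses `wrap_as_dict` from the diff and adds a hypothetical `call_with_convention` helper (not part of the pull request) that calls a function the way its inputs were supplied. Plain Python values stand in for PCollections and deferred dataframes.

```python
def wrap_as_dict(values):
    """Normalize a dict, tuple, or single value to a dict (as in the diff)."""
    if isinstance(values, dict):
        return values
    elif isinstance(values, tuple):
        return dict(enumerate(values))
    else:
        return {None: values}


def call_with_convention(func, inputs):
    """Hypothetical helper: call func the way its inputs were supplied."""
    wrapped = wrap_as_dict(inputs)
    if isinstance(inputs, dict):
        # Dict inputs become keyword arguments.
        return func(**wrapped)
    elif isinstance(inputs, tuple):
        # Tuple inputs become positional arguments, ordered by index.
        return func(*(value for _, value in sorted(wrapped.items())))
    else:
        # A single input is stored under the None key and passed as-is.
        return func(wrapped[None])
```

Keying every case by a dict keeps the rest of the transform uniform; only the call site and the final unwrapping need to know which shape the user supplied.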
[jira] [Updated] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8472: -- Component/s: sdk-go > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] > Update 11/12: this is complete for Python and Java, Go remains. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082689#comment-17082689 ] Kyle Weaver commented on BEAM-8472: --- FYI [~lostluck] [~danoliveira] Would one of you mind taking this issue? It should be pretty straightforward to port the Java/Python implementation to Go. > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] > Update 11/12: this is complete for Python and Java, Go remains. -- This message was sent by Atlassian Jira (v8.3.4#803005)
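The lookup order described in the issue can be sketched as follows. This is a minimal illustration, not the Beam Java/Python implementation: the function names and the injected `environ` and `run_gcloud` parameters are hypothetical, and it assumes the `CLOUDSDK_COMPUTE_REGION` environment variable and the `gcloud config get-value compute/region` command behave as the Cloud SDK documentation describes.

```python
import os
import subprocess

FALLBACK_REGION = "us-central1"  # the default named in the issue


def _gcloud_region():
    """Ask gcloud for compute/region; return '' if it is unset or unavailable."""
    try:
        out = subprocess.run(
            ["gcloud", "config", "get-value", "compute/region"],
            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
            universal_newlines=True, check=False)
        return out.stdout.strip()
    except OSError:  # gcloud is not installed or not on PATH
        return ""


def resolve_region(flag_value=None, environ=None, run_gcloud=_gcloud_region):
    """Return the first region found: flag, environment, gcloud, fallback."""
    environ = os.environ if environ is None else environ
    if flag_value:
        return flag_value
    env_region = environ.get("CLOUDSDK_COMPUTE_REGION")
    if env_region:
        return env_region
    gcloud_region = run_gcloud()
    if gcloud_region:
        return gcloud_region
    return FALLBACK_REGION
```

Passing the environment and the gcloud lookup as parameters keeps the precedence logic testable without a Cloud SDK installation; a Go port could follow the same shape.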
[jira] [Commented] (BEAM-8253) (Go SDK) Add worker_region and worker_zone options
[ https://issues.apache.org/jira/browse/BEAM-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082685#comment-17082685 ] Kyle Weaver commented on BEAM-8253: --- [~lostluck] [~danoliveira] Could one of you take this over? > (Go SDK) Add worker_region and worker_zone options > -- > > Key: BEAM-8253 > URL: https://issues.apache.org/jira/browse/BEAM-8253 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=421664&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421664 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 13/Apr/20 21:45 Start Date: 13/Apr/20 21:45 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-613113280 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421664) Time Spent: 26h (was: 25h 50m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 26h > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=421663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421663 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 13/Apr/20 21:44 Start Date: 13/Apr/20 21:44 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-613112821 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421663) Time Spent: 25h 50m (was: 25h 40m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 25h 50m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082680#comment-17082680 ] Andrew Pilloud commented on BEAM-9709: -- My understanding is that setting the default timezone to something other than UTC is BEAM-9712. This test is with the timezone set to UTC, so it should either work or return an unimplemented exception. > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082672#comment-17082672 ] Yueyang Qiu edited comment on BEAM-9709 at 4/13/20, 9:35 PM: - The cause of this issue is related to setting default time zone. ZetaSQL allows each engine to define its own default time zone. If the default time zone is UTC, then `2014-01-31 00:00:00+00` is the expected result, otherwise the current result could also be valid. This fails in compliance tests currently because in the harness we have not implemented this properly yet. was (Author: robinyqiu): The cause of this issue is related to setting default time zone. ZetaSQL allows each engine to define its own default time zone. If the default time zone is UTC, then `2014-01-31 00:00:00+00` is the expected result, otherwise the current result could also be valid. This fails in compliance tests currently because in the harness we have not implemented this properly yet ([https://cs.corp.google.com/piper///depot/google3/third_party/cloud_dataflow/sql/ExecuteQueryServiceServer.java?type=cs&q=SetDefaultTimeZone+file:%5E//depot/google3/third_party/cloud_dataflow/sql/+package:%5Epiper$&g=0&l=515]). > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9710) Got current time instead of timestamp value
[ https://issues.apache.org/jira/browse/BEAM-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082658#comment-17082658 ] Yueyang Qiu edited comment on BEAM-9710 at 4/13/20, 9:34 PM: - This seems to be caused by bad code from internal compliance test harness. was (Author: robinyqiu): This seems to be caused by bad code from internal compliance test harness. I don't think this should be a Dataflow SQL GA blocker. > Got current time instead of timestamp value > --- > > Key: BEAM-9710 > URL: https://issues.apache.org/jira/browse/BEAM-9710 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > one failure in shard 13 > {code} > Expected: ARRAY>[{2014-12-01 00:00:00+00}] > Actual: ARRAY>[{2020-04-06 > 00:20:40.052+00}], > {code} > {code} > [prepare_database] > CREATE TABLE Table1 AS > SELECT timestamp '2014-12-01' as timestamp_val > -- > ARRAY>[{2014-12-01 00:00:00+00}] > == > [name=timestamp_type_2] > SELECT timestamp_val > FROM Table1 > -- > ARRAY>[{2014-12-01 00:00:00+00}] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082675#comment-17082675 ] Kenneth Knowles commented on BEAM-9709: --- OK, got it. > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082674#comment-17082674 ] Kenneth Knowles commented on BEAM-9709: --- Just to clarify: the scope of this ticket is the Beam ZetaSQL dialect, while those are docs for Google's hosted product based on it. I believe the Beam ZetaSQL dialect should probably fail this invocation if it is not supported, rather than return a result that disagrees with the ZetaSQL spec. > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082672#comment-17082672 ] Yueyang Qiu commented on BEAM-9709: --- The cause of this issue is related to setting default time zone. ZetaSQL allows each engine to define its own default time zone. If the default time zone is UTC, then `2014-01-31 00:00:00+00` is the expected result, otherwise the current result could also be valid. This fails in compliance tests currently because in the harness we have not implemented this properly yet ([https://cs.corp.google.com/piper///depot/google3/third_party/cloud_dataflow/sql/ExecuteQueryServiceServer.java?type=cs&q=SetDefaultTimeZone+file:%5E//depot/google3/third_party/cloud_dataflow/sql/+package:%5Epiper$&g=0&l=515]). > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
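The 8-hour gap between the expected and actual values is what one would see if the date were interpreted at midnight in a UTC-8 default time zone and then rendered in UTC. The sketch below shows the arithmetic; the UTC-8 offset is an assumption chosen to match the reported difference, not something the ticket states, and the helper name is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone


def timestamp_from_date(d: date, default_tz: timezone) -> datetime:
    """Midnight of d in default_tz, expressed in UTC."""
    local_midnight = datetime.combine(d, time.min, tzinfo=default_tz)
    return local_midnight.astimezone(timezone.utc)


# With UTC as the default time zone, the result is midnight UTC.
utc_result = timestamp_from_date(date(2014, 1, 31), timezone.utc)

# Assumption: a fixed UTC-8 offset as the engine's default time zone.
shifted_result = timestamp_from_date(
    date(2014, 1, 31), timezone(timedelta(hours=-8)))
```

Under these assumptions `utc_result` corresponds to the expected value in the ticket and `shifted_result` to the actual one.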
[jira] [Work logged] (BEAM-2939) Fn API SDF support
[ https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=421655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421655 ] ASF GitHub Bot logged work on BEAM-2939: Author: ASF GitHub Bot Created on: 13/Apr/20 21:32 Start Date: 13/Apr/20 21:32 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11409: [BEAM-2939] Update unbounded source as SDF wrapper to resume successfully. URL: https://github.com/apache/beam/pull/11409#discussion_r407734012 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java ## @@ -743,8 +727,6 @@ public boolean tryClaim(UnboundedSourceValue[] position) { currentReader.close(); } catch (IOException closeException) { e.addSuppressed(closeException); -} finally { Review comment: Intended? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421655) Time Spent: 26h 50m (was: 26h 40m) > Fn API SDF support > -- > > Key: BEAM-2939 > URL: https://issues.apache.org/jira/browse/BEAM-2939 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Henning Rohde >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 26h 50m > Remaining Estimate: 0h > > The Fn API should support streaming SDF. Detailed design TBD. > Once design is ready, expand subtasks similarly to BEAM-2822. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082670#comment-17082670 ] Yueyang Qiu commented on BEAM-9709: --- This is not supported because it creates an intermediate DATE type, and TIMESTAMP() constructing function from DATE is not supported (see https://cloud.google.com/dataflow/docs/reference/sql/timestamp_functions#timestamp ) > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2939) Fn API SDF support
[ https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=421650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421650 ] ASF GitHub Bot logged work on BEAM-2939: Author: ASF GitHub Bot Created on: 13/Apr/20 21:24 Start Date: 13/Apr/20 21:24 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11409: [BEAM-2939] Update unbounded source as SDF wrapper to resume successfully. URL: https://github.com/apache/beam/pull/11409#issuecomment-613105161 R: @ihji CC: @boyuanzz This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421650) Time Spent: 26h 40m (was: 26.5h) > Fn API SDF support > -- > > Key: BEAM-2939 > URL: https://issues.apache.org/jira/browse/BEAM-2939 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Henning Rohde >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 26h 40m > Remaining Estimate: 0h > > The Fn API should support streaming SDF. Detailed design TBD. > Once design is ready, expand subtasks similarly to BEAM-2822. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2939) Fn API SDF support
[ https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=421649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421649 ] ASF GitHub Bot logged work on BEAM-2939: Author: ASF GitHub Bot Created on: 13/Apr/20 21:23 Start Date: 13/Apr/20 21:23 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11409: [BEAM-2939] Update unbounded source as SDF wrapper to resume successfully. URL: https://github.com/apache/beam/pull/11409 This fixes a bug in how the wrapper handles UnboundedReader's start() and advance(): they return false when there is no data right now but there could be data in the future, unlike BoundedReader's start() and advance(), which return false only on completion. Verified on Dataflow with KafkaIO.
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/j
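The start()/advance() distinction described in pull request #11409 above can be illustrated with a small sketch. This is not Beam's API and not the change made in the PR: `SimpleReader`, `ReaderLoops`, `ScriptedReader`, and the `maxPolls` parameter are hypothetical names introduced only for illustration. The sketch assumes a reader whose `advance()` returns false either on completion (bounded semantics) or when nothing is available yet (unbounded semantics), and shows why a loop written for the first meaning stops too early under the second.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for a reader; NOT Beam's actual interface.
interface SimpleReader<T> {
    boolean start();    // true if a first element is available
    boolean advance();  // true if another element is available
    T getCurrent();
}

final class ReaderLoops {
    // Bounded semantics: false from start()/advance() means the input is exhausted,
    // so the loop stops at the first false.
    static <T> List<T> drainBounded(SimpleReader<T> reader) {
        List<T> out = new ArrayList<>();
        for (boolean ok = reader.start(); ok; ok = reader.advance()) {
            out.add(reader.getCurrent());
        }
        return out;
    }

    // Unbounded semantics: false only means "no data right now", so the caller
    // keeps polling. maxPolls is an assumption that keeps this sketch finite.
    static <T> List<T> pollUnbounded(SimpleReader<T> reader, int maxPolls) {
        List<T> out = new ArrayList<>();
        boolean ok = reader.start();
        for (int polls = 1; ; polls++) {
            if (ok) {
                out.add(reader.getCurrent());
            }
            if (polls >= maxPolls) {
                break;
            }
            ok = reader.advance();
        }
        return out;
    }
}

// Hypothetical test reader that replays a script; a null entry models
// "no data right now", and running off the end also yields false.
final class ScriptedReader implements SimpleReader<String> {
    private final Iterator<String> script;
    private String current;

    ScriptedReader(List<String> script) {
        this.script = script.iterator();
    }

    public boolean start() {
        return advance();
    }

    public boolean advance() {
        current = script.hasNext() ? script.next() : null;
        return current != null;
    }

    public String getCurrent() {
        return current;
    }
}
```

With a script of "a", a gap, then "b", the bounded-style loop returns only the first element, while the polling loop also picks up the element that arrives after the gap.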
[jira] [Updated] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.
[ https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-9745: -- Priority: Blocker (was: Major) > [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to > deserialize Custom DoFns and Custom Coders. > - > > Key: BEAM-9745 > URL: https://issues.apache.org/jira/browse/BEAM-9745 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, java-fn-execution, sdk-java-harness, > test-failures >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Blocker > Labels: currently-failing > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4657/] > * [Gradle Build > Scan|https://scans.gradle.com/s/c3izncsa4u24k/tests/by-project] > Initial investigation: > The bug appears to be popping up on BigQuery tests mostly, but also a > BigTable and a Datastore test. > Here's an example stacktrace of the two errors, showing _only_ the error > messages themselves. Source: > [https://scans.gradle.com/s/c3izncsa4u24k/tests/efn4wciuamvqq-ccxt3jvofvqbe] > {noformat} > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction -191: > java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With > Execution Info > ... > Caused by: java.lang.ClassNotFoundException: > org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3 > ... > Caused by: java.lang.RuntimeException: Error received from SDK harness for > instruction -191: java.lang.IllegalArgumentException: unable to deserialize > Custom DoFn With Execution Info > ... 
> Caused by: java.lang.ClassNotFoundException: > org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3 > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction -206: > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes > ... > Caused by: > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes > ... > Caused by: java.lang.IllegalArgumentException: unable to deserialize Custom > Coder Bytes > ... > Caused by: java.lang.ClassNotFoundException: > org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder > ... > Caused by: java.lang.RuntimeException: Error received from SDK harness for > instruction -206: > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes > ... > Caused by: > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes > ... > Caused by: java.lang.IllegalArgumentException: unable to deserialize Custom > Coder Bytes > ... > Caused by: java.lang.ClassNotFoundException: > org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder > ... 
> {noformat} > Update: Looks like this has been failing as far back as [Apr > 4|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4566/] > after a long period where the test was consistently timing out since [Mar > 31|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4546/]. > So it's hard to narrow down what commit may have caused this. Plus, the test > was failing due to a completely different BigQuery failure before anyway, so > it seems like this test will need to be completely fixed from scratch, > instead of tracking down a specific breaking change. > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9744) Python performance tests failing
[ https://issues.apache.org/jira/browse/BEAM-9744?focusedWorklogId=421642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421642 ] ASF GitHub Bot logged work on BEAM-9744: Author: ASF GitHub Bot Created on: 13/Apr/20 21:17 Start Date: 13/Apr/20 21:17 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11408: [BEAM-9744] Remove --region option from SQL tests. URL: https://github.com/apache/beam/pull/11408 These tests don't run on Dataflow, so they don't recognize the region option. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-9744) Python performance tests failing
[ https://issues.apache.org/jira/browse/BEAM-9744?focusedWorklogId=421643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421643 ] ASF GitHub Bot logged work on BEAM-9744: Author: ASF GitHub Bot Created on: 13/Apr/20 21:17 Start Date: 13/Apr/20 21:17 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11408: [BEAM-9744] Remove --region option from SQL tests. URL: https://github.com/apache/beam/pull/11408#issuecomment-613102635 Run SQL PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421643) Time Spent: 1h 10m (was: 1h) > Python performance tests failing > > > Key: BEAM-9744 > URL: https://issues.apache.org/jira/browse/BEAM-9744 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > beam_PerformanceTests_WordCountIT_Py* failing because --region is missing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9710) Got current time instead of timestamp value
[ https://issues.apache.org/jira/browse/BEAM-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082664#comment-17082664 ] Yueyang Qiu commented on BEAM-9710: --- I am marking this issue as a blocker of BEAM-9179. I believe this could be fixed while we fix the type translation code. > Got current time instead of timestamp value > --- > > Key: BEAM-9710 > URL: https://issues.apache.org/jira/browse/BEAM-9710 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > one failure in shard 13 > {code} > Expected: ARRAY>[{2014-12-01 00:00:00+00}] > Actual: ARRAY>[{2020-04-06 > 00:20:40.052+00}], > {code} > {code} > [prepare_database] > CREATE TABLE Table1 AS > SELECT timestamp '2014-12-01' as timestamp_val > -- > ARRAY>[{2014-12-01 00:00:00+00}] > == > [name=timestamp_type_2] > SELECT timestamp_val > FROM Table1 > -- > ARRAY>[{2014-12-01 00:00:00+00}] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9710) Got current time instead of timestamp value
[ https://issues.apache.org/jira/browse/BEAM-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082658#comment-17082658 ] Yueyang Qiu commented on BEAM-9710: --- This seems to be caused by bad code from internal compliance test harness. I don't think this should be a Dataflow SQL GA blocker. > Got current time instead of timestamp value > --- > > Key: BEAM-9710 > URL: https://issues.apache.org/jira/browse/BEAM-9710 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Yueyang Qiu >Priority: Trivial > Labels: zetasql-compliance > > one failure in shard 13 > {code} > Expected: ARRAY>[{2014-12-01 00:00:00+00}] > Actual: ARRAY>[{2020-04-06 > 00:20:40.052+00}], > {code} > {code} > [prepare_database] > CREATE TABLE Table1 AS > SELECT timestamp '2014-12-01' as timestamp_val > -- > ARRAY>[{2014-12-01 00:00:00+00}] > == > [name=timestamp_type_2] > SELECT timestamp_val > FROM Table1 > -- > ARRAY>[{2014-12-01 00:00:00+00}] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=421626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421626 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 13/Apr/20 20:43 Start Date: 13/Apr/20 20:43 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #11407: [BEAM-9562] Cherry-pick: Fix output timestamp to be inferred from scheduled time w… URL: https://github.com/apache/beam/pull/11407#issuecomment-613088556 R: @ibzib This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421626) Time Spent: 24h 10m (was: 24h) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 24h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=421625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421625 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 13/Apr/20 20:42 Start Date: 13/Apr/20 20:42 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11407: [BEAM-9562] Cherry-pick: Fix output timestamp to be inferred from scheduled time w… URL: https://github.com/apache/beam/pull/11407 …hen in the event time domain. (cherry picked from commit 009578e374523f5acd8d24543ef1ceec30542a95) **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=421624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-421624 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 13/Apr/20 20:36 Start Date: 13/Apr/20 20:36 Worklog Time Spent: 10m Work Description: mxm commented on issue #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#issuecomment-613085672 I was actually working on something related to timers in #11362 and was surprised to see that the test failed when I opened the PR, since I had run tests locally. Then figured something must have changed on master in the meantime. Thanks for following up with this! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 421624) Time Spent: 23h 50m (was: 23h 40m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 23h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)