[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=640819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640819
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 23/Aug/21 19:11
Start Date: 23/Aug/21 19:11
Worklog Time Spent: 10m 
  Work Description: lostluck merged pull request #15358:
URL: https://github.com/apache/beam/pull/15358


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 640819)
Time Spent: 5.5h  (was: 5h 20m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: P3
> Fix For: 2.33.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=640297&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640297
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 20/Aug/21 16:16
Start Date: 20/Aug/21 16:16
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15358:
URL: https://github.com/apache/beam/pull/15358#issuecomment-902805140


   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 640297)
Time Spent: 5h 20m  (was: 5h 10m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: P3
> Fix For: 2.33.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=640129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640129
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 20/Aug/21 06:01
Start Date: 20/Aug/21 06:01
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15358:
URL: https://github.com/apache/beam/pull/15358#issuecomment-902452550






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 640129)
Time Spent: 5h 10m  (was: 5h)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: P3
> Fix For: 2.33.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=640128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640128
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 20/Aug/21 06:00
Start Date: 20/Aug/21 06:00
Worklog Time Spent: 10m 
  Work Description: lostluck opened a new pull request #15358:
URL: https://github.com/apache/beam/pull/15358


   * Avoid the interstitial PCollection between DoFns and the DataSink, and 
collect full data from the DataSink.
   * Don't collect PCollection data from the synthetic PCollection between 
MergeAccumulators and ExtractOutputs as there are no user analogs for that 
collection.
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   `ValidatesRunner` compliance status (on master branch)
   
   
   
 
   
 Lang
 ULR
 Dataflow
 Flink
 Samza
 Spark
 Twister2
   
 
 
   
 Go
 ---
 
   https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon";>
   
 
   https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon";>
   
 
 
   https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza/lastCompletedBuild/badge/icon";>
   
 
 
   https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon";>
   
 
 ---
   
   
 Java
 
   https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_ULR/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_ULR/lastCompletedBuild/badge/icon";>
   
 
 
   https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon?subject=V1";>
   
   https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming/lastCompletedBuild/badge/icon?subject=V1+Streaming";>
   
   https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon?subject=V1+Java+11";>
   
   https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/badge/icon?subject=V2";>
   
   https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2_Streaming/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2_Streaming/lastCompletedBuild/badge/icon?subject=V2+Streaming";>
   
 
 
   https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon?subject=Java+8";>
   
   https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/bea

[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=637409&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637409
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 12/Aug/21 16:29
Start Date: 12/Aug/21 16:29
Worklog Time Spent: 10m 
  Work Description: lostluck merged pull request #15289:
URL: https://github.com/apache/beam/pull/15289


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637409)
Time Spent: 4h 50m  (was: 4h 40m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: P3
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=637366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637366
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 12/Aug/21 14:36
Start Date: 12/Aug/21 14:36
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15289:
URL: https://github.com/apache/beam/pull/15289#issuecomment-897693463


   Run GoPortable PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637366)
Time Spent: 4h 40m  (was: 4.5h)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: P3
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=637186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637186
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 11/Aug/21 23:53
Start Date: 11/Aug/21 23:53
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15289:
URL: https://github.com/apache/beam/pull/15289#issuecomment-897236662


   Run Go Postcommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637186)
Time Spent: 4.5h  (was: 4h 20m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: P3
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=637011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637011
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 11/Aug/21 17:42
Start Date: 11/Aug/21 17:42
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #15289:
URL: https://github.com/apache/beam/pull/15289#discussion_r687043611



##
File path: sdks/go/pkg/beam/core/runtime/exec/pcollection_test.go
##
@@ -0,0 +1,154 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package exec
+
+import (
+   "context"
+   "math"
+   "testing"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window"
+)
+
+// TestPCollection verifies that the PCollection node works correctly.
+// Seed is by default set to 0, so we have a "deterministic" set of
+// randomness for the samples.
+func TestPCollection(t *testing.T) {
+   a := &CaptureNode{UID: 1}
+   pcol := &PCollection{UID: 2, Out: a, Coder: coder.NewVarInt()}
+   // The "large" 2nd value is to ensure the values are encoded properly,
+   // and that Min & Max are behaving.
+   inputs := []interface{}{int64(1), int64(20), int64(3)}
+   in := &FixedRoot{UID: 3, Elements: makeInput(inputs...), Out: pcol}
+
+   p, err := NewPlan("a", []Unit{a, pcol, in})
+   if err != nil {
+   t.Fatalf("failed to construct plan: %v", err)
+   }
+
+   if err := p.Execute(context.Background(), "1", DataContext{}); err != 
nil {
+   t.Fatalf("execute failed: %v", err)
+   }
+   if err := p.Down(context.Background()); err != nil {
+   t.Fatalf("down failed: %v", err)
+   }
+
+   expected := makeValues(inputs...)
+   if !equalList(a.Elements, expected) {
+   t.Errorf("multiplex returned %v for a, want %v", 
extractValues(a.Elements...), extractValues(expected...))
+   }
+   snap := pcol.snapshot()
+   if want, got := int64(len(expected)), snap.ElementCount; got != want {
+   t.Errorf("snapshot miscounted: got %v, want %v", got, want)
+   }
+   checkPCollectionSizeSample(t, snap, 3, 7, 1, 5)
+}
+
+func TestPCollection_sizeReset(t *testing.T) {
+   // Check the initial values after resetting.
+   var pcol PCollection
+   pcol.resetSize()
+   snap := pcol.snapshot()
+   checkPCollectionSizeSample(t, snap, 0, 0, math.MaxInt64, math.MinInt64)
+}
+
+func checkPCollectionSizeSample(t *testing.T, snap PCollectionSnapshot, count, 
sum, min, max int64) {
+   t.Helper()
+   if want, got := int64(count), snap.SizeCount; got != want {
+   t.Errorf("sample count incorrect: got %v, want %v", got, want)
+   }
+   if want, got := int64(sum), snap.SizeSum; got != want {
+   t.Errorf("sample sum incorrect: got %v, want %v", got, want)
+   }
+   if want, got := int64(min), snap.SizeMin; got != want {
+   t.Errorf("sample min incorrect: got %v, want %v", got, want)
+   }
+   if want, got := int64(max), snap.SizeMax; got != want {
+   t.Errorf("sample max incorrect: got %v, want %v", got, want)
+   }
+}
+
+// BenchmarkPCollection measures the overhead of invoking a ParDo in a plan.
+//
+// On @lostluck's desktop (2020/02/20):
+// BenchmarkPCollection-12 4469980624.8 ns/op  
   0 B/op  0 allocs/op
+func BenchmarkPCollection(b *testing.B) {
+   // Pre allocate the capture buffer and process buffer to avoid
+   // unnecessary overhead.
+   out := &CaptureNode{UID: 1, Elements: make([]FullValue, 0, b.N)}
+   process := make([]MainInput, 0, b.N)
+   for i := 0; i < b.N; i++ {
+   process = append(process, MainInput{Key: FullValue{
+   Windows:   window.SingleGlobalWindow,
+   Timestamp: mtime.ZeroTimestamp,
+ 

[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=637008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637008
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 11/Aug/21 17:40
Start Date: 11/Aug/21 17:40
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #15289:
URL: https://github.com/apache/beam/pull/15289#discussion_r687042489



##
File path: sdks/go/pkg/beam/core/runtime/exec/datasink.go
##
@@ -38,7 +35,6 @@ type DataSink struct {
wEnc  WindowEncoder
w io.WriteCloser
count int64

Review comment:
   Ah! Good catch. Missed that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637008)
Time Spent: 4h 10m  (was: 4h)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: P3
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=636675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636675
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 11/Aug/21 00:01
Start Date: 11/Aug/21 00:01
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #15289:
URL: https://github.com/apache/beam/pull/15289#discussion_r685648169



##
File path: sdks/go/pkg/beam/core/runtime/exec/datasink.go
##
@@ -38,7 +35,6 @@ type DataSink struct {
wEnc  WindowEncoder
w io.WriteCloser
count int64

Review comment:
   Do you still need `count` here now that you got rid of the lines using 
it?

##
File path: sdks/go/pkg/beam/core/runtime/exec/pcollection_test.go
##
@@ -0,0 +1,154 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package exec
+
+import (
+   "context"
+   "math"
+   "testing"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window"
+)
+
+// TestPCollection verifies that the PCollection node works correctly.
+// Seed is by default set to 0, so we have a "deterministic" set of
+// randomness for the samples.
+func TestPCollection(t *testing.T) {
+   a := &CaptureNode{UID: 1}
+   pcol := &PCollection{UID: 2, Out: a, Coder: coder.NewVarInt()}
+   // The "large" 2nd value is to ensure the values are encoded properly,
+   // and that Min & Max are behaving.
+   inputs := []interface{}{int64(1), int64(20), int64(3)}
+   in := &FixedRoot{UID: 3, Elements: makeInput(inputs...), Out: pcol}
+
+   p, err := NewPlan("a", []Unit{a, pcol, in})
+   if err != nil {
+   t.Fatalf("failed to construct plan: %v", err)
+   }
+
+   if err := p.Execute(context.Background(), "1", DataContext{}); err != 
nil {
+   t.Fatalf("execute failed: %v", err)
+   }
+   if err := p.Down(context.Background()); err != nil {
+   t.Fatalf("down failed: %v", err)
+   }
+
+   expected := makeValues(inputs...)
+   if !equalList(a.Elements, expected) {
+   t.Errorf("multiplex returned %v for a, want %v", 
extractValues(a.Elements...), extractValues(expected...))
+   }
+   snap := pcol.snapshot()
+   if want, got := int64(len(expected)), snap.ElementCount; got != want {
+   t.Errorf("snapshot miscounted: got %v, want %v", got, want)
+   }
+   checkPCollectionSizeSample(t, snap, 3, 7, 1, 5)
+}
+
+func TestPCollection_sizeReset(t *testing.T) {
+   // Check the initial values after resetting.
+   var pcol PCollection
+   pcol.resetSize()
+   snap := pcol.snapshot()
+   checkPCollectionSizeSample(t, snap, 0, 0, math.MaxInt64, math.MinInt64)
+}
+
+func checkPCollectionSizeSample(t *testing.T, snap PCollectionSnapshot, count, 
sum, min, max int64) {
+   t.Helper()
+   if want, got := int64(count), snap.SizeCount; got != want {
+   t.Errorf("sample count incorrect: got %v, want %v", got, want)
+   }
+   if want, got := int64(sum), snap.SizeSum; got != want {
+   t.Errorf("sample sum incorrect: got %v, want %v", got, want)
+   }
+   if want, got := int64(min), snap.SizeMin; got != want {
+   t.Errorf("sample min incorrect: got %v, want %v", got, want)
+   }
+   if want, got := int64(max), snap.SizeMax; got != want {
+   t.Errorf("sample max incorrect: got %v, want %v", got, want)
+   }
+}
+
+// BenchmarkPCollection measures the overhead of invoking a ParDo in a plan.
+//
+// On @lostluck's desktop (2020/02/20):
+// BenchmarkPCollection-12 4469980624.8 ns/op  
   0 B/op  0 allocs/op
+func BenchmarkPCollection(b *testing.B) {
+   // Pre allocate the capture buffer and process buffer to avoid
+   // unnecessary overhead.
+   out := &CaptureNode{UID: 1, Elements: make([]FullV

[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=636098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636098
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 09/Aug/21 21:19
Start Date: 09/Aug/21 21:19
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15289:
URL: https://github.com/apache/beam/pull/15289#issuecomment-895556871






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 636098)
Time Spent: 3h 50m  (was: 3h 40m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=636097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636097
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 09/Aug/21 21:17
Start Date: 09/Aug/21 21:17
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #15289:
URL: https://github.com/apache/beam/pull/15289#discussion_r685527558



##
File path: sdks/go/pkg/beam/core/runtime/harness/monitoring.go
##
@@ -163,38 +162,65 @@ func monitoring(p *exec.Plan) ([]*pipepb.MonitoringInfo, 
map[string][]byte) {
}.ExtractFrom(store)
 
// Get the execution monitoring information from the bundle plan.
-   if snapshot, ok := p.Progress(); ok {
-   payload, err := metricsx.Int64Counter(snapshot.Count)
+
+   snapshot, ok := p.Progress()
+   if !ok {
+   return monitoringInfo, payloads
+   }
+   for _, pcol := range snapshot.PCols {
+   payload, err := metricsx.Int64Counter(pcol.ElementCount)
if err != nil {
panic(err)
}
 
// TODO(BEAM-9934): This metric should account for elements in 
multiple windows.
-   payloads[getShortID(metrics.PCollectionLabels(snapshot.PID), 
metricsx.UrnElementCount)] = payload
+   payloads[getShortID(metrics.PCollectionLabels(pcol.ID), 
metricsx.UrnElementCount)] = payload
+
monitoringInfo = append(monitoringInfo,
&pipepb.MonitoringInfo{
Urn:  
metricsx.UrnToString(metricsx.UrnElementCount),
Type: 
metricsx.UrnToType(metricsx.UrnElementCount),
Labels: map[string]string{
-   "PCOLLECTION": snapshot.PID,
+   "PCOLLECTION": pcol.ID,
},
Payload: payload,
})
 
-   payloads[getShortID(metrics.PTransformLabels(snapshot.ID), 
metricsx.UrnDataChannelReadIndex)] = payload
-   monitoringInfo = append(monitoringInfo,
-   &pipepb.MonitoringInfo{
-   Urn:  
metricsx.UrnToString(metricsx.UrnDataChannelReadIndex),
-   Type: 
metricsx.UrnToType(metricsx.UrnDataChannelReadIndex),
-   Labels: map[string]string{
-   "PTRANSFORM": snapshot.ID,
-   },
-   Payload: payload,
-   })
+   // Skip pcollections without size
+   if pcol.SizeCount != 0 {
+   payload, err := 
metricsx.Int64Distribution(pcol.SizeCount, pcol.SizeSum, pcol.SizeMin, 
pcol.SizeMax)
+   if err != nil {
+   panic(err)
+   }
+   monitoringInfo = append(monitoringInfo,
+   &pipepb.MonitoringInfo{
+   Urn:  
"beam:metric:sampled_byte_size:v1",
+   Type: 
"beam:metrics:distribution_int_64",

Review comment:
   Good catch. Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 636097)
Time Spent: 3h 40m  (was: 3.5h)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=636093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636093
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 09/Aug/21 21:10
Start Date: 09/Aug/21 21:10
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15289:
URL: https://github.com/apache/beam/pull/15289#issuecomment-895551534


   R: @youngoli @jrmccluskey @riteshghorse


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 636093)
Time Spent: 3.5h  (was: 3h 20m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=635982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635982
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 09/Aug/21 17:42
Start Date: 09/Aug/21 17:42
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #15289:
URL: https://github.com/apache/beam/pull/15289#discussion_r685395050



##
File path: sdks/go/pkg/beam/core/runtime/exec/pcollection.go
##
@@ -0,0 +1,153 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package exec
+
+import (
+   "context"
+   "fmt"
+   "math"
+   "math/rand"
+   "sync"
+   "sync/atomic"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder"
+)
+
+// PCollection is a passthrough node to collect PCollection metrics, and
+// must be placed as the Out node of any producer of a PCollection.
+//
+// In particular, must not be placed after a Multiplex, and must be placed
+// after a Flatten.
+type PCollection struct {

Review comment:
   IIRC, either I'm adding an object or duplicating all the code. Right now 
this was simplest, vs modifying each of the other node kinds.
   
   The main optimization I can do later is concretely make the Out nodes the 
PCollection type which will help the compiler inline the method calls avoiding 
the function call overhead where possible.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 635982)
Time Spent: 3h 20m  (was: 3h 10m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=635976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635976
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 09/Aug/21 17:36
Start Date: 09/Aug/21 17:36
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #15289:
URL: https://github.com/apache/beam/pull/15289#discussion_r685391106



##
File path: sdks/go/pkg/beam/core/runtime/exec/pcollection.go
##
@@ -0,0 +1,153 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package exec
+
+import (
+   "context"
+   "fmt"
+   "math"
+   "math/rand"
+   "sync"
+   "sync/atomic"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder"
+)
+
+// PCollection is a passthrough node to collect PCollection metrics, and
+// must be placed as the Out node of any producer of a PCollection.
+//
+// In particular, must not be placed after a Multiplex, and must be placed
+// after a Flatten.
+type PCollection struct {

Review comment:
   If you read the code in 
[ProcessElement](https://github.com/apache/beam/pull/15289/files/b3d4d1a90d0ccc36150a7ef50d497a0e793a3d9d#diff-a01d1e6315c7f0dc04c1148d3e17203dd25ff567b4f7c88a32b6e7cdc62e7920R83),
 we use the sampling technique I credited you for a year ago.
   
   I'm leaning towards that "always do the first 3 elements" thing, since I 
went with the random of first 3 three approach which fails on single element 
PCollections.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 635976)
Time Spent: 3h 10m  (was: 3h)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=635973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635973
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 09/Aug/21 17:31
Start Date: 09/Aug/21 17:31
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #15289:
URL: https://github.com/apache/beam/pull/15289#discussion_r685387873



##
File path: sdks/go/pkg/beam/core/runtime/harness/harness.go
##
@@ -305,7 +313,7 @@ func (c *control) handleInstruction(ctx context.Context, 
req *fnpb.InstructionRe
data.Close()
state.Close()
 
-   mons, pylds := monitoring(plan)
+   mons, pylds := monitoring(plan, store)

Review comment:
   I tried only having the payloads, but Dataflow doesn't produce the 
metrics at all then, so we'll keep the monitoring infos around until Dataflow 
handles only payloads properly. 
   
   There might be an experiment to toggle to fix this, but there's no harm in 
the SDK waiting for the default to switch (other than the wire cost of not 
being exclusively on the short id requests & payload duplication).
   
   See 
https://console.cloud.google.com/dataflow/jobs/us-central1/2021-08-09_10_24_28-7240084760069835414?project=google.com:clouddfe
 which has both, vs  
https://console.cloud.google.com/dataflow/jobs/us-central1/2021-08-09_10_00_52-11618126016479762400?project=google.com:clouddfe
 which is just the payloads




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 635973)
Time Spent: 3h  (was: 2h 50m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=635899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635899
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 09/Aug/21 15:00
Start Date: 09/Aug/21 15:00
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #15289:
URL: https://github.com/apache/beam/pull/15289#discussion_r685272069



##
File path: sdks/go/pkg/beam/core/runtime/harness/monitoring.go
##
@@ -163,38 +162,65 @@ func monitoring(p *exec.Plan) ([]*pipepb.MonitoringInfo, 
map[string][]byte) {
}.ExtractFrom(store)
 
// Get the execution monitoring information from the bundle plan.
-   if snapshot, ok := p.Progress(); ok {
-   payload, err := metricsx.Int64Counter(snapshot.Count)
+
+   snapshot, ok := p.Progress()
+   if !ok {
+   return monitoringInfo, payloads
+   }
+   for _, pcol := range snapshot.PCols {
+   payload, err := metricsx.Int64Counter(pcol.ElementCount)
if err != nil {
panic(err)
}
 
// TODO(BEAM-9934): This metric should account for elements in 
multiple windows.
-   payloads[getShortID(metrics.PCollectionLabels(snapshot.PID), 
metricsx.UrnElementCount)] = payload
+   payloads[getShortID(metrics.PCollectionLabels(pcol.ID), 
metricsx.UrnElementCount)] = payload
+
monitoringInfo = append(monitoringInfo,
&pipepb.MonitoringInfo{
Urn:  
metricsx.UrnToString(metricsx.UrnElementCount),
Type: 
metricsx.UrnToType(metricsx.UrnElementCount),
Labels: map[string]string{
-   "PCOLLECTION": snapshot.PID,
+   "PCOLLECTION": pcol.ID,
},
Payload: payload,
})
 
-   payloads[getShortID(metrics.PTransformLabels(snapshot.ID), 
metricsx.UrnDataChannelReadIndex)] = payload
-   monitoringInfo = append(monitoringInfo,
-   &pipepb.MonitoringInfo{
-   Urn:  
metricsx.UrnToString(metricsx.UrnDataChannelReadIndex),
-   Type: 
metricsx.UrnToType(metricsx.UrnDataChannelReadIndex),
-   Labels: map[string]string{
-   "PTRANSFORM": snapshot.ID,
-   },
-   Payload: payload,
-   })
+   // Skip pcollections without size
+   if pcol.SizeCount != 0 {
+   payload, err := 
metricsx.Int64Distribution(pcol.SizeCount, pcol.SizeSum, pcol.SizeMin, 
pcol.SizeMax)
+   if err != nil {
+   panic(err)
+   }
+   monitoringInfo = append(monitoringInfo,
+   &pipepb.MonitoringInfo{
+   Urn:  
"beam:metric:sampled_byte_size:v1",
+   Type: 
"beam:metrics:distribution_int_64",
+   Labels: map[string]string{
+   "PCOLLECTION": pcol.ID,

Review comment:
   That's what those IDs are. (They're taken from the bundle graph and 
populated in 
[exec/translate.go](https://github.com/apache/beam/pull/15289/files/b3d4d1a90d0ccc36150a7ef50d497a0e793a3d9d#diff-5f1959624a48fea7877e2dd71f648ea3fe8029b75a84c0ac6f5220888176464aR315)
 )
   
   Good to validate beyond "it works on Dataflow" :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 635899)
Time Spent: 2h 50m  (was: 2h 40m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> e

[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=635361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635361
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 06/Aug/21 16:22
Start Date: 06/Aug/21 16:22
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15289:
URL: https://github.com/apache/beam/pull/15289#issuecomment-894371204


   @ajamato You got pinged because you were cc'd on the first version of this 
PR, which I linked at the top of the description. Thank you for pointing out 
some areas I can probably clean up and update some of the documentation due to 
this drift.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 635361)
Time Spent: 2h 40m  (was: 2.5h)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=634935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634935
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 06/Aug/21 03:12
Start Date: 06/Aug/21 03:12
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15289:
URL: https://github.com/apache/beam/pull/15289#issuecomment-893969502


   @ajamato i must have accidentally clicked the suggested reviewer button, but 
I didn't tag you for review. 
   Much appreciated though, thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634935)
Time Spent: 2.5h  (was: 2h 20m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=634902&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634902
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 06/Aug/21 01:02
Start Date: 06/Aug/21 01:02
Worklog Time Spent: 10m 
  Work Description: ajamato commented on a change in pull request #15289:
URL: https://github.com/apache/beam/pull/15289#discussion_r683867941



##
File path: sdks/go/pkg/beam/core/runtime/harness/monitoring.go
##
@@ -163,38 +162,65 @@ func monitoring(p *exec.Plan) ([]*pipepb.MonitoringInfo, 
map[string][]byte) {
}.ExtractFrom(store)
 
// Get the execution monitoring information from the bundle plan.
-   if snapshot, ok := p.Progress(); ok {
-   payload, err := metricsx.Int64Counter(snapshot.Count)
+
+   snapshot, ok := p.Progress()
+   if !ok {
+   return monitoringInfo, payloads
+   }
+   for _, pcol := range snapshot.PCols {
+   payload, err := metricsx.Int64Counter(pcol.ElementCount)
if err != nil {
panic(err)
}
 
// TODO(BEAM-9934): This metric should account for elements in 
multiple windows.
-   payloads[getShortID(metrics.PCollectionLabels(snapshot.PID), 
metricsx.UrnElementCount)] = payload
+   payloads[getShortID(metrics.PCollectionLabels(pcol.ID), 
metricsx.UrnElementCount)] = payload
+
monitoringInfo = append(monitoringInfo,
&pipepb.MonitoringInfo{
Urn:  
metricsx.UrnToString(metricsx.UrnElementCount),
Type: 
metricsx.UrnToType(metricsx.UrnElementCount),
Labels: map[string]string{
-   "PCOLLECTION": snapshot.PID,
+   "PCOLLECTION": pcol.ID,
},
Payload: payload,
})
 
-   payloads[getShortID(metrics.PTransformLabels(snapshot.ID), 
metricsx.UrnDataChannelReadIndex)] = payload
-   monitoringInfo = append(monitoringInfo,
-   &pipepb.MonitoringInfo{
-   Urn:  
metricsx.UrnToString(metricsx.UrnDataChannelReadIndex),
-   Type: 
metricsx.UrnToType(metricsx.UrnDataChannelReadIndex),
-   Labels: map[string]string{
-   "PTRANSFORM": snapshot.ID,
-   },
-   Payload: payload,
-   })
+   // Skip pcollections without size
+   if pcol.SizeCount != 0 {
+   payload, err := 
metricsx.Int64Distribution(pcol.SizeCount, pcol.SizeSum, pcol.SizeMin, 
pcol.SizeMax)
+   if err != nil {
+   panic(err)
+   }
+   monitoringInfo = append(monitoringInfo,
+   &pipepb.MonitoringInfo{
+   Urn:  
"beam:metric:sampled_byte_size:v1",
+   Type: 
"beam:metrics:distribution_int_64",
+   Labels: map[string]string{
+   "PCOLLECTION": pcol.ID,

Review comment:
   This must be the exact same string as the pcollection id passed into the 
ProcessBundleDescriptor
   
   
https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/model/fn-execution/src/main/proto/beam_fn_api.proto#L198

##
File path: sdks/go/pkg/beam/core/runtime/harness/monitoring.go
##
@@ -163,38 +162,65 @@ func monitoring(p *exec.Plan) ([]*pipepb.MonitoringInfo, 
map[string][]byte) {
}.ExtractFrom(store)
 
// Get the execution monitoring information from the bundle plan.
-   if snapshot, ok := p.Progress(); ok {
-   payload, err := metricsx.Int64Counter(snapshot.Count)
+
+   snapshot, ok := p.Progress()
+   if !ok {
+   return monitoringInfo, payloads
+   }
+   for _, pcol := range snapshot.PCols {
+   payload, err := metricsx.Int64Counter(pcol.ElementCount)
if err != nil {
panic(err)
}
 
// TODO(BEAM-9934): This metric should account for elements in 
multiple windows.
-   payloads[getShortID(metrics.PCollectionLabels(snapshot.PID), 
metricsx.UrnElementCount)] = payload
+   payloads[getShortID(metrics.PCollectionLabels(pcol.ID), 
metricsx.UrnElementCount)] = payload
+
monitoringInfo = append(monitoringInfo,
&pipepb.Monit

[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=634888&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634888
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 06/Aug/21 00:12
Start Date: 06/Aug/21 00:12
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15289:
URL: https://github.com/apache/beam/pull/15289#issuecomment-893907683


   R: @youngoli @jrmccluskey @riteshghorse 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634888)
Time Spent: 2h 10m  (was: 2h)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=634847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634847
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 05/Aug/21 22:44
Start Date: 05/Aug/21 22:44
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15289:
URL: https://github.com/apache/beam/pull/15289#issuecomment-893867957


   Run Go Samza ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634847)
Time Spent: 1h 50m  (was: 1h 40m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=634848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634848
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 05/Aug/21 22:44
Start Date: 05/Aug/21 22:44
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15289:
URL: https://github.com/apache/beam/pull/15289#issuecomment-893868018


   Run Go Spark ValidatesRunner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634848)
Time Spent: 2h  (was: 1h 50m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=634846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634846
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 05/Aug/21 22:44
Start Date: 05/Aug/21 22:44
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #15289:
URL: https://github.com/apache/beam/pull/15289#issuecomment-893867804






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634846)
Time Spent: 1h 40m  (was: 1.5h)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Priority: P3
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=634830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634830
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 05/Aug/21 22:03
Start Date: 05/Aug/21 22:03
Worklog Time Spent: 10m 
  Work Description: lostluck opened a new pull request #15289:
URL: https://github.com/apache/beam/pull/15289


   Restoring https://github.com/apache/beam/pull/10942 to narrow down where the 
post submits failed previously.
   
   ---
   
   This adds PCollection metrics to the Go SDK, in particular, Element Count, 
and Sampled Size.
   
   New exec.PCollection nodes are added between every processing node in the 
bundle execution graph.
   * The new metrics are only added as MonitoringInfos, not the legacy protos.
   * There's about ~10ns added per element per PCollection node due to the 
atomic additions for every element.
   * Elements for sizes are selected randomly, then encoded to count their 
bytes (w/o window headers).
 * An initial index is selected form the first [0,1,2] at bundle start up, 
and then pre-select the next index from somewhere later on, proportional to the 
bundle so far.
 * As currently set up, it will take around 200-300 samples for the first 
1M elements, so encoded overhead is limited
   * PCollections from a DataSource do 100% "sampling", since they're reading 
the bytes directly anyway. The PCollection node that would have been added 
after the DataSource is elided from the graph during construction, but re-used 
to avoid duplicating the logic for concurrently manipulating the size 
distribution.
 * DataSources can properly handle CoGBKs as well, counting non-header 
bytes for iterables, and state backed iterables.
 * This still involves a mutex Lock for every update, so we may want to 
find a lighter weight mechanism to handle the distribution samples from 
DataSources, or simply opt for the same random sampling.
 * A similar method could be used for DataSinks as well, but not handled in 
this PR. 
 * It's important to note that the runner is already aware of the number of 
bytes sent and received from the SDK side, so we may opt to remove that this 
entirely.
   * Counts and Samples are yet not made for SideInputs, which would better 
account for data consumed by DoFns.
   
   Thank you @ajamato for reminding me of the pre-select method for sampling, 
and @lukecwik for pointing out the DataSource can avoid separate additional 
encoding costs when measuring elements.
   
   Performance impact:
   I have two jobs I use for benchmarking this: Pipeline A uses int64s as 
elements and does simple passthroughs and sums, and Pipeline B where it's using 
large protocol buffers as elements, which spends a fair amount of CPU time 
decoding them.
   
   For small "fast" elements, the overhead is about ~19.5% of the Go side 
processing (which makes sense if elements are just being passed around or 
incremented).
   For large "heavy" elements, the overhead is about ~0.125% of the Go side of 
processing.
   
   Specifically, this is only taking into account the Go SDK worker, and not 
any runner side costs. This feels acceptable for the time being, though it's 
possible we can improve this later, especially for "lighter" jobs.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   `ValidatesRunner` compliance status (on master branch)
   
   
   
 
   
 Lang
 ULR
 Dataflow
 Flink
 Samza
 Spark
 Twister2
   
 
 
   
 Go
 ---
 
   https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/";>
 https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon";>
   
 
   https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/";>
 https://ci-bea

[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=398804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398804
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 06/Mar/20 00:49
Start Date: 06/Mar/20 00:49
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #11061: Revert "[BEAM-6374] 
Emit PCollection metrics from GoSDK"
URL: https://github.com/apache/beam/pull/11061#issuecomment-595519768
 
 
   Run Go Postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398804)
Time Spent: 1h 20m  (was: 1h 10m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=398801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398801
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 06/Mar/20 00:48
Start Date: 06/Mar/20 00:48
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11061: Revert 
"[BEAM-6374] Emit PCollection metrics from GoSDK"
URL: https://github.com/apache/beam/pull/11061
 
 
   Reverts apache/beam#10942 
   
   Seems to be breaking the post commit. Since I'm going on vacation tonight, 
I'm rolling to back, and will look into it when I get back.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398801)
Time Spent: 1h 10m  (was: 1h)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=397885&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397885
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 04/Mar/20 21:03
Start Date: 04/Mar/20 21:03
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #10942: [BEAM-6374] 
Emit PCollection metrics from GoSDK
URL: https://github.com/apache/beam/pull/10942
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397885)
Time Spent: 1h  (was: 50m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=393015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393015
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 25/Feb/20 22:36
Start Date: 25/Feb/20 22:36
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #10942: [BEAM-6374] 
Emit PCollection metrics from GoSDK
URL: https://github.com/apache/beam/pull/10942#discussion_r384168998
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/datasource_test.go
 ##
 @@ -201,6 +201,16 @@ func TestDataSource_Iterators(t *testing.T) {
if got, want := iVals, expectedKeys; !equalList(got, 
want) {
t.Errorf("DataSource => %#v, want %#v", 
extractValues(got...), extractValues(want...))
}
+
+   // We're using small ints, so do some quick math to 
validate.
+   sizeOfSmallInt := 1
 
 Review comment:
   You're right that my wording is ambiguous. I literally mean small ints as in 
0-127 which are definitely encoded as a single byte with varint64 encoding.  
   
   If we use larger integers (as one of the PCollection tests do), the encoded 
size is larger.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393015)
Time Spent: 50m  (was: 40m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=392928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392928
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 25/Feb/20 21:09
Start Date: 25/Feb/20 21:09
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #10942: [BEAM-6374] 
Emit PCollection metrics from GoSDK
URL: https://github.com/apache/beam/pull/10942#discussion_r383583118
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/plan.go
 ##
 @@ -178,19 +160,25 @@ func (p *Plan) String() string {
return fmt.Sprintf("Plan[%v]:\n%v", p.ID(), strings.Join(units, "\n"))
 }
 
-// Progress returns a snapshot of input progress of the plan, and associated 
metrics.
-func (p *Plan) Progress() (ProgressReportSnapshot, bool) {
-   if p.source != nil {
-   return p.source.Progress(), true
-   }
-   return ProgressReportSnapshot{}, false
+// PlanSnapshot contains system metrics for the current run of the plan.
+type PlanSnapshot struct {
+   Source ProgressReportSnapshot
+   PCols  []PCollectionSnapshot
 }
 
-// Store returns the metric store for the last use of this plan.
-func (p *Plan) Store() *metrics.Store {
-   p.storeMu.Lock()
-   defer p.storeMu.Unlock()
-   return p.store
+// Progress returns a snapshot of progress of the plan, and associated metrics.
 
 Review comment:
   This comment should probably be updated to explain that the bool returned 
represents whether the snapshot has a DataSource, as opposed to the usual 
assumption of an "ok" value.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 392928)
Time Spent: 0.5h  (was: 20m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=392929&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392929
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 25/Feb/20 21:09
Start Date: 25/Feb/20 21:09
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #10942: [BEAM-6374] 
Emit PCollection metrics from GoSDK
URL: https://github.com/apache/beam/pull/10942#discussion_r384070704
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/datasource_test.go
 ##
 @@ -201,6 +201,16 @@ func TestDataSource_Iterators(t *testing.T) {
if got, want := iVals, expectedKeys; !equalList(got, 
want) {
t.Errorf("DataSource => %#v, want %#v", 
extractValues(got...), extractValues(want...))
}
+
+   // We're using small ints, so do some quick math to 
validate.
+   sizeOfSmallInt := 1
 
 Review comment:
   Just checking, we use the size of a small int instead of an int64 because 
the keys are encoded as small ints, right?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 392929)
Time Spent: 40m  (was: 0.5h)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2020-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=391291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391291
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 23/Feb/20 04:56
Start Date: 23/Feb/20 04:56
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #10942: [BEAM-6374] 
Emit PCollection metrics from GoSDK
URL: https://github.com/apache/beam/pull/10942
 
 
   This adds PCollection metrics to the Go SDK, in particular, Element Count, 
and Sampled Size.
   
   New exec.PCollection nodes are added between every processing node in the 
bundle execution graph.
   * The new metrics are only added as MonitoringInfos, not the legacy protos.
   * There's about ~10ns added per element per PCollection node due to the 
atomic additions for every element.
   * Elements for sizes are selected randomly, then encoded to count their 
bytes (w/o window headers).
 * An initial index is selected form the first [0,1,2] at bundle start up, 
and then pre-select the next index from somewhere later on, proportional to the 
bundle so far.
 * As currently set up, it will take around 200-300 samples for the first 
1M elements, so encoded overhead is limited
   * PCollections from a DataSource do 100% "sampling", since they're reading 
the bytes directly anyway. The PCollection node that would have been added 
after the DataSource is elided from the graph during construction, but re-used 
to avoid duplicating the logic for concurrently manipulating the size 
distribution.
 * DataSources can properly handle CoGBKs as well, counting non-header 
bytes for iterables, and state backed iterables.
 * This still involves a mutex Lock for every update, so we may want to 
find a lighter weight mechanism to handle the distribution samples from 
DataSources, or simply opt for the same random sampling.
 * A similar method could be used for DataSinks as well, but not handled in 
this PR. 
 * It's important to note that the runner is already aware of the number of 
bytes sent and received from the SDK side, so we may opt to remove that this 
entirely.
   * Counts and Samples are yet not made for SideInputs, which would better 
account for data consumed by DoFns.
   
   Thank you @ajamato for reminding me of the pre-select method for sampling, 
and @lukecwik for pointing out the DataSource can avoid separate additional 
encoding costs when measuring elements.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)

[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2020-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=391292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391292
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 23/Feb/20 04:56
Start Date: 23/Feb/20 04:56
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #10942: [BEAM-6374] Emit 
PCollection metrics from GoSDK
URL: https://github.com/apache/beam/pull/10942#issuecomment-590027126
 
 
   R: @youngoli 
   cc: @ajamato @lukecwik 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 391292)
Time Spent: 20m  (was: 10m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The field for "Elements added" and "Estimated size" is always blank when 
> running a Go binary on Dataflow. For example when running the work count 
> example: https://pasteboard.co/HVf80BU.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)