[ 
https://issues.apache.org/jira/browse/BEAM-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9934:
----------------------------
    Description: 
The [element 
count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206]
 metric represents the number of elements within a PCollection and is 
interpreted differently across the Beam SDKs.

In the [Java 
SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207]
 this represents the number of elements and includes how many windows those 
elements are in. This metric is incremented as soon as the element has been 
output.

In the [Python 
SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247]
 this represents the number of elements and doesn't include how many windows 
those elements are in. The metric is also only incremented after the element 
has finished processing.

The [Go 
SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260]
 does the same thing as Python.

Traditionally in Dataflow this has always been the exploded window element 
count and the counter is incremented as soon as the element is output.
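
To make the two differences above concrete (whether windows are exploded, and whether the counter moves on output or after processing), here is a minimal sketch. It is not code from any Beam SDK: WindowedValue, ElementCounter and observe are hypothetical names made up for illustration, and the downstream "consumer" is just a callback that records what the counter reads while the element is being processed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class WindowedValue:
    """Hypothetical stand-in for an element plus the windows it belongs to."""
    value: str
    windows: Tuple[str, ...]


class ElementCounter:
    """Toy element counter with the two knobs described in the issue.

    explode_windows: count an element once per window it is in (the behaviour
        described for the Java SDK and Dataflow) instead of once per element
        (the behaviour described for the Python and Go SDKs).
    count_on_output: increment before the downstream consumer runs (Java SDK,
        Dataflow) instead of after it has finished (Python and Go SDKs).
    """

    def __init__(self, explode_windows: bool, count_on_output: bool) -> None:
        self.explode_windows = explode_windows
        self.count_on_output = count_on_output
        self.count = 0

    def receive(self, element: WindowedValue,
                consumer: Callable[[WindowedValue], None]) -> None:
        increment = len(element.windows) if self.explode_windows else 1
        if self.count_on_output:
            self.count += increment
            consumer(element)
        else:
            consumer(element)
            self.count += increment


def observe(counter: ElementCounter,
            elements: List[WindowedValue]) -> List[int]:
    """Return the counter value seen while each element is being processed."""
    seen: List[int] = []
    for element in elements:
        counter.receive(element, lambda _e: seen.append(counter.count))
    return seen


# Two elements: "a" is in two windows, "b" is in one.
ELEMENTS = [WindowedValue("a", ("w1", "w2")), WindowedValue("b", ("w1",))]

exploded_on_output = ElementCounter(explode_windows=True, count_on_output=True)
seen_exploded = observe(exploded_on_output, ELEMENTS)

unexploded_after = ElementCounter(explode_windows=False, count_on_output=False)
seen_unexploded = observe(unexploded_after, ELEMENTS)
```

For the same two elements the first configuration ends at 3 and the second at 2, and a reader sampling the counter mid-processing sees different values as well, which is why the same metric URN cannot currently be compared across SDKs.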

  was:
The [element 
count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206]
 metric represents the number of elements within a PCollection and is 
interpreted differently across the Beam SDKs.

In the [Java 
SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207]
 this represents the number of elements and includes how many windows those 
elements are in. This metric is incremented as soon as the element has been 
output.

In the [Python 
SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247]
 this represents the number of elements and doesn't include how many windows 
those elements are in. The metric is also only incremented after the element 
has finished processing.

The [Go 
SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260]
 does the same thing as Python.

Traditionally in Dataflow this has always been the exploded window element 
count and the counter is updated on output and not when the processing is 
finished, as can be seen 
[here|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowOutputCounter.java#L63]
 and 
[here|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputReceiver.java#L41].


> Resolve differences in beam:metric:element_count:v1 implementations
> -------------------------------------------------------------------
>
>                 Key: BEAM-9934
>                 URL: https://issues.apache.org/jira/browse/BEAM-9934
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go, sdk-java-harness, sdk-py-harness
>            Reporter: Luke Cwik
>            Assignee: Luke Cwik
>            Priority: Major
>
> The [element 
> count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206]
>  metric represents the number of elements within a PCollection and is 
> interpreted differently across the Beam SDKs.
> In the [Java 
> SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207]
>  this represents the number of elements and includes how many windows those 
> elements are in. This metric is incremented as soon as the element has been 
> output.
> In the [Python 
> SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247]
>  this represents the number of elements and doesn't include how many windows 
> those elements are in. The metric is also only incremented after the element 
> has finished processing.
> The [Go 
> SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260]
>  does the same thing as Python.
> Traditionally in Dataflow this has always been the exploded window element 
> count and the counter is incremented as soon as the element is output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
