[ https://issues.apache.org/jira/browse/BEAM-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Cwik updated BEAM-9934:
----------------------------
    Description:
The [element count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206] metric represents the number of elements within a PCollection and is interpreted differently across the Beam SDK versions.

In the [Java SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207] this represents the number of elements and includes how many windows those elements are in. This metric is incremented as soon as the element has been output.

In the [Python SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247] this represents the number of elements and doesn't include how many windows those elements are in. The metric is also only incremented after the element has finished processing.

The [Go SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260] does the same thing as Python.

Traditionally in Dataflow this has always been the exploded window element count and the counter is incremented as soon as the element is output.

  was:
The [element count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206] metric represents the number of elements within a PCollection and is interpreted differently across the Beam SDK versions.

In the [Java SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207] this represents the number of elements and includes how many windows those elements are in. This metric is incremented as soon as the element has been output.

In the [Python SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247] this represents the number of elements and doesn't include how many windows those elements are in. The metric is also only incremented after the element has finished processing.

The [Go SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260] does the same thing as Python.

Traditionally in Dataflow this has always been the exploded window element count and the counter is updated on output and not when the processing is finished, as can be seen [here|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowOutputCounter.java#L63] and [here|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/common/worker/OutputReceiver.java#L41].

> Resolve differences in beam:metric:element_count:v1 implementations
> -------------------------------------------------------------------
>
>                 Key: BEAM-9934
>                 URL: https://issues.apache.org/jira/browse/BEAM-9934
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go, sdk-java-harness, sdk-py-harness
>            Reporter: Luke Cwik
>            Assignee: Luke Cwik
>            Priority: Major
>
> The [element count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206] metric represents the number of elements within a PCollection and is interpreted differently across the Beam SDK versions.
> In the [Java SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207] this represents the number of elements and includes how many windows those elements are in. This metric is incremented as soon as the element has been output.
> In the [Python SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247] this represents the number of elements and doesn't include how many windows those elements are in. The metric is also only incremented after the element has finished processing.
> The [Go SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260] does the same thing as Python.
> Traditionally in Dataflow this has always been the exploded window element count and the counter is incremented as soon as the element is output.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
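
The two differences described in this issue (counting exploded windows or not, and incrementing on output or after processing) can be illustrated with a small standalone sketch. It does not use any Beam API: WindowedValue, observe_counts and the window names are hypothetical and exist only for this illustration. It is a rough model of the behaviours described above, not the SDK implementations.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WindowedValue:
    """Hypothetical stand-in for an element assigned to one or more windows."""
    value: str
    windows: Tuple[str, ...]


def observe_counts(elements: List[WindowedValue],
                   count_on_output: bool,
                   explode_windows: bool) -> Tuple[List[int], int]:
    """Return (counter values seen while each element is processed, final count)."""
    count = 0
    seen_during_processing = []
    for element in elements:
        # Exploded counting adds one per window; otherwise one per element.
        increment = len(element.windows) if explode_windows else 1
        if count_on_output:
            count += increment  # incremented as soon as the element is output
        seen_during_processing.append(count)  # downstream processing happens here
        if not count_on_output:
            count += increment  # incremented only after processing has finished
    return seen_during_processing, count


elements = [
    WindowedValue("a", ("w1", "w2")),  # one element assigned to two windows
    WindowedValue("b", ("w1",)),
]

# Behaviour described for the Java SDK and Dataflow: exploded, counted on output.
java_like = observe_counts(elements, count_on_output=True, explode_windows=True)

# Behaviour described for the Python and Go SDKs: not exploded, counted after processing.
python_like = observe_counts(elements, count_on_output=False, explode_windows=False)

print(java_like)    # ([2, 3], 3)
print(python_like)  # ([0, 1], 2)
```

For the same two elements the final counts differ (3 versus 2), and so do the values visible while an element is still being processed, which is why the same metric URN can report different numbers depending on the SDK.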