[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15739694#comment-15739694
 ] 

Aviem Zur commented on BEAM-1126:
---------------------------------

Reasoning: 
Backlog accessors are a very good indicator for application monitoring, so we 
plan to expose backlog as aggregators in the spark-runner. 
A number of events is easier for a human to interpret than a number of bytes. 
In Kafka specifically, backlog (or lag) is reasoned about in {{number of 
messages}}. See: 
https://kafka.apache.org/documentation#others_monitoring
If I understand correctly, for {{PubSub}} it is more common to reason about 
backlog in bytes. The {{KafkaIO}} implementation, however, seems forced: it 
applies a byte approximation to a value that is originally in {{number of 
messages}}:
{code:java}
synchronized long approxBacklogInBytes() {
  // Note that this is an estimate of uncompressed backlog.
  if (latestOffset < 0 || nextOffset < 0) {
    return UnboundedReader.BACKLOG_UNKNOWN;
  }
  return Math.max(0, (long) ((latestOffset - nextOffset) * avgRecordSize));
}
{code}
In conclusion, it seems that the API was written with {{PubSub}} in mind, 
whereas {{Kafka}}, the open source equivalent, reasons about backlog in terms 
of {{number of messages}}. 
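
To illustrate the point, here is a minimal, standalone sketch of what an event-count accessor could look like next to the byte estimate quoted above. It is not KafkaIO or Beam SDK code: the class name {{BacklogSketch}}, the method {{approxBacklogInEvents}}, and the local {{BACKLOG_UNKNOWN}} constant (a stand-in for {{UnboundedReader.BACKLOG_UNKNOWN}}, assumed here to be -1) are all hypothetical. The idea is only that the message count is the raw offset difference, and the byte figure can be derived from it rather than the other way around.

```java
/** Hypothetical sketch only; not part of KafkaIO or the Beam SDK. */
class BacklogSketch {
  // Stand-in for UnboundedReader.BACKLOG_UNKNOWN (assumed to be -1) so the sketch compiles alone.
  static final long BACKLOG_UNKNOWN = -1L;

  /** Backlog in number of messages: the raw offset difference, with no size estimate. */
  static long approxBacklogInEvents(long latestOffset, long nextOffset) {
    if (latestOffset < 0 || nextOffset < 0) {
      return BACKLOG_UNKNOWN;
    }
    return Math.max(0, latestOffset - nextOffset);
  }

  /** Byte estimate derived from the message count and an average record size. */
  static long approxBacklogInBytes(long latestOffset, long nextOffset, double avgRecordSize) {
    long events = approxBacklogInEvents(latestOffset, nextOffset);
    if (events == BACKLOG_UNKNOWN) {
      return BACKLOG_UNKNOWN;
    }
    return (long) (events * avgRecordSize);
  }

  public static void main(String[] args) {
    // 60 messages behind; the byte figure depends on the assumed average record size.
    System.out.println(approxBacklogInEvents(100, 40));
    System.out.println(approxBacklogInBytes(100, 40, 512.5));
  }
}
```

With this shape a runner could report either figure, and sources that naturally count messages would not need to invent a record size just to report lag.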

> Expose UnboundedSource split backlog in number of events
> --------------------------------------------------------
>
>                 Key: BEAM-1126
>                 URL: https://issues.apache.org/jira/browse/BEAM-1126
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Aviem Zur
>            Assignee: Daniel Halperin
>            Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}.
> There is value in exposing backlog in number of events as well, since this 
> number can be easier for humans to interpret than bytes, for example via 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
