Kobi Salant created BEAM-1048:
---------------------------------

             Summary: Spark Runner streaming batch duration does not include duration of reading from source
                 Key: BEAM-1048
                 URL: https://issues.apache.org/jira/browse/BEAM-1048
             Project: Beam
          Issue Type: Bug
          Components: runner-spark
    Affects Versions: 0.4.0-incubating
            Reporter: Kobi Salant
            Assignee: Amit Sela

The Spark Runner's streaming batch duration does not include the time spent reading from the source. This is because we call rdd.count in SparkUnboundedSource, which invokes a regular Spark job outside the streaming context. We do this to report the batch size, both for the UI and for back pressure.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
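
The effect described in this issue can be illustrated with a toy model. The sketch below is not Beam or Spark code: `FakeClock`, `read_from_source`, `batch_with_count_outside`, and `batch_with_count_inside` are hypothetical names, and `len(records)` merely stands in for `rdd.count`. It only shows why a read that is forced before the batch timer starts does not show up in the reported batch duration. The second function is a contrast case, not a proposed fix.

```python
class FakeClock:
    """Deterministic clock so the example does not depend on real time."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


def read_from_source(clock, read_seconds, num_records):
    # Hypothetical source read: costs `read_seconds` and yields records.
    clock.advance(read_seconds)
    return list(range(num_records))


def batch_with_count_outside(clock, read_seconds, process_seconds, num_records):
    # The count forces the read BEFORE the batch timer starts,
    # loosely analogous to a job that runs outside the streaming context.
    records = read_from_source(clock, read_seconds, num_records)
    batch_size = len(records)  # stands in for rdd.count
    start = clock.now  # batch timer starts only now
    clock.advance(process_seconds)  # the batch's own processing
    return batch_size, clock.now - start


def batch_with_count_inside(clock, read_seconds, process_seconds, num_records):
    # Contrast case: the read happens after the batch timer has started.
    start = clock.now
    records = read_from_source(clock, read_seconds, num_records)
    batch_size = len(records)
    clock.advance(process_seconds)
    return batch_size, clock.now - start


outside = batch_with_count_outside(FakeClock(), 2.0, 3.0, 100)
inside = batch_with_count_inside(FakeClock(), 2.0, 3.0, 100)
print(outside)  # (100, 3.0): the 2.0 s read is missing from the duration
print(inside)   # (100, 5.0): the read is included
```

Both variants report the same batch size, so the number used for the UI and for back pressure is unaffected; only the measured duration differs.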