[jira] [Updated] (SPARK-18516) Separate instantaneous state from progress performance statistics
     [ https://issues.apache.org/jira/browse/SPARK-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-18516:
-------------------------------------
    Description:
There are two types of information that you want to be able to extract from a running query: instantaneous _status_ and metrics about performance as you make _progress_ in query processing.

Today, these are conflated in a single {{StreamingQueryStatus}} object. The downside to this approach is that a user now needs to reason about what state the query is in anytime they retrieve a status object. Fields like {{statusMessage}} don't appear in updates that come from the listener bus. And inputRate/processingRate statistics are usually {{0}} when you retrieve a status object from the query itself.

I propose we make the following changes:
 - Make {{status}} report only instantaneous things, such as whether data is available or a human-readable message about what phase we are currently in.
 - Have a separate {{progress}} message that we report for each trigger with the other performance information that lives in status today. You should be able to easily retrieve a configurable number of the most recent progress messages instead of just the most recent.

While we are making these changes, I propose that we also change {{id}} to be a globally unique identifier, rather than a JVM-unique one. Without this, it's hard to correlate performance across restarts.

  was:
There are two types of information that you want to be able to extract from a running query: instantaneous _status_ and metrics about performance as you make _progress_ in query processing.

Today, these are conflated in a single {{StreamingQueryStatus}} object. The downside to this approach is that a user now needs to reason about what state the query is in anytime they retrieve a status object. Fields like {{statusMessage}} don't appear in messages that come from the listener bus. And inputRate/processingRate statistics are usually {{0}} when you retrieve a status object from the query itself.

I propose we make the following changes:
 - Make {{status}} report only instantaneous things, such as whether data is available or a human-readable message about what phase we are currently in.
 - Have a separate {{progress}} message that we report for each trigger with the other performance information that lives in status today. You should be able to easily retrieve a configurable number of the most recent progress messages instead of just the most recent.

While we are making these changes, I propose that we also change {{id}} to be a globally unique identifier, rather than a JVM-unique one. Without this, it's hard to correlate performance across restarts.


> Separate instantaneous state from progress performance statistics
> ------------------------------------------------------------------
>
>                 Key: SPARK-18516
>                 URL: https://issues.apache.org/jira/browse/SPARK-18516
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>            Reporter: Michael Armbrust
>            Assignee: Michael Armbrust
>            Priority: Blocker

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18516) Separate instantaneous state from progress performance statistics
     [ https://issues.apache.org/jira/browse/SPARK-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-18516:
-------------------------------------
    Description:
There are two types of information that you want to be able to extract from a running query: instantaneous _status_ and metrics about performance as you make _progress_ in query processing.

Today, these are conflated in a single {{StreamingQueryStatus}} object. The downside to this approach is that a user now needs to reason about what state the query is in anytime they retrieve a status object. Fields like {{statusMessage}} don't appear in updates that come from the listener bus. Similarly, {{inputRate}}/{{processingRate}} statistics are usually {{0}} when you retrieve a status object from the query itself.

I propose we make the following changes:
 - Make {{status}} report only instantaneous things, such as whether data is available or a human-readable message about what phase we are currently in.
 - Have a separate {{progress}} message that we report for each trigger with the other performance information that lives in status today. You should be able to easily retrieve a configurable number of the most recent progress messages instead of just the most recent.

While we are making these changes, I propose that we also change {{id}} to be a globally unique identifier, rather than a JVM-unique one. Without this, it's hard to correlate performance across restarts.

  was:
There are two types of information that you want to be able to extract from a running query: instantaneous _status_ and metrics about performance as you make _progress_ in query processing.

Today, these are conflated in a single {{StreamingQueryStatus}} object. The downside to this approach is that a user now needs to reason about what state the query is in anytime they retrieve a status object. Fields like {{statusMessage}} don't appear in updates that come from the listener bus. And inputRate/processingRate statistics are usually {{0}} when you retrieve a status object from the query itself.

I propose we make the following changes:
 - Make {{status}} report only instantaneous things, such as whether data is available or a human-readable message about what phase we are currently in.
 - Have a separate {{progress}} message that we report for each trigger with the other performance information that lives in status today. You should be able to easily retrieve a configurable number of the most recent progress messages instead of just the most recent.

While we are making these changes, I propose that we also change {{id}} to be a globally unique identifier, rather than a JVM-unique one. Without this, it's hard to correlate performance across restarts.


> Separate instantaneous state from progress performance statistics
> ------------------------------------------------------------------
>
>                 Key: SPARK-18516
>                 URL: https://issues.apache.org/jira/browse/SPARK-18516
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>            Reporter: Michael Armbrust
>            Assignee: Michael Armbrust
>            Priority: Blocker
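
To make the proposed split concrete, here is a minimal sketch in plain Python rather than Spark's own API. Every name in it ({{QueryStatus}}, {{QueryProgress}}, {{QueryMonitor}}, {{max_recent_progress}}, the field names) is hypothetical and chosen only for illustration; nothing here is taken from the Spark code base. It assumes that status is overwritten in place, that one progress record is appended per trigger to a bounded buffer, and that a restarted query is handed the id it saved earlier.

```python
import uuid
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QueryStatus:
    """Instantaneous state only: no rates and no per-trigger numbers."""
    message: str
    is_data_available: bool


@dataclass(frozen=True)
class QueryProgress:
    """Performance statistics for one completed trigger."""
    query_id: str
    trigger_number: int
    input_rows_per_second: float
    processed_rows_per_second: float


class QueryMonitor:
    """Hypothetical holder that keeps status and progress apart."""

    def __init__(self, max_recent_progress: int = 100,
                 query_id: Optional[str] = None):
        # Reuse a previously saved id on restart; otherwise mint a
        # globally unique one instead of a per-JVM counter.
        self.query_id = query_id or str(uuid.uuid4())
        self._status = QueryStatus("Initializing", False)
        # A bounded deque silently drops the oldest entry when full.
        self._recent = deque(maxlen=max_recent_progress)
        self._triggers = 0

    @property
    def status(self) -> QueryStatus:
        return self._status

    def set_status(self, message: str, is_data_available: bool) -> None:
        # Status is replaced, never accumulated.
        self._status = QueryStatus(message, is_data_available)

    def report_trigger(self, input_rate: float,
                       processing_rate: float) -> QueryProgress:
        # One progress record per trigger, tagged with the stable id.
        progress = QueryProgress(self.query_id, self._triggers,
                                 input_rate, processing_rate)
        self._triggers += 1
        self._recent.append(progress)
        return progress

    @property
    def recent_progress(self) -> List[QueryProgress]:
        return list(self._recent)

    @property
    def last_progress(self) -> Optional[QueryProgress]:
        return self._recent[-1] if self._recent else None
```

With this shape, a caller reading {{status}} never has to guess whether the rates are meaningful, because the rates live only in progress records. Passing the saved {{query_id}} back into the constructor after a restart is what lets progress from both runs be correlated.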