[jira] [Updated] (SPARK-18516) Separate instantaneous state from progress performance statistics

2016-11-20 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-18516:
-
Description: 
There are two types of information that you want to be able to extract from a 
running query: instantaneous _status_ and metrics about the performance as make 
_progress_ in query processing.

Today, these are conflated in a single {{StreamingQueryStatus}} object.  The 
downside to this approach is that a user now needs to reason about what state 
the query is in anytime they retrieve a status object.  Fields like 
{{statusMessage}} don't appear in updates that come from listener bus.  And 
inputRate/processingRate statistics are usually {{0}} when you retrieve a 
status object from the query itself.

I propose we make the follow changes:
 - Make {{status}} only report instantaneous things, such as if data is 
available or a human readable message about what phase we are currently in.
 - Have a separate {{progress}} message that we report for each trigger with 
the other performance information that lives in status today.  You should be 
able to easily retrieve a configurable number of the most recent progress 
messages instead of just the most recent.

While we are making these changes, I propose that we also change {{id}} to be a 
globally unique identifier, rather than a JVM unique one.  Without this its 
hard to correlate performance across restarts.

  was:
There are two types of information that you want to be able to extract from a 
running query: instantaneous _status_ and metrics about the performance as make 
_progress_ in query processing.

Today, these are conflated in a single {{StreamingQueryStatus}} object.  The 
downside to this approach is that a user now needs to reason about what state 
the query is in anytime they retrieve a status object.  Fields like 
{{statusMessage}} don't appear in messages that come from listener bus.  And 
inputRate/processingRate statistics are usually {{0}} when you retrieve a 
status object from the query itself.

I propose we make the follow changes:
 - Make {{status}} only report instantaneous things, such as if data is 
available or a human readable message about what phase we are currently in.
 - Have a separate {{progress}} message that we report for each trigger with 
the other performance information that lives in status today.  You should be 
able to easily retrieve a configurable number of the most recent progress 
messages instead of just the most recent.

While we are making these changes, I propose that we also change {{id}} to be a 
globally unique identifier, rather than a JVM unique one.  Without this its 
hard to correlate performance across restarts.


> Separate instantaneous state from progress performance statistics
> -
>
> Key: SPARK-18516
> URL: https://issues.apache.org/jira/browse/SPARK-18516
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
>
> There are two types of information that you want to be able to extract from a 
> running query: instantaneous _status_ and metrics about the performance as 
> make _progress_ in query processing.
> Today, these are conflated in a single {{StreamingQueryStatus}} object.  The 
> downside to this approach is that a user now needs to reason about what state 
> the query is in anytime they retrieve a status object.  Fields like 
> {{statusMessage}} don't appear in updates that come from listener bus.  And 
> inputRate/processingRate statistics are usually {{0}} when you retrieve a 
> status object from the query itself.
> I propose we make the follow changes:
>  - Make {{status}} only report instantaneous things, such as if data is 
> available or a human readable message about what phase we are currently in.
>  - Have a separate {{progress}} message that we report for each trigger with 
> the other performance information that lives in status today.  You should be 
> able to easily retrieve a configurable number of the most recent progress 
> messages instead of just the most recent.
> While we are making these changes, I propose that we also change {{id}} to be 
> a globally unique identifier, rather than a JVM unique one.  Without this its 
> hard to correlate performance across restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18516) Separate instantaneous state from progress performance statistics

2016-11-20 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-18516:
-
Description: 
There are two types of information that you want to be able to extract from a 
running query: instantaneous _status_ and metrics about the performance as make 
_progress_ in query processing.

Today, these are conflated in a single {{StreamingQueryStatus}} object.  The 
downside to this approach is that a user now needs to reason about what state 
the query is in anytime they retrieve a status object.  Fields like 
{{statusMessage}} don't appear in updates that come from listener bus.  
Simlarly, {{inputRate}}/{{processingRate}} statistics are usually {{0}} when 
you retrieve a status object from the query itself.

I propose we make the follow changes:
 - Make {{status}} only report instantaneous things, such as if data is 
available or a human readable message about what phase we are currently in.
 - Have a separate {{progress}} message that we report for each trigger with 
the other performance information that lives in status today.  You should be 
able to easily retrieve a configurable number of the most recent progress 
messages instead of just the most recent.

While we are making these changes, I propose that we also change {{id}} to be a 
globally unique identifier, rather than a JVM unique one.  Without this its 
hard to correlate performance across restarts.

  was:
There are two types of information that you want to be able to extract from a 
running query: instantaneous _status_ and metrics about the performance as make 
_progress_ in query processing.

Today, these are conflated in a single {{StreamingQueryStatus}} object.  The 
downside to this approach is that a user now needs to reason about what state 
the query is in anytime they retrieve a status object.  Fields like 
{{statusMessage}} don't appear in updates that come from listener bus.  And 
inputRate/processingRate statistics are usually {{0}} when you retrieve a 
status object from the query itself.

I propose we make the follow changes:
 - Make {{status}} only report instantaneous things, such as if data is 
available or a human readable message about what phase we are currently in.
 - Have a separate {{progress}} message that we report for each trigger with 
the other performance information that lives in status today.  You should be 
able to easily retrieve a configurable number of the most recent progress 
messages instead of just the most recent.

While we are making these changes, I propose that we also change {{id}} to be a 
globally unique identifier, rather than a JVM unique one.  Without this its 
hard to correlate performance across restarts.


> Separate instantaneous state from progress performance statistics
> -
>
> Key: SPARK-18516
> URL: https://issues.apache.org/jira/browse/SPARK-18516
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
>
> There are two types of information that you want to be able to extract from a 
> running query: instantaneous _status_ and metrics about the performance as 
> make _progress_ in query processing.
> Today, these are conflated in a single {{StreamingQueryStatus}} object.  The 
> downside to this approach is that a user now needs to reason about what state 
> the query is in anytime they retrieve a status object.  Fields like 
> {{statusMessage}} don't appear in updates that come from listener bus.  
> Simlarly, {{inputRate}}/{{processingRate}} statistics are usually {{0}} when 
> you retrieve a status object from the query itself.
> I propose we make the follow changes:
>  - Make {{status}} only report instantaneous things, such as if data is 
> available or a human readable message about what phase we are currently in.
>  - Have a separate {{progress}} message that we report for each trigger with 
> the other performance information that lives in status today.  You should be 
> able to easily retrieve a configurable number of the most recent progress 
> messages instead of just the most recent.
> While we are making these changes, I propose that we also change {{id}} to be 
> a globally unique identifier, rather than a JVM unique one.  Without this its 
> hard to correlate performance across restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org