[ 
https://issues.apache.org/jira/browse/SAMZA-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063637#comment-14063637
 ] 

Yan Fang commented on SAMZA-300:
--------------------------------

Thanks for the explanation. 

{quote}
1) expose intra-job (job-level) metrics in the YARN AM 2) expose inter-job 
(topology-level) metrics in a standalone dashboard.
{quote}

These may need to be discussed and done in other tickets too. 

{quote}
we should open some tickets for the relevant ones. All three you list are 
useful. Ganglia also, probably.
{quote}

Created separate tickets for each format. 

> Track producers and consumers of streams
> ----------------------------------------
>
>                 Key: SAMZA-300
>                 URL: https://issues.apache.org/jira/browse/SAMZA-300
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Martin Kleppmann
>
> Each Samza job runs independently, which has a lot of advantages. However, 
> there are situations in which it would be valuable to have a global overview 
> of the data flows between jobs. For example:
> - It's important for correctness that only one job ever publishes to a given 
> checkpoint or changelog stream — if several jobs publish to the same stream, 
> the result is nonsensical. However, we currently have no way of enforcing 
> that. It would be good if a job could take a "write lock" on a stream, and 
> thus prevent others from writing to it.
> - It would be awesome to have a dashboard/visualization that graphically 
> shows the job graph, and visually highlights the health of a job (e.g. 
> whether a job has fallen behind).
> - The job graph would also be generally useful for tracking data provenance 
> (finding consumers who would be affected by a schema change, finding the team 
> that is responsible for producing a particular stream, etc.).
> - Potentially could include additional metadata about streams, e.g. owner, 
> serialization format, schema, documentation of semantics of the data, etc. 
> (HCatalog for streams?)
> One possibility would be for Kafka to add some of this functionality, 
> although it may also make sense to implement it in Samza (that way it would 
> be available for non-Kafka systems as well, and could use knowledge about the 
> job that Samza has, but Kafka hasn't).
> This is just a vague description to start a discussion. Please comment with 
> your ideas on how to best implement this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
