[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-07-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379850#comment-15379850
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user sumitchawla commented on the issue:

https://github.com/apache/flink/pull/1947
  
Thanks a lot @zentol .. this is great.. 


> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>Priority: Minor
> Fix For: 1.1.0
>
>
> The metrics library makes it easy to expose collected metrics to other 
> systems such as Graphite, Ganglia, or JMX (e.g. via VisualVM).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-07-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379835#comment-15379835
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/1947
  
@sumitchawla sure you can, as described here: 
https://ci.apache.org/projects/flink/flink-docs-master/apis/metrics.html#registering-metrics
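Per the linked docs, a rich function in Flink 1.1 registers metrics through its runtime context, along the lines of `getRuntimeContext().getMetricGroup().counter("myCounter")` inside `open()`, much like accumulators are registered. Because that needs a Flink runtime, the sketch below only models the pattern with hypothetical stand-in types (`Counter`, `MetricGroup`, `MyMapper`) so it runs on plain Java; these are not Flink's actual classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Flink's Counter (not thread-safe; one instance per parallel task).
class Counter {
    private long count;

    void inc() { count++; }

    long getCount() { return count; }
}

// Hypothetical stand-in for the metric group a function obtains from its runtime context.
class MetricGroup {
    private final Map<String, Counter> counters = new HashMap<>();

    // Creates a counter and registers it under the given name in this group.
    Counter counter(String name) {
        Counter c = new Counter();
        counters.put(name, c);
        return c;
    }

    Counter get(String name) { return counters.get(name); }
}

// A user function that registers its metric once in open() and updates it per record.
class MyMapper {
    private Counter recordsSeen;

    void open(MetricGroup group) {
        recordsSeen = group.counter("recordsSeen");
    }

    int map(String value) {
        recordsSeen.inc();
        return value.length();
    }
}
```

Registering in `open()` rather than per record mirrors the accumulator pattern: the metric object is created once and the hot path only increments it.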




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-07-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379829#comment-15379829
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user sumitchawla commented on the issue:

https://github.com/apache/flink/pull/1947
  
@zentol .. by job writer I meant an end user writing jobs using the Flink API. 
As of now I can create custom accumulators using 
`getRuntimeContext().addAccumulator(ACCUMULATOR_NAME, ...)`. Can I do something 
similar to register custom metrics in my transformations?




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-07-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378997#comment-15378997
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/1947
  
What do you mean by "Job writers"?




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378685#comment-15378685
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user sumitchawla commented on the issue:

https://github.com/apache/flink/pull/1947
  
Hi @zentol, thanks for the information. This is great, and I could see 
the metrics in JMX. I have one more question on the interoperability of 
Accumulators and Metrics. As per my understanding, Metrics are currently only 
available at the system level, and user Accumulators are available to job writers. 
Is there any plan for supporting Metrics for job writers? Metrics offer far 
more capabilities than the current Accumulators and would be a great way to track 
custom metrics at the operator level.

  




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372495#comment-15372495
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/1947
  
No. They are currently only accessible via JMX or a system supported by a 
reporter. They are not available in the Dashboard; this will be worked on for 
1.2.

You could write a reporter that listens for http requests and returns the 
appropriate value.
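A rough sketch of that suggestion: a component that keeps named gauges and answers HTTP GET requests with the current value. It uses the JDK's `com.sun.net.httpserver.HttpServer`; the class `HttpMetricsEndpoint`, the `/metrics/<name>` path scheme and the plain-text response are assumptions for illustration, not a Flink reporter interface.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical HTTP endpoint serving metric values; not part of Flink.
class HttpMetricsEndpoint {
    private final Map<String, Supplier<Object>> gauges = new ConcurrentHashMap<>();
    private HttpServer server;

    void register(String name, Supplier<Object> gauge) {
        gauges.put(name, gauge);
    }

    // Maps a request path like "/metrics/numRecordsIn" to the current value, or null if unknown.
    String lookup(String path) {
        String prefix = "/metrics/";
        if (!path.startsWith(prefix)) {
            return null;
        }
        Supplier<Object> gauge = gauges.get(path.substring(prefix.length()));
        return gauge == null ? null : String.valueOf(gauge.get());
    }

    // Starts a small HTTP server that answers with 200 and the value, or 404 for unknown names.
    void start(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics/", exchange -> {
            String value = lookup(exchange.getRequestURI().getPath());
            int status = value == null ? 404 : 200;
            byte[] body = (value == null ? "unknown metric" : value).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }

    void stop() {
        if (server != null) {
            server.stop(0);
        }
    }
}
```

After `start(9250)` (the port is arbitrary), a request to `http://localhost:9250/metrics/numRecordsIn` would return the value of the gauge registered under that name.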




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-07-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371842#comment-15371842
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user sumitchawla commented on the issue:

https://github.com/apache/flink/pull/1947
  
@zentol is there any HTTP interface that can be used to query these 
metrics? Something similar to the existing JobManager accumulator URLs?




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296335#comment-15296335
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol closed the pull request at:

https://github.com/apache/flink/pull/1947




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295685#comment-15295685
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220850557
  
I did a pass over the code and committed the result:

Manually merged in 003ce18efc0249fae874e56c3df6acf19f5f2429 and 
707606ac40dbbbd497fcbbb5442870fec5468bf3


> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library makes it easy to expose collected metrics to other 
> systems such as Graphite, Ganglia, or JMX (e.g. via VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291073#comment-15291073
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220319335
  
Let me grab the token for this one. There are a few things still, like 
resource leaks, in this code.
I'll pass you back the token as soon as I am done...




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289493#comment-15289493
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220112639
  
renaming the module is a good idea




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289384#comment-15289384
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220099057
  
How about renaming the Maven module from `flink-metrics` to 
`flink-metric-reporters`? The metrics system itself is part of `flink-core` 
after all...




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289290#comment-15289290
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220087815
  
I've had an offline chat with @StephanEwen and updated the PR as a result.

List of changes:
* dropped Meters, Histograms and Timers
 * we weren't convinced that the implementation used is efficient enough 
for our use-cases
 * removed Reservoir/Snapshot and various wrapper classes
* moved ScheduledDropWizardReporter into a new flink-metrics-dropwizard 
module within flink-metrics
 * this removes Dropwizard usage entirely from flink-core
* the MetricRegistry no longer maintains maps of all metrics; from now on, 
reporters are responsible for doing this
 * the Listener interface was removed
 * an AbstractReporter class was added that implements this behaviour
 * this will make it easier to support multiple reporters in the future
* Counter and Gauge no longer implement the Dropwizard interfaces
 * added Counter-/GaugeWrapper classes for Dropwizard reporters
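To make the new division of responsibility concrete: the registry only notifies reporters when metrics are added or removed, and a shared base class keeps the name-to-metric map. This is a minimal sketch with hypothetical, simplified signatures (`Metric`, `Reporter`, `notifyOfAddedMetric(String, Metric)`, `notifyOfRemovedMetric(String)`); Flink's actual reporter interface may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal metric type; a Counter or Gauge would implement something similar.
interface Metric {
    Object value();
}

// Callbacks the registry invokes; the registry itself keeps no map of metrics.
interface Reporter {
    void notifyOfAddedMetric(String name, Metric metric);

    void notifyOfRemovedMetric(String name);
}

// Shared base class that tracks registered metrics so concrete reporters don't have to.
abstract class AbstractReporter implements Reporter {
    protected final Map<String, Metric> metrics = new HashMap<>();

    @Override
    public synchronized void notifyOfAddedMetric(String name, Metric metric) {
        metrics.put(name, metric);
    }

    @Override
    public synchronized void notifyOfRemovedMetric(String name) {
        metrics.remove(name);
    }

    // Concrete reporters only decide how to emit the tracked metrics.
    abstract void report();
}

// Example concrete reporter that renders sorted "name=value" lines into a buffer.
class StringReporter extends AbstractReporter {
    final StringBuilder out = new StringBuilder();

    @Override
    synchronized void report() {
        for (Map.Entry<String, Metric> e : new TreeMap<>(metrics).entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue().value()).append('\n');
        }
    }
}
```

With several reporters, the registry would simply call the same notifications on each one, which is why moving the bookkeeping into reporters eases multi-reporter support.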




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288897#comment-15288897
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220015012
  
I would like to take an iteration on this, make some changes on top of 
this, and open a new pull request afterwards.




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288895#comment-15288895
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-220014866
  
A few follow-ups are actually needed before merging this:

  1. We need to remove the example metrics.

  2. Conflicting metric names should not result in failures. Metrics are 
tooling, and problems in the tooling should not fail the core programs.

  3. I think we should limit the available metric types to Gauge and 
Counter for now. I looked at Timers, Meters, and Histograms - each has very 
high overhead. As a follow-up, I would like to see if we can construct 
simple Meters as views over counters. That way, the runtime code has no overhead 
for the metering (it just maintains counters and gauges) and the registry code 
turns them into Meters asynchronously.
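Point 3 could look roughly like this: the hot path only increments a counter, and a periodic task samples the counter and derives a per-second rate from the difference between two samples. This is a sketch under assumptions; `CounterMeterView` and the use of a plain `AtomicLong` as the counter are hypothetical, not Flink code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical "meter as a view over a counter": the hot path only calls counter.incrementAndGet(),
// while update() is meant to be called periodically from a background/reporting thread.
class CounterMeterView {
    private final AtomicLong counter;
    private long lastCount;
    private long lastTimeNanos;
    private volatile double ratePerSecond;

    CounterMeterView(AtomicLong counter, long nowNanos) {
        this.counter = counter;
        this.lastCount = counter.get();
        this.lastTimeNanos = nowNanos;
    }

    // Samples the counter and recomputes the events-per-second rate since the last sample.
    // The timestamp is passed in (e.g. System.nanoTime()) so the logic is easy to test.
    void update(long nowNanos) {
        long elapsed = nowNanos - lastTimeNanos;
        if (elapsed <= 0) {
            return; // no time has passed; keep the previous rate
        }
        long count = counter.get();
        ratePerSecond = (count - lastCount) * 1_000_000_000.0 / elapsed;
        lastCount = count;
        lastTimeNanos = nowNanos;
    }

    double getRate() {
        return ratePerSecond;
    }
}
```

The cost of computing the rate is paid once per sampling interval on the reporting side, rather than once per record in the task.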




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288632#comment-15288632
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-219964437
  
Hmm, seeing some local test failures after the re-base. Have to look into 
this.
Typical message is `java.lang.IllegalArgumentException: This group already 
contains a metric named KeyCount`
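One way to avoid this failure, in line with the point that conflicting names should not fail the job, is to log and skip a duplicate registration instead of throwing. A minimal sketch, assuming a hypothetical `MetricGroup` that stores metrics by name; the fix that actually went into Flink may look different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical metric group that tolerates name clashes instead of throwing.
class MetricGroup {
    private static final Logger LOG = Logger.getLogger(MetricGroup.class.getName());
    private final Map<String, Object> metrics = new HashMap<>();

    // Registers the metric if the name is free; otherwise logs a warning and keeps the
    // existing metric. The caller still gets back the metric it passed in, so user code
    // keeps working even though the duplicate is not reported.
    synchronized <M> M register(String name, M metric) {
        if (metrics.containsKey(name)) {
            LOG.warning("Name collision: group already contains a metric named " + name
                    + "; the new metric will not be reported.");
        } else {
            metrics.put(name, metric);
        }
        return metric;
    }

    synchronized Object get(String name) {
        return metrics.get(name);
    }
}
```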




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288618#comment-15288618
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-219959102
  
Tests were good!
Rebased again, re-running tests, will merge after that.




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287673#comment-15287673
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-219864781
  
I rebased the branch, waiting for Travis to give a green light...




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287529#comment-15287529
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-219849631
  
The code in this PR is the most up-to-date.




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287454#comment-15287454
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-219836974
  
This looks pretty good now.
I would like to get this in soon, now that the tests are passing.
Let's iterate over it on the master.

This needs a rebase to master, though.

I also saw that you have a "metrics_v4" branch now. Is that one newer than 
the pull request?




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278071#comment-15278071
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-218153465
  
I'm currently aware of the following issues:
* quite a few tests in flink-runtime fail; this is mostly due to a missing 
integration into the MockEnvironments. Fix coming up
* there is an issue in the IO metrics for operations with multiple outputs. 
This one is a bit more tricky, but a not-so-pretty solution is on the way.




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278037#comment-15278037
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-218146940
  
Looks nice, I am trying it out now




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276921#comment-15276921
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217972596
  
I have updated this PR; it now contains a new feature: configurable system 
scope. It essentially allows users to decide how Flink entities (TaskManagers, 
jobs, tasks and operators) are represented in metric names.

Users can configure a different format string for every entity type in 
flink-conf.yaml; which format is applied to a metric depends on which entity it 
is bound to.

For example, users can do the following:
* omit TaskManager information for jobs/tasks/operators
* re-order properties, like having the job properties ahead of the 
TaskManager properties
* decide whether they want to use names, IDs or a mix of both!

I've updated the main post to include more information.
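To illustrate the idea of scope format strings: a format with placeholders is filled in from the variables of the entity a metric belongs to, and the metric name is appended. The placeholder names (`<host>`, `<tm_id>`, `<job_name>`, `<operator_name>`) and the `ScopeFormatter` class are assumptions for illustration; the PR description and docs define the actual configuration keys and variables.

```java
import java.util.Map;

// Hypothetical formatter: replaces <variable> placeholders in a scope format string.
class ScopeFormatter {
    // Single left-to-right pass, so substituted values are never re-scanned for placeholders.
    static String format(String scopeFormat, Map<String, String> variables) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < scopeFormat.length()) {
            int open = scopeFormat.indexOf('<', i);
            int close = open < 0 ? -1 : scopeFormat.indexOf('>', open);
            if (open < 0 || close < 0) {
                out.append(scopeFormat, i, scopeFormat.length());
                break;
            }
            out.append(scopeFormat, i, open);
            String value = variables.get(scopeFormat.substring(open + 1, close));
            // Unknown placeholders are kept verbatim.
            out.append(value != null ? value : scopeFormat.substring(open, close + 1));
            i = close + 1;
        }
        return out.toString();
    }

    // Full metric identifier = formatted scope + "." + metric name.
    static String metricIdentifier(String scopeFormat, Map<String, String> variables, String metricName) {
        return format(scopeFormat, variables) + "." + metricName;
    }
}
```

Omitting TaskManager information or reordering properties then amounts to choosing a different format string, e.g. `<job_name>.<operator_name>` instead of `<host>.taskmanager.<tm_id>.<job_name>.<operator_name>`.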




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275389#comment-15275389
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user ankitcha commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217669339
  
@zentol thanks for the suggestion. I think that can work, I will try that 
and confirm.

Thanks!




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274400#comment-15274400
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user ankitcha commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217507497
  
@rmetzger thanks for the response. I think I understood your explanation, and I 
agree that using indices in configuration keys won't be a nice experience.

But could we support a nested structure in the Flink configuration? I am unsure 
about the scope of this change, so maybe it's a bad suggestion. But this is 
something that would really help us put our application into production, since 
we have to use multiple reporters as part of our infrastructure requirements.




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273812#comment-15273812
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217389510
  
Okay. I think we need to support multiple instances of the same job on a 
TaskManager.




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273740#comment-15273740
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217368732
  
@rmetzger you are correct that you're job failed since the previous one 
wasn't cleaned up yet. Should you try to run 2 identical jobs in parallel it 
will fail, since 2 jobs would use the same metrics due to name clashes. Note 
that in this version this also occurs when 2 operators have the same name. I 
have some additional functionality coming up that would allow you to circumvent 
this issue.

@ankitcha The problem with multiple reporters is our configuration: it only 
supports single-line key-value pairs, and you need to know the exact key to 
access it. In order to configure multiple reporters you would either need a 
nested structure (which is not supported), or index the configuration keys 
(metrics.reporter.1.class) and add a new parameter containing the indices to 
use (e.g. metrics.reporter: 0, 1), which isn't particularly user-friendly. The 
metric system itself could deal with multiple reporters with minor 
modifications; it's all about the configuration.
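To make the indexed-key idea concrete, here is a minimal sketch over a flat key-value map. The keys `metrics.reporter` (a list of indices) and `metrics.reporter.<i>.class` follow the shape proposed in this comment, but the helper `reporterClasses` and the reporter class names are hypothetical and not part of Flink's configuration API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexedReporterConfig {

    /**
     * Resolves reporter class names from flat keys such as
     * "metrics.reporter: 0, 1" and "metrics.reporter.0.class: ...".
     * Hypothetical helper; the key layout is only the one proposed in the discussion.
     */
    public static List<String> reporterClasses(Map<String, String> conf) {
        List<String> classes = new ArrayList<>();
        String indices = conf.get("metrics.reporter");
        if (indices == null || indices.trim().isEmpty()) {
            return classes; // no reporters configured
        }
        for (String raw : indices.split(",")) {
            String key = "metrics.reporter." + raw.trim() + ".class";
            String className = conf.get(key);
            if (className == null) {
                throw new IllegalArgumentException("Missing key: " + key);
            }
            classes.add(className);
        }
        return classes;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("metrics.reporter", "0, 1");
        conf.put("metrics.reporter.0.class", "org.example.JmxReporter");
        conf.put("metrics.reporter.1.class", "org.example.GraphiteReporter");
        System.out.println(reporterClasses(conf));
    }
}
```

The separate index list is exactly the user-unfriendly part: it has to be kept in sync with the numbered keys by hand.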




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272825#comment-15272825
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user ankitcha commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217240628
  
Guys, thank you for this awesome addition to Flink. 

I was just wondering if there is a way to configure multiple reporters as 
well?




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272662#comment-15272662
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217214295
  
I'm assuming it's the missing "close()" calls I've already commented on 
that are causing this issue.

I just tried adding some custom metrics:
```java
new RichFlatMapFunction<String, WordWithCount>() {
    public Counter output;
    public Counter el;
    public String lastOut;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        MetricGroup mg = getRuntimeContext().getMetricGroup();

        // counter registered directly in the operator's group
        this.el = mg.counter("elements");

        // nested "detailed" group holding a counter and a gauge
        MetricGroup detailedGroup = mg.addGroup("detailed");
        this.output = detailedGroup.counter("output");
        detailedGroup.gauge("lastOut", new Gauge<String>() {
            @Override
            public String getValue() {
                return lastOut; // last word emitted by flatMap
            }
        });
    }

    @Override
    public void flatMap(String value, Collector<WordWithCount> out) {
        el.inc();
        for (String word : value.split("\\s")) {
            lastOut = word;
            out.collect(new WordWithCount(word, 1L));
            output.inc();
        }
    }
}
```

and it works amazingly well


![image](https://cloud.githubusercontent.com/assets/89049/15050706/9ecc6fda-12f5-11e6-8c7d-cc7cb3657ecd.png)





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272654#comment-15272654
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-217213101
  
Is it possible that submitting the same job two times doesn't work?

```
./bin/flink run ./examples/streaming/SocketWindowWordCount.jar --port 54323

05/05/2016 19:09:08 Job execution switched to status RUNNING.
05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to 
SCHEDULED 
05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to 
DEPLOYING 
05/05/2016 19:09:08 Fast SlidingProcessingTimeWindows(5000, 1000) of 
WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
switched to SCHEDULED 
05/05/2016 19:09:08 Fast SlidingProcessingTimeWindows(5000, 1000) of 
WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
switched to DEPLOYING 
05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to 
RUNNING 
05/05/2016 19:09:08 Source: Socket Stream -> Flat Map(1/1) switched to 
FAILED 
java.lang.IllegalArgumentException: This group ([key0, localhost, Actor, 
TaskManager, TaskManager, 964566cfcf032710aff86614010fce21, Category, Tasks, 
Job, "Socket Window WordCount", Operator, Flat Map, SubTask, 0, ChannelType, 
OutputChannel, Index, 0]) already contains a metric named numBytesOut
at org.apache.flink.metrics.MetricGroup.addMetric(MetricGroup.java:246)
at org.apache.flink.metrics.MetricGroup.counter(MetricGroup.java:123)
at 
org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.setMetrics(SpanningRecordSerializer.java:206)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.setMetrics(RecordWriter.java:216)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createStreamOutput(OperatorChain.java:294)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:94)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:188)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
at java.lang.Thread.run(Thread.java:745)

05/05/2016 19:09:08 Fast SlidingProcessingTimeWindows(5000, 1000) of 
WindowedStream.main(SocketWindowWordCount.java:79) -> Sink: Unnamed(1/1) 
switched to RUNNING 
05/05/2016 19:09:08 Job execution switched to status FAILING.

```




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-04-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263841#comment-15263841
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/1947#discussion_r61556020
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
 ---
@@ -152,6 +154,9 @@ class TaskManager(
   /** Registry of metrics periodically transmitted to the JobManager */
   private val metricRegistry = TaskManager.createMetricsRegistry()
--- End diff --

when we can display the new metrics in the dashboard.




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-04-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263834#comment-15263834
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1947#discussion_r61555221
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
 ---
@@ -924,6 +936,15 @@ class TaskManager(
 else {
   libraryCacheManager = Some(new FallbackLibraryCacheManager)
 }
+
+metricsRegistry = new 
FlinkMetricRegistry(GlobalConfiguration.getConfiguration);
--- End diff --

I think it's better to use `config.configuration`. At some point in time we 
might get around to removing the `GlobalConfiguration`.




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-04-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263831#comment-15263831
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1947#discussion_r61555065
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
 ---
@@ -152,6 +154,9 @@ class TaskManager(
   /** Registry of metrics periodically transmitted to the JobManager */
   private val metricRegistry = TaskManager.createMetricsRegistry()
--- End diff --

What are the plans for removing/subsuming this?




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-04-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262284#comment-15262284
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1947#discussion_r61443660
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java ---
@@ -683,6 +688,9 @@ else if (STATE_UPDATER.compareAndSet(this, current, 
ExecutionState.FAILED)) {
 
// remove all of the tasks library resources
libraryCache.unregisterTask(jobId, executionId);
+   
+   //Uncomment before Merging!!!
+   //metrics.close();
--- End diff --

Okay, I see.




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-04-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262277#comment-15262277
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/1947#discussion_r61443006
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java ---
@@ -683,6 +688,9 @@ else if (STATE_UPDATER.compareAndSet(this, current, 
ExecutionState.FAILED)) {
 
// remove all of the tasks library resources
libraryCache.unregisterTask(jobId, executionId);
+   
+   //Uncomment before Merging!!!
+   //metrics.close();
--- End diff --

Leaving it commented out makes it a bit easier to try out: with the close() call 
in place, the metrics are no longer exposed once the job is finished, so if you 
run one of the examples with the built-in data you'll (probably) see nothing.




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-04-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262259#comment-15262259
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1947#discussion_r61440312
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java ---
@@ -683,6 +688,9 @@ else if (STATE_UPDATER.compareAndSet(this, current, 
ExecutionState.FAILED)) {
 
// remove all of the tasks library resources
libraryCache.unregisterTask(jobId, executionId);
+   
+   //Uncomment before Merging!!!
+   //metrics.close();
--- End diff --

why not now?




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-04-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262253#comment-15262253
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1947#discussion_r61439797
  
--- Diff: flink-metrics/flink-metrics-ganglia/pom.xml ---
@@ -0,0 +1,84 @@
+
+
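+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-metrics</artifactId>
+        <version>1.1-SNAPSHOT</version>
+        <relativePath>..</relativePath>
+    </parent>
+
+    <artifactId>flink-metrics-ganglia</artifactId>
+    <name>flink-metrics-ganglia</name>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-core</artifactId>
+            <version>${project.version}</version>
+            <scope>provided</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>info.ganglia.gmetric4j</groupId>
+            <artifactId>gmetric4j</artifactId>
+            <version>1.0.7</version>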
--- End diff --

License seems to be ok: 
https://github.com/ganglia/gmetric4j/blob/master/COPYING 




[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-04-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262213#comment-15262213
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/1947

[FLINK-1502] Basic Metric System

This PR is a preview of the new metric system. 
It is not complete because
* there is no documentation for the website
* a few smaller parts also don't have code documentation
* I haven't tried out the ganglia/statsD reporter yet

In general though it works and it is now time to gather some feedback.

The PR is organized into several commits to give it some structure; 
generally divided by which part of the system they expose the metric system to. 
Note that the last commit "Metric Usage Examples" is not technically part of 
the PR but showcases the usage.
The division was done very simply, so some changes may technically belong 
to several commits.

## General overview

A user can access a system-provided MetricGroup to register a Metric, which 
is stored in a MetricRegistry and regularly forwarded to a Reporter, which 
communicates it to an external system.

## MetricGroups

MetricGroups are the user-facing part of the system. They are a nested data 
structure, containing other groups and metrics, that allow registering metrics 
with Flink while organizing them in a hierarchy.

For example, every TaskManager has a MetricGroup, and for every task that 
is deployed a new sub-group for that task is added. This task specific group is 
propagated through the task stack, with new groups/metrics being added. Within 
a UDF the operator MetricGroup is accessed through the RuntimeContext.
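The hierarchy described above can be sketched as groups that each extend their parent's scope. This is an illustrative model only: `SketchMetricGroup`, `addGroup` and `counter` are hypothetical stand-ins rather than the PR's interfaces, and the dot-joined identifier is an assumption about how a reporter might flatten the scope.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical nested metric group; each child extends its parent's scope. */
public class SketchMetricGroup {
    private final List<String> scope;
    // Shared by the whole tree so every registered identifier is visible from the root
    private final Map<String, AtomicLong> allCounters;

    private SketchMetricGroup(List<String> scope, Map<String, AtomicLong> allCounters) {
        this.scope = scope;
        this.allCounters = allCounters;
    }

    public static SketchMetricGroup root(String... components) {
        List<String> scope = new ArrayList<>();
        Collections.addAll(scope, components);
        return new SketchMetricGroup(scope, new LinkedHashMap<>());
    }

    /** Creates a child group one level deeper in the hierarchy. */
    public SketchMetricGroup addGroup(String name) {
        List<String> child = new ArrayList<>(scope);
        child.add(name);
        return new SketchMetricGroup(child, allCounters);
    }

    /** Returns the counter for "scope.name", creating it on first use. */
    public AtomicLong counter(String name) {
        String id = String.join(".", scope) + "." + name;
        return allCounters.computeIfAbsent(id, k -> new AtomicLong());
    }

    public Map<String, AtomicLong> registered() {
        return Collections.unmodifiableMap(allCounters);
    }

    public static void main(String[] args) {
        SketchMetricGroup tm = SketchMetricGroup.root("localhost", "taskmanager");
        SketchMetricGroup operator = tm.addGroup("MyJob").addGroup("FlatMap").addGroup("0");
        operator.counter("elements").incrementAndGet();
        operator.addGroup("detailed").counter("output");
        System.out.println(tm.registered().keySet());
    }
}
```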

## Metrics

Metrics are the objects used to measure something.

Metrics include 
* Gauges, that measure a value on-demand
* Meters, that measure the rate/count of events
* Histograms, that measure the distribution of long values
* Counters, that count stuff
* Timers, that measure rate of calls and distribution of execution time for 
a given piece of code.

Under the hood we use the Metrics from the Dropwizard library. In order to 
ensure interface stability, and to give us the option to reimplement things 
without breaking everything, they (and other classes) are wrapped to match our 
interfaces. 
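Wrapping library classes behind our own interfaces is essentially the adapter pattern. A minimal sketch, assuming a hypothetical third-party class `VendorCounter` standing in for a library metric (Dropwizard's classes are deliberately not used here); user code only sees the project-owned `Counter` interface, also hypothetical in this form, so the backing implementation could be swapped without touching callers.

```java
/** Sketch of wrapping a third-party metric behind a project-owned interface. */
public class CounterAdapterSketch {

    /** Project-owned interface that user code programs against (hypothetical). */
    public interface Counter {
        void inc();
        long getCount();
    }

    /** Stand-in for a third-party library class we do not control (hypothetical). */
    public static final class VendorCounter {
        private long value;
        public void increment(long n) { value += n; }
        public long value() { return value; }
    }

    /** Adapter exposing the vendor counter through our own interface; not thread-safe. */
    public static final class WrappingCounter implements Counter {
        private final VendorCounter delegate = new VendorCounter();

        @Override
        public void inc() {
            delegate.increment(1);
        }

        @Override
        public long getCount() {
            return delegate.value();
        }
    }

    public static void main(String[] args) {
        Counter c = new WrappingCounter();
        c.inc();
        c.inc();
        System.out.println(c.getCount()); // prints 2
    }
}
```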

## Reporters

Reporters are the components that communicate the Metrics to the outside 
world. With this PR we allow exporting Metrics via JMX (default), Graphite, 
Ganglia and StatsD. The interval in which they report is configurable.

Similarly to Metrics, we partially use reporters from the Dropwizard 
library (Graphite, Ganglia), again wrapped to match our interfaces.

Reporters are configured via flink-conf.yaml.

An example configuration might look like this:
metrics.reporter.class: org.apache.flink.metrics.GraphiteReporter
metrics.reporter.arguments: --host localhost --port 8080
metrics.reporter.interval: 30 SECONDS
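The `metrics.reporter.interval` value in this example has the form `<amount> <TimeUnit name>`. A minimal sketch of how such a value could be parsed and used to drive periodic reporting with a `ScheduledExecutorService`; the parsing rule and the class `PeriodicReportSketch` are assumptions for illustration, not the PR's code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PeriodicReportSketch {

    /** Parses values like "30 SECONDS" into milliseconds (assumed format). */
    public static long parseIntervalMillis(String value) {
        String[] parts = value.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected '<amount> <unit>': " + value);
        }
        long amount = Long.parseLong(parts[0]);
        TimeUnit unit = TimeUnit.valueOf(parts[1].toUpperCase());
        return unit.toMillis(amount);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(parseIntervalMillis("30 SECONDS")); // prints 30000

        AtomicLong records = new AtomicLong();
        CountDownLatch threeReports = new CountDownLatch(3);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Short interval so the demo ends quickly; a real setup would use the configured value
        long intervalMillis = parseIntervalMillis("100 MILLISECONDS");
        scheduler.scheduleAtFixedRate(() -> {
            System.out.println("report: records=" + records.get());
            threeReports.countDown();
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);

        records.addAndGet(42);
        threeReports.await();
        scheduler.shutdown();
    }
}
```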

Reporters are instantiated generically and configured with a Configuration 
containing the parsed arguments. Apart from the JMXReporter, the reporters are 
not part of the distribution and have to be added to the classpath manually 
(usually by putting the jar into /lib).
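"Instantiated generically" typically means loading the class named in the configuration via reflection and calling its no-argument constructor. A minimal sketch under that assumption; the `Reporter` interface, its `open(Map)` method and `LoggingReporter` are hypothetical and not the PR's actual types.

```java
import java.util.HashMap;
import java.util.Map;

public class ReporterLoaderSketch {

    /** Hypothetical reporter contract: configured once after construction. */
    public interface Reporter {
        void open(Map<String, String> arguments);
        String describe();
    }

    /** Example implementation; needs a public no-argument constructor. */
    public static class LoggingReporter implements Reporter {
        private String host = "unset";

        public LoggingReporter() {
        }

        @Override
        public void open(Map<String, String> arguments) {
            host = arguments.getOrDefault("host", "localhost");
        }

        @Override
        public String describe() {
            return "LoggingReporter@" + host;
        }
    }

    /** Loads the class by name, checks its type, instantiates and configures it. */
    public static Reporter load(String className, Map<String, String> arguments)
            throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        if (!Reporter.class.isAssignableFrom(clazz)) {
            throw new IllegalArgumentException(className + " is not a Reporter");
        }
        Reporter reporter = (Reporter) clazz.getDeclaredConstructor().newInstance();
        reporter.open(arguments);
        return reporter;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> arguments = new HashMap<>();
        arguments.put("host", "graphite.example.org");
        // getName() yields the binary name ("...$LoggingReporter") that Class.forName expects
        Reporter r = load(LoggingReporter.class.getName(), arguments);
        System.out.println(r.describe());
    }
}
```

The class must be on the classpath for `Class.forName` to find it, which is why reporter jars have to be placed in /lib.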

JMX uses port 9010 by default. This can be configured by setting the 
metrics.jmx.port property in the flink-conf.yaml.

## Registry

The registry is essentially just a connection between all MetricGroups and 
the Reporter.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink metrics_v2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1947


commit b90b53cd73824389b41978f0113ca0c6d3da1422
Author: zentol 
Date:   2016-04-15T13:57:14Z

Add basic metric structures

-add dropwizard dependency to flink-core
-add metric wrappers
-add metric groups/category organization
-add metric registry

commit 45e6e123d37a8fba1bf76386a84436e8fb04a9fa
Author: zentol 
Date:   2016-04-19T11:28:28Z

Graphite/Ganglia/StatsD Reporters

commit e634060d83f2b475e954c67424ba39e3ffd92b6b
Author: zentol 
Date:   2016-04-13T16:47:04Z

Task Integration

-included job name in TaskDeploymentDescriptor
-enabled remote JMX for TaskManager
-added TaskManager status metrics

commit 20ca6c3b19690e08335e31fcf3377f4a511e9b00
Author: zentol 
Date:   2016-04-13T14:50:16Z

Environment Integration

-add MetricGroup field 

[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-22 Thread Jamie Grier (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158090#comment-15158090
 ] 

Jamie Grier commented on FLINK-1502:


[~eastcirclek] Let's define our terms to make sure we're talking about the 
same thing.

*Session*: A single instance of a Job Manager and some # of TaskManagers 
working together.   A session can be created "on-the-fly" for a single job or 
it can be a long-running thing.  Multiple jobs can start, run, and finish in 
the same session.  Think of the "yarn-session.sh" command.  This creates a 
session outside of any particular job.  This is also what I've meant when I've 
said "cluster".  A Yarn session is a "cluster" that we've spun up for some 
length of time on Yarn.  Another example of a cluster would be a standalone 
install of Flink on some # of machines.

*Job*: A single batch or streaming job that runs on a Flink cluster.

In the above scenario, if your definition of sessions is in agreement with 
mine, you would instead have the following.  Note that I've named the cluster 
according to the "session" name you've given, because in this case each session 
is really a different (ad-hoc) cluster.  When you run a job directly using just 
"flink run -ytm ..." on YARN you are spinning up an ad-hoc cluster for your job.

After Session 1 is finished, Node 1 would have the following metrics:

- cluster.session1.taskmanager.1.gc_time

After session 3 is finished, Node 1 would have the following metrics:

- cluster.session1.taskmanager.1.gc_time 
- cluster.session2.taskmanager.2.gc_time
- cluster.session3.taskmanager.3.gc_time

There are many metrics in this case because that's exactly what you want.  
These are JVM scope metrics we are talking about and those are 3 different 
JVMs, not the same one, so it makes total sense for them to have these different 
names/scopes.  These metrics have nothing to do with each other and it doesn't 
matter which host they are from.  They are scoped to the cluster (or session) 
and logical TaskManager index, not the host.

The above should not be confused with any host level metrics we want to report. 
 Host level metrics would be scoped simply by the hostname so they wouldn't 
grow either.

One more example, hopefully to clarify.  Let's say I spun up a long-running 
cluster (or session) using yarn-session.sh -tm 3.  Now we have a Flink cluster 
running on YARN with no jobs running and three TaskManagers.  We then run three 
different jobs one after another on this cluster.  The metrics would still 
simply be:

- cluster.yarn-session.taskmanager.1.gc_time
- cluster.yarn-session.taskmanager.2.gc_time
- cluster.yarn-session.taskmanager.3.gc_time

No matter how many jobs you ran this list would not grow, which is natural 
because there have only been 3 TaskManagers.  Now if one of these TaskManagers 
were to fail and be restarted it would assume the same name -- that's the point 
of using "logical" indexes, so the set of metric names in that case still would 
not be larger than the above.
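The "logical index" idea can be sketched as giving each TaskManager the lowest free index in the cluster and handing a failed TaskManager's index to its replacement, so the set of metric names stays bounded. A minimal sketch: the `cluster.<name>.taskmanager.<index>.<metric>` layout is taken from the examples in this thread, while `LogicalIndexAssigner` and its lowest-free-index policy are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class LogicalIndexAssigner {
    private final String clusterName;
    private final TreeSet<Integer> inUse = new TreeSet<>();

    public LogicalIndexAssigner(String clusterName) {
        this.clusterName = clusterName;
    }

    /** Hands out the lowest index (starting at 1) not held by a live TaskManager. */
    public int register() {
        int index = 1;
        while (inUse.contains(index)) {
            index++;
        }
        inUse.add(index);
        return index;
    }

    /** Frees an index when its TaskManager dies, so a replacement can reuse it. */
    public void release(int index) {
        inUse.remove(index);
    }

    public String metricName(int index, String metric) {
        return "cluster." + clusterName + ".taskmanager." + index + "." + metric;
    }

    public static void main(String[] args) {
        LogicalIndexAssigner assigner = new LogicalIndexAssigner("yarn-session");
        List<Integer> taskManagers = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            taskManagers.add(assigner.register());
        }
        assigner.release(taskManagers.get(1)); // TaskManager 2 fails...
        int replacement = assigner.register(); // ...and its replacement takes index 2 again
        System.out.println(assigner.metricName(replacement, "gc_time"));
    }
}
```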

In the initial case you describe above, if you didn't want lots of different 
metrics over time you could also just give all of your sessions the same name.  
Your metrics are growing because you're spinning up many different clusters 
(sessions) over time with different names each time.  If you used the same name 
for the cluster (session) every time, this metrics namespace growth would not 
occur.

I hope any of that made sense ;)  This is getting a bit hard to describe this 
way.  We could also sync via Hangouts or something if that is easier.



> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-22 Thread Dongwon Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158023#comment-15158023
 ] 

Dongwon Kim commented on FLINK-1502:


Let's consider the following scenario:

           | Node1 (N1) | N2  | N3
-----------+------------+-----+-----
Session1   | TM1        | TM2 | TM3
Session2   | TM2        | TM3 | TM1
Session3   | TM3        | TM2 | TM1

After Session1 is finished, Node1 has the following metrics:
- cluster.MyCluster.taskmanager.1.gc_time 

After Session2 is finished, Node1 has the following metrics:
- cluster.MyCluster.taskmanager.1.gc_time 
- cluster.MyCluster.taskmanager.2.gc_time

After Session3 is finished, Node1 has the following metrics:
- cluster.MyCluster.taskmanager.1.gc_time 
- cluster.MyCluster.taskmanager.2.gc_time
- cluster.MyCluster.taskmanager.3.gc_time
At this point, a user has to check which of the above three metrics belongs to 
the current session.
The problem gets worse the more TaskManagers the user has to launch.
For example, 500 TaskManagers over multiple sessions will end up with 500 
metrics for each host.

Wouldn't it be better to assign indexes to TaskManagers scoped to each host?

p.s.
I'm going to start without considering multiple TaskManagers on the same node 
as we haven't yet reached a consensus.
But I think we still need to develop this discussion further.
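The growth in this scenario can be checked by counting the distinct metric names that end up on each node under the two schemes discussed in this thread. A minimal sketch that hard-codes the session/node table above as data; both naming functions are illustrative assumptions, and `host.<node>.taskmanager.<local index>` is only one possible encoding of host-scoped indexes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class MetricNameGrowth {
    // Rows: Session1..Session3; columns: N1..N3; values: cluster-wide TaskManager index
    public static final int[][] PLACEMENT = {
        {1, 2, 3},
        {2, 3, 1},
        {3, 2, 1},
    };

    /** Cluster-scoped index: the name depends on the TaskManager index, not the host. */
    public static Set<String> clusterScopedNames(int node) {
        Set<String> names = new LinkedHashSet<>();
        for (int[] session : PLACEMENT) {
            names.add("cluster.MyCluster.taskmanager." + session[node] + ".gc_time");
        }
        return names;
    }

    /** Host-scoped index: the name depends only on the host and a host-local index. */
    public static Set<String> hostScopedNames(int node) {
        Set<String> names = new LinkedHashSet<>();
        for (int[] session : PLACEMENT) {
            // Each session places exactly one TaskManager on this node, so its local index is 1
            int localIndex = 1;
            names.add("host.N" + (node + 1) + ".taskmanager." + localIndex + ".gc_time");
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(clusterScopedNames(0)); // Node1 under cluster-scoped indexes
        System.out.println(hostScopedNames(0));    // Node1 under host-scoped indexes
    }
}
```

Under the cluster-scoped scheme Node1 collects three names (indexes 1, 2 and 3), while the host-scoped variant keeps a single name per host; the reply above argues that the former is intended when each session is a separate cluster.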



[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-21 Thread Dongwon Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156330#comment-15156330
 ] 

Dongwon Kim commented on FLINK-1502:


[~jgrier] Thank you for sharing the link :-) The page said that ganglia already 
stores counters when the slope type of a metric is set to Slope.POSITIVE, which 
I wasn't aware of. 
I found the following explanation about Slope.POSITIVE:
"Using the value positive for the slope of a new metric will cause the 
corresponding RRD file to be generated as a COUNTER, with delta values being 
displayed instead of the actual metric values."
You're right. We actually didn't need to do that.

I also don't have any idea regarding the query language.
If Ganglia supports it, we could store metrics as GAUGE and just change the way 
graphs are drawn.
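Whether Ganglia stores the value as a COUNTER or the raw value is stored as a GAUGE and the change is derived when drawing, the underlying computation is differencing consecutive samples, typically divided by the sampling interval. A minimal sketch of that computation, assuming samples arrive as (timestamp in seconds, cumulative count) pairs from a counter that never resets; `CounterRates` is a hypothetical helper, not Ganglia or RRD code.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterRates {

    /** One sample of a monotonically increasing counter. */
    public static final class Sample {
        final long timestampSeconds;
        final long value;

        public Sample(long timestampSeconds, long value) {
            this.timestampSeconds = timestampSeconds;
            this.value = value;
        }
    }

    /**
     * Per-second rate between consecutive samples: (v[i] - v[i-1]) / (t[i] - t[i-1]).
     * Assumes strictly increasing timestamps and a counter that never resets.
     */
    public static List<Double> ratesPerSecond(List<Sample> samples) {
        List<Double> rates = new ArrayList<>();
        for (int i = 1; i < samples.size(); i++) {
            Sample prev = samples.get(i - 1);
            Sample cur = samples.get(i);
            double delta = cur.value - prev.value;
            double seconds = cur.timestampSeconds - prev.timestampSeconds;
            rates.add(delta / seconds);
        }
        return rates;
    }

    public static void main(String[] args) {
        List<Sample> samples = new ArrayList<>();
        samples.add(new Sample(0, 100));
        samples.add(new Sample(10, 160));
        samples.add(new Sample(20, 160));
        samples.add(new Sample(30, 460));
        System.out.println(ratesPerSecond(samples)); // prints [6.0, 0.0, 30.0]
    }
}
```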



[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-18 Thread Jamie Grier (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152917#comment-15152917
 ] 

Jamie Grier commented on FLINK-1502:


To be clear what I meant here is to have the indexes assigned to the 
TaskManagers scoped to the *entire* cluster.  Not a particular host like what 
you're describing here.  So, for example, if you spun up a Flink cluster with 
10 TaskManagers running on 10 different hosts, the TaskManagers would each be given 
a unique INDEX on the cluster.  Literally, TaskManager[1-10].  Use this to 
scope the metrics, e.g.:

cluster.MyCluster.taskmanager.1.gc_time
cluster.MyCluster.taskmanager.2.gc_time
...
...
cluster.MyCluster.taskmanager.10.gc_time

It doesn't matter which hosts they are on.  These are 10 unique JVMs on some 
set of hosts.





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-18 Thread Jamie Grier (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152895#comment-15152895
 ] 

Jamie Grier commented on FLINK-1502:


I'm suggesting that we use the Dropwizard Metrics library and expose those metrics 
via JMX at a minimum/default, but also, via optional configuration, let 
users report metrics via any of the Metrics library's available 
Reporter classes.  Ganglia and Graphite are both supported via the built-in 
GangliaReporter and GraphiteReporter, but there are integrations with other 
systems as well.  Of particular interest to people running in production would 
be StatsD, Librato, InfluxDB, etc.

https://dropwizard.github.io/metrics/3.1.0/manual/third-party/

What I'm suggesting is that we should expose the ability for people to 
choose/configure which Reporters to use, but we should default to JMX.  Many 
3rd party tools will be able to consume/route these metrics if they're 
available via JMX so that should be the default.
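Exposing a value over JMX needs only the JDK's `javax.management` API. A minimal sketch that registers a standard MBean with the platform MBean server and reads the attribute back the way a JMX client would; the `org.example.flink` domain, the key properties and the `RecordsCounter` names are hypothetical, and this is not how Flink's JMXReporter or Dropwizard's JmxReporter is implemented.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxExposureSketch {

    /** Standard MBean contract: the interface name is the class name plus "MBean". */
    public interface RecordsCounterMBean {
        long getCount();
    }

    public static class RecordsCounter implements RecordsCounterMBean {
        private final AtomicLong count = new AtomicLong();

        public void inc() {
            count.incrementAndGet();
        }

        @Override
        public long getCount() {
            return count.get(); // exposed as the JMX attribute "Count"
        }
    }

    /** Registers the counter and returns the name a JMX client would query. */
    public static ObjectName register(RecordsCounter counter, String metricName) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.example.flink:type=TaskManager,name=" + metricName);
        server.registerMBean(counter, name);
        return name;
    }

    public static void main(String[] args) throws Exception {
        RecordsCounter counter = new RecordsCounter();
        ObjectName name = register(counter, "numRecordsIn");
        counter.inc();
        counter.inc();
        Object value = ManagementFactory.getPlatformMBeanServer().getAttribute(name, "Count");
        System.out.println(name + " Count=" + value);
    }
}
```

Any JMX-aware tool (for example VisualVM or JConsole) attached to the JVM would see the same `Count` attribute, which is the main argument for making JMX the default.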


> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-18 Thread Jamie Grier (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152873#comment-15152873
 ] 

Jamie Grier commented on FLINK-1502:


[~eastcirclek] Yes, I believe it does.  It's implicit in the metric type that 
gets reported to Ganglia.  I believe what we want is Slope.POSITIVE for 
counters.  I imagine the Dropwizard metrics library would already do this 
correctly for metrics of type Counter (as opposed to Gauge) -- but maybe not.

See here:  
http://codeblog.majakorpi.net/post/16281432462/ganglia-xml-slope-attribute

Also, is there no query language in Ganglia, when building a graph, that allows 
you to graph the rate of change rather than the actual metric?  I'm not too 
familiar with Ganglia.
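For reference, here is a small sketch that maps metric types to Ganglia slope values. It assumes the standard `gmetric` command-line tool and its `--name`, `--value`, `--type` and `--slope` options. As far as I understand gmetad, a metric announced with slope "positive" is stored as an RRD COUNTER, so its graph shows a per-second rate rather than the raw value. That is worth verifying against the Ganglia version in use. The type labels and the fallback to "both" are choices made for this sketch:

```python
from typing import List

# Slope names as used by Ganglia's gmetric tool and XML (zero/positive/negative/both).
# The metric-type labels on the left are illustrative.
SLOPE_BY_TYPE = {
    "counter": "positive",  # only ever increases, e.g. GC count
    "gauge": "both",        # can go up and down, e.g. used heap
    "constant": "zero",     # never changes, e.g. configured heap size
}


def ganglia_slope(metric_type: str) -> str:
    """Slope to announce for a metric; unknown types fall back to "both"."""
    return SLOPE_BY_TYPE.get(metric_type, "both")


def gmetric_args(name: str, value: float, metric_type: str) -> List[str]:
    """Build (but do not run) a gmetric command line for one sample."""
    return [
        "gmetric",
        f"--name={name}",
        f"--value={value}",
        "--type=double",
        f"--slope={ganglia_slope(metric_type)}",
    ]


print(gmetric_args("taskmanager.gc_count", 42, "counter"))
```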

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-17 Thread Dongwon Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151460#comment-15151460
 ] 

Dongwon Kim commented on FLINK-1502:


[~jgrier] Are you suggesting that TaskManagers publish metrics to JMX first 
and then optionally report metrics from JMX to Ganglia/Graphite?

How about using Hadoop's metrics2 library to collect and report TaskManagers' 
metrics?


> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-17 Thread Dongwon Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151444#comment-15151444
 ] 

Dongwon Kim commented on FLINK-1502:


[~mxm] I just meant aggregating metrics from multiple TaskManagers running on 
the same node (not aggregating metrics from all TaskManagers).
Please refer to the comment below for the new plan :-)

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-17 Thread Dongwon Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151415#comment-15151415
 ] 

Dongwon Kim commented on FLINK-1502:


To [~jgrier],

Okay, let's leave out such additional calculations before Flink reports metrics 
to Ganglia/Graphite.
One thing I'm wondering about is whether Ganglia already has this kind of 
functionality built in.


> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-17 Thread Dongwon Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151346#comment-15151346
 ] 

Dongwon Kim commented on FLINK-1502:


To [~StephanEwen], [~mxm], [~jgrier], 
First of all, sorry for the late response.

We just need to make each TaskManager report its metrics to 
JMX/Ganglia/Graphite as you guys suggested.

To [~mxm], 
the main problem with such a design is that a newly launched 
TaskManager is given a randomly generated UUID, so it will create too many 
Ganglia metrics, as [~jgrier] mentioned above.
I think [~jgrier]'s solution is quite simple yet viable:

cluster..taskmanager.1.gc_time
cluster..taskmanager.2.gc_time

To that end, we need to open a new issue to assign such IDs to TaskManagers 
running on the same host.
One concern is that, even when only one TaskManager is running on each node, we 
still need to do such numbering (e.g. .taskmanager.1.gc_time).
I'm okay with it, but users might find the numbering quite ugly.

What do you guys think?
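Here is a rough sketch of per-host numbering. It assumes that whoever assigns the IDs (for example the JobManager) knows each TaskManager's hostname and a unique instance ID. `HostLocalIds`, `metric_name`, and the `uuid-*` strings are hypothetical stand-ins. Reusing the lowest free index keeps names stable when a TaskManager is restarted:

```python
from typing import Dict, Set, Tuple


class HostLocalIds:
    """Hands out per-host TaskManager indices (1, 2, ...), reusing freed ones."""

    def __init__(self) -> None:
        self._used: Dict[str, Set[int]] = {}
        self._owner: Dict[str, Tuple[str, int]] = {}  # instance id -> (host, index)

    def register(self, instance_id: str, host: str) -> int:
        used = self._used.setdefault(host, set())
        index = 1
        while index in used:  # pick the lowest index that is free on this host
            index += 1
        used.add(index)
        self._owner[instance_id] = (host, index)
        return index

    def unregister(self, instance_id: str) -> None:
        host, index = self._owner.pop(instance_id)
        self._used[host].discard(index)


def metric_name(cluster: str, index: int, metric: str) -> str:
    # Ganglia already attributes metrics to a host, so the host is not repeated here.
    return f"cluster.{cluster}.taskmanager.{index}.{metric}"


ids = HostLocalIds()
a = ids.register("uuid-a", "node1")  # 1
b = ids.register("uuid-b", "node1")  # 2
ids.unregister("uuid-a")
c = ids.register("uuid-c", "node1")  # reuses 1
print(metric_name("my-cluster", c, "gc_time"))  # cluster.my-cluster.taskmanager.1.gc_time
```

Keeping the host out of the name relies on Ganglia grouping metrics by host. A Graphite setup would probably want the hostname in the path instead.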

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-17 Thread Jamie Grier (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151004#comment-15151004
 ] 

Jamie Grier commented on FLINK-1502:


I understand [~eastcirclek]'s points about using the InstanceID.  This is a 
unique ID that is automatically generated (I believe).  As such, if you use it 
to namespace the metrics you will see new metric names whenever new 
TaskManagers are created.  Over time this means the total # of metrics will grow 
and grow.  From my experience it would be better to have a "logical" ID for 
each TaskManager in the cluster, literally 1, 2, 3, 4, etc., and use this 
value to namespace the metrics.  This will provide better continuity over time 
as TaskManagers come up and down.  However, I don't know if this concept 
actually exists inside Flink at the moment.  Does it?

I would suggest we use logical ids/indexes for TaskManager level metrics, as 
well as task level metrics, etc, as opposed to UUIDs.

So rather than:

taskmanager..gc_time
taskmanager..gc_time

and

task..flatMap.messagesReceived
task..flatMap.messagesReceived

I would suggest something like

cluster..taskmanager.1.gc_time
cluster..taskmanager.2.gc_time

and

cluster..task.1.flatMap.messagesReceived
cluster..task.2.flatMap.messagesReceived

I hope that makes sense.  The main point is to use logical IDs wherever 
possible, especially for things that change; otherwise there will be a lack of 
continuity in the metrics.  Also, I don't know that we actually have the 
CLUSTER_NAME concept right now either, but we might need it.  It would be 
unique for any given YarnSession if running on YARN, for example.  Basically we 
just need some way to group a set of TaskManagers uniquely.  I guess this could 
also be done by using the UUID of the JobManager.

Comments?
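To make the continuity argument concrete, here is a small simulation that counts distinct metric names when TaskManagers are repeatedly replaced. The names are hypothetical, and Python is used purely for illustration. The sketch assumes a replacement TaskManager takes over the logical ID of the one it replaces, while a UUID-based name is fresh on every launch:

```python
import uuid


def distinct_names(generations: int, taskmanagers: int, logical: bool) -> int:
    """Count distinct metric names while each TaskManager slot is launched
    `generations` times (a replacement reuses its predecessor's logical ID)."""
    names = set()
    for _ in range(generations):
        for slot in range(1, taskmanagers + 1):
            tm_id = str(slot) if logical else uuid.uuid4().hex  # fresh random ID per launch
            names.add(f"taskmanager.{tm_id}.gc_time")
    return len(names)


print(distinct_names(5, 4, logical=False))  # 20: grows with every restart
print(distinct_names(5, 4, logical=True))   # 4: stays bounded
```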

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-17 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150239#comment-15150239
 ] 

Stephan Ewen commented on FLINK-1502:
-

I agree with [~mxm], not all metrics should go to the JobManager. General 
TaskManager metrics (JVM, memory, GC, ...) should be picked up from the 
TaskManager directly.

Some job-specific metrics should go to the JobManager (like for example 
numRecords/bytes, size of state, number of spilled bytes, ...) for aggregation 
and display in the web frontend.

It would be nice if one could define for a metrics group whether it should be 
reported to the JobManager or not.
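One way to express such a per-group switch is sketched below. The `MetricGroup` class and its `report_to_jobmanager` flag are hypothetical, not an existing Flink API. Every group is still available for local reporting, but only flagged groups are added to the heartbeat sent to the JobManager:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetricGroup:
    name: str
    report_to_jobmanager: bool = False  # hypothetical per-group flag
    values: Dict[str, float] = field(default_factory=dict)


def heartbeat_payload(groups: List[MetricGroup]) -> Dict[str, float]:
    """Metrics to piggyback on the TaskManager heartbeat: only flagged groups."""
    payload: Dict[str, float] = {}
    for group in groups:
        if group.report_to_jobmanager:
            for key, value in group.values.items():
                payload[f"{group.name}.{key}"] = value
    return payload


def local_report(groups: List[MetricGroup]) -> Dict[str, float]:
    """Everything is still exposed locally (e.g. via JMX), flagged or not."""
    return {f"{g.name}.{k}": v for g in groups for k, v in g.values.items()}


jvm = MetricGroup("jvm", values={"gc_time": 12.0})
job = MetricGroup("job", report_to_jobmanager=True, values={"numRecords": 1000.0})
print(heartbeat_payload([jvm, job]))  # {'job.numRecords': 1000.0}
```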

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-17 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150166#comment-15150166
 ] 

Maximilian Michels commented on FLINK-1502:
---

Hi [~eastcirclek]! In addition to what [~jgrier] wrote:

The cluster setup should be the normal use case for Ganglia/JMX monitoring. 
When multiple task managers run in the same JVM or on the same machine, the 
reporting should work similarly: just namespace the task manager metrics. Task 
managers already have an {{InstanceID}} which uniquely identifies each of them. On 
clusters you normally wouldn't run multiple instances on a machine, and it's ok if 
the output on Ganglia is not optimal then.

The metrics shouldn't go through the JobManager to be reported. Actually, they 
already go there but just for displaying them in the web interface. However, 
cluster tools should monitor processes directly at the nodes. 

Metrics shouldn't be aggregated or combined. Users should be able to monitor 
nodes and also identify differences in resource consumption.

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-16 Thread Jamie Grier (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149109#comment-15149109
 ] 

Jamie Grier commented on FLINK-1502:


Is there no way to refer to a TaskManager by index in order to solve this 
problem?  It would be nice if we didn't have to send all the metrics through 
the JobManager but rather just report them via JMX locally on each host.  I 
think I understand the problem you are describing, but wouldn't just having a 
logical index for each TaskManager solve this problem?  I would like to avoid 
having to send the metrics through a central node if possible, as I would like 
to see the total # of metrics go up dramatically as we instrument the code more 
and more.

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-16 Thread Jamie Grier (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149099#comment-15149099
 ] 

Jamie Grier commented on FLINK-1502:


[~eastcirclek] You shouldn't need to do this with counters.  Typically you just 
want to report the value of the counter as is to the metrics system.  The 
metrics system (e.g. Graphite or Ganglia) should have built-in tools for 
turning counters into other types of graphs.  For example, what you really want 
here is a "rate", e.g. how many GC invocations per second (the 1st 
derivative of the counter).  Ganglia and any decent metrics tool should already 
have this function built in.  I think we should just report the raw counters.
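For illustration, here is roughly the computation a backend performs when it turns a raw counter into a rate: the difference between two samples divided by the elapsed time. The function name and the choice to return `None` on a counter reset are assumptions of this sketch:

```python
from typing import Optional


def counter_rate(prev_value: int, prev_ts: float, value: int, ts: float) -> Optional[float]:
    """Per-second rate between two samples of a monotonically increasing counter.

    Returns None when no time has passed or the counter went backwards
    (e.g. the TaskManager restarted and its counter started again from zero).
    """
    elapsed = ts - prev_ts
    if elapsed <= 0 or value < prev_value:
        return None
    return (value - prev_value) / elapsed


print(counter_rate(100, 0.0, 160, 30.0))  # 2.0 GC invocations per second
```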

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-11 Thread Dongwon Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144128#comment-15144128
 ] 

Dongwon Kim commented on FLINK-1502:


Before deciding on the design, we should take into consideration an environment in 
which a user can launch multiple TaskManager instances on a single machine 
(this is my local development environment), while Ganglia is usually set up to 
run a single monitoring daemon on each machine. This could become a common case 
sooner or later when Flink is capable of dynamic runtime scaling under YARN or 
Mesos (Spark already supports dynamic runtime scaling by executing multiple 
smaller executors per node and killing some of them when underloaded). 

What could be a problem in such an environment is that, if each of two 
TaskManagers running on a cluster node reports its metrics to Ganglia as if it 
were the only Flink daemon running on the node, Ganglia shows two different 
metrics in a single graph without aggregating them. In my experience, the graph 
can end up sawtooth-shaped. One workaround is to distinguish metrics from the two 
TaskManagers by appending TaskManager IDs to the name of each metric when 
reporting to Ganglia. The workaround, however, will generate too many Ganglia 
metrics (and RRD files, one per Ganglia metric) on the Ganglia 
master node, because TaskManagers are given a randomly generated ID whenever 
they are newly launched.

That being said, I have designed an initial plan as follows:
- JobManager takes responsibility for reporting TaskManagers' metrics to 
Ganglia/Graphite. Note that TaskManagers already send metrics to JobManager 
through heartbeat messages. 
- I want JobManager to aggregate metrics from TaskManagers running on the same 
node. I'm not sure whether this decision is good enough, because different 
TaskManagers running on the same node could exhibit different runtime behaviors.
- After aggregating the values of a metric from different TaskManagers running on a 
cluster node, JobManager reports the aggregated value of the metric to Ganglia 
along with the hostname. 
- By doing that, Ganglia will end up with a single Ganglia metric per host.
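The aggregation step of this plan could look roughly like the following sketch. It assumes the JobManager sees (hostname, TaskManager ID, metric, value) tuples from the heartbeats. Summing is only meaningful for additive metrics such as GC counts or used memory; averages or ratios would need a different combine function:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One sample as seen by the JobManager: (hostname, TaskManager id, metric, value).
Sample = Tuple[str, str, str, float]


def aggregate_per_host(samples: Iterable[Sample]) -> Dict[Tuple[str, str], float]:
    """Sum each metric over all TaskManagers on the same host."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for host, _tm_id, metric, value in samples:
        totals[(host, metric)] += value
    return dict(totals)


samples = [
    ("node1", "tm-a", "gc_count", 3.0),
    ("node1", "tm-b", "gc_count", 4.0),
    ("node2", "tm-c", "gc_count", 5.0),
]
print(aggregate_per_host(samples))  # {('node1', 'gc_count'): 7.0, ('node2', 'gc_count'): 5.0}
```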

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-11 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142710#comment-15142710
 ] 

Maximilian Michels commented on FLINK-1502:
---

[~eastcirclek] Yes, I was working on the aforementioned issue a while ago. Feel 
free to take over. In FLINK-3170 I wanted to expose metrics also at the job 
manager, which is not necessary at all. Your approach looks feasible. Sorry 
about my comment; it was about exposing accumulators. This would be a next 
step after the metric reporting is done.

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-11 Thread Dongwon Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142628#comment-15142628
 ] 

Dongwon Kim commented on FLINK-1502:


[~mxm] Could you elaborate more on the above explanation? Is it about exposing 
accumulators along with metrics? :-)

[~StephanEwen] Thank you! I'm trying to figure out this issue ;-) I'm also 
looking at how Samza does it, as you said. 

I plan to use https://github.com/dropwizard/metrics as this library already 
supports various reporters such as ConsoleReporter, JmxReporter, 
GangliaReporter, and GraphiteReporter.
It looks desirable as TaskManager already uses the metrics library to create 
the registry of metrics and sends them to JobManager. 
For monitoring systems like Ganglia, I plan to take the approach shown in my blog 
post:
http://eastcirclek.blogspot.kr/2015/10/Collecting-JVM-metrics-to-Ganglia-using-io.dropwizard.metrics.html
Metrics like GC counts from JMX are ever-growing, so Ganglia graphs for such 
metrics will have a very long y-axis when task managers have been running for a 
few days.
For this reason, I plan to report to Ganglia only the difference from the 
previously known value of such metrics.
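Here is a minimal sketch of the "report only the difference" idea. It is not the code from the blog post above. The tracker remembers the last value per metric name and returns the increase since the previous report. Returning 0 on the first sample and after a counter reset is an assumption made here to avoid spikes:

```python
from typing import Dict


class DeltaTracker:
    """Remembers the last reported value of each counter and returns the increase since then."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def delta(self, name: str, value: int) -> int:
        previous = self._last.get(name)
        self._last[name] = value
        if previous is None or value < previous:
            # First observation, or the counter was reset (e.g. TaskManager
            # restart): report nothing rather than a large or negative jump.
            return 0
        return value - previous


tracker = DeltaTracker()
for reading in [10, 15, 15, 3, 7]:
    print(tracker.delta("gc_count", reading))  # prints 0, 5, 0, 0, 4 (one per line)
```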

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-08 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136704#comment-15136704
 ] 

Maximilian Michels commented on FLINK-1502:
---

I was working on this in FLINK-3170 but priorities have shifted a bit so I 
haven't completed the work yet.

After I tied the initial collection of metrics to the existing runtime, I 
realized that it would be better to build an abstraction for publishing the 
metrics. What I did was replace the accumulator {{HashMap}}s with a custom 
{{TaskAccumulator}} type, whose runtime implementation can trigger publishing 
of the metrics during runtime. It would 
suffice to register the accumulators once and then have them pulled in by the 
BeanServer of the JVM.

This approach wouldn't touch too many runtime classes or introduce extra 
synchronization between the runtime thread and a metrics thread. All 
non-job-related metrics which are published through the task managers (and 
heartbeated to the job manager) can be exposed much more easily.

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-08 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137435#comment-15137435
 ] 

Stephan Ewen commented on FLINK-1502:
-

[~eastcirclek] I gave you contributor permissions in the Flink JIRA, so you can 
assign issues to yourself if you want to work on them.

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-08 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137439#comment-15137439
 ] 

Stephan Ewen commented on FLINK-1502:
-

Quick comment: I recently had a look at how Samza collects metrics and it 
looked very nice.
There is a factory, groups/namespaces and then individual metric objects. We 
could think of doing something similar. 
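Here is a loose sketch of the factory / group / metric-object structure described above. The class and method names are mine, not Samza's actual API, and only a counter type is shown:

```python
from typing import Dict


class Counter:
    def __init__(self) -> None:
        self.count = 0

    def inc(self, n: int = 1) -> None:
        self.count += n


class MetricsRegistry:
    """Factory for metric objects, organised into named groups (namespaces)."""

    def __init__(self) -> None:
        self._groups: Dict[str, Dict[str, Counter]] = {}

    def counter(self, group: str, name: str) -> Counter:
        metrics = self._groups.setdefault(group, {})
        if name not in metrics:  # same (group, name) always yields the same object
            metrics[name] = Counter()
        return metrics[name]

    def snapshot(self) -> Dict[str, int]:
        """Flatten to "group.name" -> value, ready to hand to a reporter."""
        return {
            f"{group}.{name}": c.count
            for group, metrics in self._groups.items()
            for name, c in metrics.items()
        }


registry = MetricsRegistry()
registry.counter("taskmanager.network", "bytesSent").inc(10)
registry.counter("taskmanager.jvm", "gcCount").inc()
print(registry.snapshot())
```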

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-02-07 Thread Dongwon Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136264#comment-15136264
 ] 

Dongwon Kim commented on FLINK-1502:


Is anyone making progress on this issue?

I found that Hadoop's metrics2 library in hadoop-common could make things 
simple: http://wiki.apache.org/hadoop/HADOOP-6728-MetricsV2
The metrics2 library has a timer thread that polls metrics from sources, e.g., 
Namenode/JobTracker/MapTask/ReduceTask, 
and multiple threads that send metrics to different sinks like Ganglia and 
Graphite.
If Flink uses the metrics2 library, we just need to make some classes implement 
the MetricsSource interface and register the sources to the MetricsSystem.
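The source/sink model described above can be sketched as follows. This is not the Hadoop metrics2 API, whose `MetricsSource` and `MetricsSystem` are Java types. The names here are hypothetical, and polling happens in a single `poll_once` call instead of on a timer thread:

```python
from typing import Callable, Dict, List

Source = Callable[[], Dict[str, float]]    # returns current metric values
Sink = Callable[[Dict[str, float]], None]  # e.g. pushes to Ganglia or Graphite


class MetricsPoller:
    """Collects from registered sources and fans the snapshot out to all sinks."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}
        self._sinks: List[Sink] = []

    def register_source(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def add_sink(self, sink: Sink) -> None:
        self._sinks.append(sink)

    def poll_once(self) -> Dict[str, float]:
        record: Dict[str, float] = {}
        for source_name, source in self._sources.items():
            for metric, value in source().items():
                record[f"{source_name}.{metric}"] = value
        for sink in self._sinks:
            sink(dict(record))  # each sink gets its own copy
        return record


received: List[Dict[str, float]] = []
poller = MetricsPoller()
poller.register_source("taskmanager", lambda: {"gc_count": 3.0})
poller.add_sink(received.append)
poller.poll_once()
print(received)  # [{'taskmanager.gc_count': 3.0}]
```

In a real setup, `poll_once` would be driven by a periodic timer. Slow sinks would be decoupled with per-sink threads, as the metrics2 description above suggests, so that one slow backend does not delay the others.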

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2015-02-10 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314262#comment-14314262
 ] 

Robert Metzger commented on FLINK-1502:
---

No. This task depends on FLINK-1501.
But I see FLINK-1501 and FLINK-1502 both as subtasks of FLINK-456.

 Expose metrics to graphite, ganglia and JMX.
 

 Key: FLINK-1502
 URL: https://issues.apache.org/jira/browse/FLINK-1502
 Project: Flink
  Issue Type: Sub-task
  Components: JobManager, TaskManager
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger
Priority: Minor
 Fix For: pre-apache


 The metrics library allows to expose collected metrics easily to other 
 systems such as graphite, ganglia or Java's JVM (VisualVM).





[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2015-02-09 Thread Henry Saputra (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312849#comment-14312849
 ] 

Henry Saputra commented on FLINK-1502:
--

Should this be a subtask of FLINK-1501?

 Expose metrics to graphite, ganglia and JMX.
 

 Key: FLINK-1502
 URL: https://issues.apache.org/jira/browse/FLINK-1502
 Project: Flink
  Issue Type: Sub-task
  Components: JobManager, TaskManager
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger
Priority: Minor
 Fix For: pre-apache


 The metrics library allows to expose collected metrics easily to other 
 systems such as graphite, ganglia or Java's JVM (VisualVM).


