Re: Flink Graphite Reporter stops reporting via TCP if network issue

2017-05-05 Thread Bruno Aranda
Hi Chesnay,

Many thanks for your reply. In the end, we decided to change the
infrastructure a bit and use StatsD instead. This way we don't need a
custom reporter, and it works fine.
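
As a rough illustration of why a StatsD-style sender is less exposed to this
problem: StatsD is most commonly fed with plain-text datagrams over UDP, so
there is no long-lived connection that can go stale. The sketch below assumes
the usual gauge line format `name:value|g`. The class name, host, port, and
metric name are hypothetical, and this is not the code of Flink's own StatsD
reporter.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal sender: one UDP datagram per gauge value. */
public class StatsdGaugeSender implements AutoCloseable {
    private final DatagramSocket socket;
    private final InetAddress host;
    private final int port;

    public StatsdGaugeSender(String host, int port) throws IOException {
        this.socket = new DatagramSocket();        // unconnected, ephemeral local port
        this.host = InetAddress.getByName(host);
        this.port = port;
    }

    /** Formats a gauge in the common StatsD line format, e.g. "jobs.running:3|g". */
    static String format(String name, long value) {
        return name + ":" + value + "|g";
    }

    /** Fire-and-forget: each call sends an independent datagram. */
    public void gauge(String name, long value) throws IOException {
        byte[] payload = format(name, value).getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(payload, payload.length, host, port));
    }

    @Override
    public void close() {
        socket.close();
    }
}
```

The trade-off is that datagrams sent while the network is down are silently
lost, but sending resumes on its own once the path is back.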

Thanks!

Bruno



Re: Flink Graphite Reporter stops reporting via TCP if network issue

2017-05-05 Thread Chesnay Schepler

Hello,

For Graphite, Flink uses the Dropwizard metrics reporter. I don't know
at the moment whether it supports any kind of reconnect functionality.


I'm not sure whether I understood you correctly; did you try upgrading
the Dropwizard metrics-core/metrics-graphite dependencies?


If that didn't do the trick, we could in fact implement this in Flink,
though it would be a hack. When an error occurs we can simply
re-instantiate the reporter, but we would have to know how the reporter
communicates the connection drop, i.e. whether it throws some exception
or not.
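
The re-instantiation idea could look roughly like the sketch below. It
assumes the failure surfaces as an IOException from the report call, which
is exactly the point left open above; if the underlying reporter only logs
a warning and swallows the error, this wrapper would never trigger. The
`Reporter` interface and `ReconnectingReporter` class are hypothetical
stand-ins, not Flink or Dropwizard types.

```java
import java.io.IOException;
import java.util.function.Supplier;

/** Hypothetical wrapper that rebuilds its delegate after a failed report. */
public class ReconnectingReporter {

    /** Stand-in for a reporter that pushes one batch of metrics over TCP. */
    public interface Reporter {
        void report() throws IOException;
        void close();
    }

    private final Supplier<Reporter> factory;
    private Reporter current;
    private int rebuilds = 0;

    public ReconnectingReporter(Supplier<Reporter> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    /**
     * Reports once. On failure, the broken reporter is closed and replaced,
     * so the next call starts with a fresh instance (and a fresh connection).
     */
    public boolean reportOnce() {
        try {
            current.report();
            return true;
        } catch (IOException e) {
            current.close();
            current = factory.get();
            rebuilds++;
            return false;
        }
    }

    /** Number of times the delegate has been re-instantiated. */
    public int rebuilds() {
        return rebuilds;
    }
}
```

The batch that hit the error is dropped rather than retried, which keeps the
wrapper simple and is usually acceptable for periodically reported metrics.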


Could you check the log for any warning statements from the MetricRegistry?

Regards,
Chesnay



Flink Graphite Reporter stops reporting via TCP if network issue

2017-05-05 Thread Bruno Aranda
Hi,

We are using the Graphite reporter from Flink 1.2.0 to send the metrics via
TCP. Due to our network configuration we cannot use UDP at the moment.

We have observed that if there is any problem with Graphite or the
network (basically, the TCP connection times out or something similar),
the metrics reporter does not recover. This is easy to reproduce by using
iptables to block the port we send the metrics to. If we block the port
for more than a minute or so, the problem occurs. After the port is
re-opened, Flink does not resume reporting.

Is this a known issue? Googling shows some problems with the
metrics-graphite package that should have been solved already. We have
tried updating metrics-core/graphite to the latest version, with no success.

Any ideas?

Thanks!

Bruno