Re: Custom Prometheus metrics disappeared in 1.16.2 => 1.17.1 upgrade

2023-12-04 Thread Javier Vegas
The reason is simple: I migrated a project to Flink that already had
Prometheus metrics integrated.

Thanks,

Javier



Re: Custom Prometheus metrics disappeared in 1.16.2 => 1.17.1 upgrade

2023-10-03 Thread Mason Chen
Hi Javier,

Is there a particular reason why you aren't using Flink's metric API? It
seems that functionality was internal to the PrometheusReporter
implementation, and your use case should have continued working if it had
depended on Flink's metric API.

Best,
Mason



Re: Custom Prometheus metrics disappeared in 1.16.2 => 1.17.1 upgrade

2023-09-28 Thread Javier Vegas
Thanks! I saw the first change but missed the third one, which most
probably explains my problem. Most likely the metrics I was sending
with the twitter/finagle statsReceiver ended up in the singleton
default registry and were exposed by Flink along with all the other
Flink metrics, but now that Flink uses its own registry I have no
idea where my custom metrics end up.
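
A minimal sketch of the hypothesis above, assuming the custom metrics were registered in the process-wide default registry. This is a toy model in plain Java, not the Prometheus client or Flink API: ToyRegistry, RegistryModel, and the metric names are hypothetical and exist only for illustration. It shows why a reporter that shares the default registry exposes application metrics, while a reporter with its own registry (the behavior described for FLINK-30020) does not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a metrics registry: just a list of metric names.
final class ToyRegistry {
    private final List<String> names = new ArrayList<>();

    void register(String name) {
        names.add(name);
    }

    // What a scrape of an endpoint backed by this registry would return.
    List<String> names() {
        return new ArrayList<>(names);
    }
}

final class RegistryModel {
    // Simulates one process: the reporter registers a Flink metric, and
    // application code registers a custom metric in the default registry.
    // Returns what the reporter's endpoint would expose.
    static List<String> exposed(boolean reporterSharesDefault) {
        // Plays the role of the singleton default registry.
        ToyRegistry defaultRegistry = new ToyRegistry();
        // Either the reporter reuses the default registry (assumed 1.16
        // behavior) or it creates its own (assumed 1.17 behavior).
        ToyRegistry reporterRegistry =
                reporterSharesDefault ? defaultRegistry : new ToyRegistry();

        reporterRegistry.register("flink_metric");
        defaultRegistry.register("custom_metric");

        return reporterRegistry.names();
    }

    public static void main(String[] args) {
        System.out.println(exposed(true));  // prints [flink_metric, custom_metric]
        System.out.println(exposed(false)); // prints [flink_metric]
    }
}
```

In the second case the custom metric is still registered in the default registry; nothing exposes that registry on the reporter's port anymore.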




Re: Custom Prometheus metrics disappeared in 1.16.2 => 1.17.1 upgrade

2023-09-27 Thread Kenan Kılıçtepe
Have you checked the metric changes in 1.17?

From the 1.17 release notes:
https://nightlies.apache.org/flink/flink-docs-master/release-notes/flink-1.17/

Metric Reporters
Only support reporter factories for instantiation
FLINK-24235
Configuring reporters by their class is no longer supported. Reporter
implementations must provide a MetricReporterFactory, and all
configurations must be migrated to such a factory.

UseLogicalIdentifier makes datadog consider metric as custom
FLINK-30383
The Datadog reporter now adds a “flink.” prefix to metric identifiers if
“useLogicalIdentifier” is enabled. This is required for these metrics to be
recognized as Flink metrics, not custom ones.

Use separate Prometheus CollectorRegistries
FLINK-30020
The PrometheusReporters now use a separate CollectorRegistry for each
reporter instance instead of the singleton default registry. This generally
shouldn’t impact setups, but it may break code that indirectly interacts
with the reporter via the singleton instance (e.g., a test trying to assert
what metrics are reported).





Re: Custom Prometheus metrics disappeared in 1.16.2 => 1.17.1 upgrade

2023-09-27 Thread Javier Vegas
Some more details on my problem:

1. The "Multiple implementations" problem was because I had the
metrics-prometheus jar in both the plugins and lib directories. I
tried putting it in only one, and in both cases (plugins or lib) the
result was the same: I got only Flink metrics on my prom port.
2. My application extends
https://github.com/twitter/twitter-server/blob/develop/server/src/main/scala/com/twitter/server/TwitterServer.scala
and I was sending my custom stats via the statsReceiver provided there:
https://github.com/twitter/twitter-server/blob/33b3fb00635c4ab1102eb0c062cde6bb132d80c0/server/src/main/scala/com/twitter/server/Stats.scala#L14
3. I realized that my reporter configuration was:

metrics.reporter.prom.class:
org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.factory.class:
org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 

So I guess in 1.16.2 the Prometheus reporter could have been
instantiated by class name, and perhaps that somehow allowed my
metrics to be merged with the Flink ones, but in 1.17.1 the reporter
gets instantiated by the factory and somehow that renders my metrics
invisible. Do you have any suggestions for making my metrics work as
they did in 1.16.2?

Thanks again, Javier Vegas




Custom Prometheus metrics disappeared in 1.16.2 => 1.17.1 upgrade

2023-09-26 Thread Javier Vegas
I implemented some custom Prometheus metrics that were working on
1.16.2 with this configuration:

metrics.reporter.prom.factory.class:
org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 

I could see both Flink metrics and my custom metrics on port  of
my task managers

After upgrading to 1.17.1, using the same configuration, I can see
only the Flink metrics on port  of the task managers;
the custom metrics are getting lost somewhere.

The release notes for 1.17 mention
https://issues.apache.org/jira/browse/FLINK-24235
which removes instantiating reporters by class name and forces using a
factory, which I was already doing in 1.16.2. Do I need to do
anything extra after those changes so that my metrics are aggregated
with the Flink ones?

I am also seeing this error message on application startup (which I
was already seeing in 1.16.2): "Multiple implementations of the same
reporter were found in 'lib' and/or 'plugins' directories for
org.apache.flink.metrics.prometheus.PrometheusReporterFactory. It is
recommended to remove redundant reporter JARs to resolve used
versions' ambiguity." Could that also explain the missing metrics?

Thanks,

Javier Vegas