[ 
https://issues.apache.org/jira/browse/NIFI-10666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

René Zeidler updated NIFI-10666:
--------------------------------
    Affects Version/s: 2.0.0-M2
                       1.25.0
          Description: 
We have created a default PrometheusReportingTask for our NiFi instance and 
tried to consume the metrics with Prometheus. However, Prometheus threw the 
following error:
{code}
ts=2022-10-19T12:25:18.110Z caller=scrape.go:1332 level=debug component="scrape 
manager" scrape_pool=nifi-cluster target=http://***nifi***:9092/metrics 
msg="Append failed" err="invalid UTF-8 label value" {code}
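The error message is consistent with windows-1252 output being parsed as UTF-8: the umlaut "ü" is the single byte 0xFC in windows-1252, which is not a valid UTF-8 sequence. A minimal sketch (illustrative only, not NiFi code) reproduces the decoder failure:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class InvalidUtf8Demo {
    public static void main(String[] args) {
        // "ü" encoded as windows-1252 is the single byte 0xFC, which cannot
        // appear on its own in UTF-8 -- the same kind of byte stream that
        // makes Prometheus report "invalid UTF-8 label value".
        byte[] cp1252 = "ü".getBytes(Charset.forName("windows-1252"));
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(cp1252));
            System.out.println("decoded cleanly");
        } catch (CharacterCodingException e) {
            // Reached: a lone 0xFC byte is malformed UTF-8
            System.out.println("invalid UTF-8: " + e);
        }
    }
}
```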
Upon further inspection, we noticed that the /metrics/ endpoint exposed by the 
reporting task does not use UTF-8 encoding, which is required by Prometheus (as 
documented here: [Exposition formats | 
Prometheus|https://prometheus.io/docs/instrumenting/exposition_formats/]).

Our flow uses non-ASCII characters (in our case German umlauts such as "ü"). As a 
workaround, removing those characters fixes the Prometheus error, but that is 
not practical for a large flow authored in German.

Opening the /metrics/ endpoint in a browser confirms that the encoding used is 
not UTF-8:
{code}
> document.characterSet
'windows-1252' {code}
----
The responsible code might be here:

[https://github.com/apache/nifi/blob/2be5c26f287469f4f19f0fa759d6c1b56dc0e348/nifi-nar-bundles/nifi-prometheus-bundle/nifi-prometheus-reporting-task/src/main/java/org/apache/nifi/reporting/prometheus/PrometheusServer.java#L67]

The PrometheusServer used by the reporting task constructs an OutputStreamWriter 
with the platform default encoding instead of explicitly specifying UTF-8. The 
Content-Type header set in that method also does not get passed along to the 
response (see screenshot).
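A likely fix is to pass the charset explicitly when constructing the writer. The sketch below (illustrative only; class and variable names are not the actual NiFi code) shows what the explicit charset changes when encoding "ü":

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitUtf8Writer {
    public static void main(String[] args) throws IOException {
        // new OutputStreamWriter(out) uses the JVM default charset, which is
        // windows-1252 on a stock Windows install. Passing the charset
        // explicitly makes the output independent of the platform default.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write("ü");
        }
        // UTF-8 encodes "ü" as the two bytes 0xC3 0xBC, which Prometheus accepts;
        // windows-1252 would emit the single byte 0xFC instead.
        for (byte b : out.toByteArray()) {
            System.out.printf("0x%02X ", b); // prints: 0xC3 0xBC
        }
        System.out.println();
    }
}
```

Declaring the charset on the Content-Type header as well (the Prometheus text format uses {{text/plain; version=0.0.4; charset=utf-8}}) would additionally let clients detect the encoding correctly.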


          Environment: JVM with non-UTF-8 default encoding (e.g. default 
Windows installation)  (was: Windows Server 2019 Version 1809)
               Labels: encoding prometheus utf-8  (was: )

The issue still persists in the current versions 1.25.0 and 2.0.0-M2.

I have confirmed that the issue is indeed caused by a non-UTF-8 default 
encoding in the JVM, as in NIFI-12669 and NIFI-12670. This affects all 
standard Windows installations, which do not use UTF-8 as the default 
encoding.

Prometheus requires UTF-8 encoding (as documented here: [Exposition formats | 
Prometheus|https://prometheus.io/docs/instrumenting/exposition_formats/]), so 
the encoding used for this endpoint should not depend on the system default 
encoding.
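Whether a given JVM is affected can be checked directly; the snippet below (illustrative only) prints the default charset that an OutputStreamWriter falls back to when none is given. Note that since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms, so the symptom mainly appears on older JVMs, where starting NiFi with {{-Dfile.encoding=UTF-8}} can serve as a workaround until the writer specifies the charset explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {
    public static void main(String[] args) {
        // This is the charset an OutputStreamWriter uses when none is given;
        // on stock Windows with JDK 17 or older it is typically windows-1252.
        Charset def = Charset.defaultCharset();
        System.out.println("JVM default charset: " + def.name());
        if (!StandardCharsets.UTF_8.equals(def)) {
            System.out.println("Non-UTF-8 default: /metrics/ output will be mis-encoded");
        }
    }
}
```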

> PrometheusReportingTask does not use UTF-8 encoding on /metrics/ endpoint
> -------------------------------------------------------------------------
>
>                 Key: NIFI-10666
>                 URL: https://issues.apache.org/jira/browse/NIFI-10666
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.17.0, 1.16.3, 1.18.0, 1.23.2, 1.25.0, 2.0.0-M2
>         Environment: JVM with non-UTF-8 default encoding (e.g. default 
> Windows installation)
>            Reporter: René Zeidler
>            Priority: Minor
>              Labels: encoding, prometheus, utf-8
>         Attachments: missing-header.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
