Hi Grant, and all,

Thanks for sharing the data point. Cardinality from per-table attributes is exactly the kind of real-world failure mode the design should account for, and your experience bears that out.

I pushed commit 5d867e49d to #16250, which addresses this by making the attribute set configurable so users can control cardinality. A new catalog property, iceberg.otel.metrics.attributes, accepts a comma-separated allowlist of attribute short names (table-name, schema-id, operation). Attributes whose short names are not listed are omitted from emitted metric points. The default set is table-name and operation; schema-id is opt-in. Workloads with thousands of tables can drop table-name from the list and keep operation-level aggregates instead.

For users who want to keep iceberg.table.name but only for a subset of tables, I've also filed #16573 to propose a framework-level table-name filter that would apply uniformly across all MetricsReporter implementations. It is complementary to the per-reporter attribute pruning above and would also address your concern.

On the span-based reporter suggestion: I took some time to think through whether it makes sense to layer that into this PR or to add it as a sibling reporter alongside OtelMetricsReporter. I'd like to defer it, mainly because emitting OpenTelemetry spans through the MetricsReporter callback feels semantically off. MetricsReporter fires after the operation has finished, so the reporter would have to synthesize spans retroactively from the report's duration rather than open and close them at the real operation boundaries. A class named MetricsReporter emitting traces is also a friction point in itself. The natural home for span-based observability is probably an Iceberg-side instrumentation hook in the scan planner and commit code paths that opens spans at the real boundaries. That is a larger design discussion I'd rather handle in a separate issue/PR than bolt onto this one.

For #16250 specifically, my preference is to keep it a metrics-only reporter with the controls above.
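To make the cardinality trade-off concrete, here is a minimal Java sketch showing how an allowlist like iceberg.otel.metrics.attributes could be parsed and applied. It also computes a rough upper bound on series per metric. Only the property name, the three short names, and the default set come from the description above. Everything else is an assumption for illustration and is not the code in 5d867e49d: the class and method names are hypothetical, the sketch rejects unknown short names, and it keys attributes by short name rather than by their full OpenTelemetry names.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not the code in #16250) of allowlist-based attribute
 * pruning. Attributes are keyed by short name here for simplicity.
 */
final class AttributeAllowlist {
  // Property name, short names, and defaults as described in the thread.
  static final String PROPERTY = "iceberg.otel.metrics.attributes";
  static final Set<String> KNOWN = Set.of("table-name", "schema-id", "operation");
  static final Set<String> DEFAULTS = Set.of("table-name", "operation");

  private AttributeAllowlist() {}

  /** Reads the allowlist from catalog properties; an unset property means the defaults. */
  static Set<String> parse(Map<String, String> catalogProperties) {
    String raw = catalogProperties.get(PROPERTY);
    if (raw == null) {
      return DEFAULTS;
    }
    Set<String> allowed = new LinkedHashSet<>();
    for (String token : raw.split(",")) {
      String name = token.trim();
      if (name.isEmpty()) {
        continue; // tolerate empty entries such as "a,,b" or a trailing comma
      }
      if (!KNOWN.contains(name)) {
        // Assumption: fail fast on typos instead of silently ignoring them.
        throw new IllegalArgumentException("Unknown attribute short name: " + name);
      }
      allowed.add(name);
    }
    return allowed; // an empty value yields an empty set, i.e. no attributes
  }

  /** Keeps only the attributes whose short names are allowlisted. */
  static Map<String, String> prune(Map<String, String> attributes, Set<String> allowed) {
    Map<String, String> kept = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : attributes.entrySet()) {
      if (allowed.contains(e.getKey())) {
        kept.put(e.getKey(), e.getValue());
      }
    }
    return kept;
  }

  /**
   * Upper bound on distinct series per metric: the product of the
   * distinct-value counts of the allowlisted attributes.
   */
  static long seriesUpperBound(Map<String, Long> distinctValues, Set<String> allowed) {
    long series = 1;
    for (String name : allowed) {
      series *= distinctValues.getOrDefault(name, 1L);
    }
    return series;
  }
}
```

Consider made-up figures of 5,000 tables and four distinct operation values. Under those figures, the default set bounds each metric at 5,000 × 4 = 20,000 series. Setting the property to operation alone bounds it at 4. Opting into schema-id multiplies the default bound again by the number of distinct schema IDs. These numbers are upper bounds, because not every table sees every operation.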
Thanks,
Nori

On Tue, May 26, 2026 at 1:14 AM Grant Nicholas wrote:

> +1 with OTEL implementation of MetricsReporter, but have you considered a
> span-based implementation instead of/in addition to a metrics-based
> implementation?
>
> High cardinality metrics should be avoided and (schema_name,
> table_name) attributes can be high cardinality depending on your workload.
> Spans do not have problems with high cardinality.
>
> For context, we built a metrics-based MetricsReporter, ran into high
> cardinality cost issues with thousands of tables, then switched to a
> span-based MetricsReporter.
>
> On Mon, May 25, 2026 at 2:08 AM Noritaka Sekiyama via dev wrote:
>
>> Hi JB, and all,
>>
>> Thanks for the suggestion. Pushed efc48d429 which adds an
>> OtelMetricsReporter section to docs/docs/metrics-reporting.md. It documents
>> the host's responsibility for packaging the OpenTelemetry API, SDK, and a
>> metric exporter (Gradle plus a spark-submit --packages example), the
>> programmatic SDK registration path, exporter-wiring examples for the
>> OpenTelemetry Collector, Prometheus (pull and push), and Amazon CloudWatch
>> via the sigv4auth Collector extension, plus the emitted metric names and
>> attribute set.
>>
>> Verified end-to-end against the Prometheus pull pattern from the docs
>> (host SDK with PrometheusHttpServer + OtelMetricsReporter reporting
>> synthetic ScanReport/CommitReport, all 12 iceberg.* series visible on
>> /metrics with the documented attribute set); each Collector YAML in the
>> docs was otelcol-contrib validate-checked.
>>
>> Looking forward to your PR review.
>>
>> Thanks,
>> Nori
>>
>> On Mon, May 25, 2026 at 3:00 PM Jean-Baptiste Onofré wrote:
>>
>>> Hi,
>>>
>>> I think this is a great proposal.
>>>
>>> I would suggest documenting how to package the exporter, as I believe it
>>> is up to the user to package the specific OpenTelemetry exporter they need.
>>>
>>> I will take a look at the PR.
>>>
>>> Regards,
>>> JB
>>>
>>> On Thu, May 21, 2026 at 3:39 AM Noritaka Sekiyama via dev wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to propose adding an OpenTelemetry-based MetricsReporter to
>>>> iceberg-core that exports ScanReport and CommitReport to any
>>>> OTLP-compatible backend.
>>>>
>>>> # Background
>>>> Iceberg ships three built-in MetricsReporter implementations today:
>>>> LoggingMetricsReporter, InMemoryMetricsReporter (Spark-internal), and
>>>> RESTMetricsReporter (REST catalog only).
>>>> None of them give users an out-of-the-box way to ship scan/commit
>>>> metrics to an external observability platform.
>>>> The gap applies to Spark users on non-REST catalogs and to all
>>>> non-Spark engines (Trino, Flink, etc.).
>>>>
>>>> # Motivation
>>>> OpenTelemetry is the vendor-neutral CNCF standard for telemetry,
>>>> supported by every major observability backend (Prometheus, CloudWatch,
>>>> Datadog, Grafana Cloud, etc.).
>>>> A single OTLP-based MetricsReporter in Iceberg lets users reach all of
>>>> these without per-vendor integrations in the project.
>>>> This is complementary to #14360, which adds OTel support to HTTPClient
>>>> at the REST-catalog HTTP layer; this proposal covers the Iceberg-level
>>>> ScanReport / CommitReport layer.
>>>>
>>>> # Proposal
>>>> Issue: https://github.com/apache/iceberg/issues/16169
>>>> PR: https://github.com/apache/iceberg/pull/16250
>>>>
>>>> The reporter follows the same SDK-ownership philosophy as #14360 - the
>>>> host application (Spark/Flink/Trino/...) registers an OpenTelemetrySdk via
>>>> GlobalOpenTelemetry, and the reporter just looks up a Meter from it.
>>>> The reporter has zero Iceberg-specific catalog properties; everything
>>>> else is owned by the host.
>>>>
>>>> The PR has been validated end-to-end against two unrelated OTLP
>>>> backends (Databricks Zerobus and Amazon CloudWatch) - full procedures and
>>>> queries are linked from the PR.
>>>>
>>>> # On dependencies
>>>> Given the current sensitivity around new runtime dependencies in 1.11,
>>>> the PR adds only opentelemetry-api to iceberg-core as compileOnly.
>>>> The OpenTelemetry SDK and OTLP exporters are not added to the runtime
>>>> classpath - they come from the host application.
>>>> opentelemetry-sdk / -sdk-testing are testImplementation only.
>>>>
>>>> # Questions for the community
>>>>
>>>> Q1. Any objection to taking the opentelemetry-api compileOnly
>>>> dependency in iceberg-core?
>>>> Q2. Module placement: iceberg-core (current PR), or a
>>>> separate iceberg-opentelemetry module?
>>>>
>>>> Thanks,
>>>> Noritaka Sekiyama, Databricks
>>>>
>>>
