obelix74 opened a new pull request, #3348: URL: https://github.com/apache/polaris/pull/3348
## Checklist

- [x] Don't disclose security issues! (contact [email protected])
- [x] Clearly explained why the changes are needed, or linked related issues: Fixes #
- [x] Added/updated tests with good coverage, or manually tested (and explained how)
- [x] Added comments for complex logic
- [ ] Updated `CHANGELOG.md` (if needed)
- [x] Updated documentation in `site/content/in-dev/unreleased` (if needed)

## Summary

This PR implements the Compute Client Audit Reporting feature described in GitHub issue #3337. It enables end-to-end audit correlation between catalog operations, credential vending, and compute engine metrics reports.

## Motivation

Organizations need to track and audit data access from compute engines (Spark, Trino, Flink) all the way through to the underlying storage access. This feature captures the metrics reports that compute engines send to the Iceberg REST Catalog `/metrics` endpoint and correlates them with other audit events using OpenTelemetry trace IDs.

## Changes

### New Event Infrastructure

- **`AfterReportMetricsEvent`**: New event class emitted after the `reportMetrics` endpoint processes a request
- **Event emission in `IcebergRestCatalogApiService.reportMetrics()`**: Emits the event with the full request context

### Enhanced Event Listener

- **`PolarisPersistenceEventListener.onAfterReportMetrics()`**: Processes metrics events with:
  - Extraction of `trace-id` and other metadata from the ScanReport/CommitReport `metadata` map
  - Capture of key scan metrics (`result_data_files`, `total_file_size_bytes`, etc.)
  - Capture of key commit metrics (`added_data_files`, `added_records`, `operation`, etc.)
  - OpenTelemetry context from HTTP headers
  - Null-safe handling of namespace and table identifiers

### Trace Correlation

When AWS STS session tags are enabled (`INCLUDE_SESSION_TAGS_IN_SUBSCOPED_CREDENTIAL=true`), the `trace_id` is included in the vended credentials and appears in CloudTrail logs. The same trace ID then links all three records:

| Polaris Event (`loadTable`) | AWS CloudTrail (S3 access) | Polaris Event (`reportMetrics`) |
|-----------------------------|----------------------------|---------------------------------|
| `trace_id=abc123` | `polaris:trace_id=abc123` | `report.trace-id=abc123` |

### Documentation

- Added documentation in `docs/telemetry.md` covering:
  - Metrics reporting endpoint specification
  - Trace correlation architecture
  - Session tags configuration for AWS
  - Compute engine integration (Spark, Trino, Flink)
  - Example SQL queries for correlating audit events

## New Tests Added

### Integration Tests (`InMemoryBufferEventListenerIntegrationTest`)

| Test Method | Description |
|-------------|-------------|
| `testReportMetricsEventWithTraceContext()` | Verifies that `AfterReportMetricsEvent` is emitted when a ScanReport is submitted to the metrics endpoint, and that OpenTelemetry trace context from HTTP headers is captured in the event's additional properties |
| `testReportMetricsWithTraceIdInMetadata()` | Verifies that `trace-id` and other metadata from the ScanReport's `metadata` map are extracted and stored with the `report.` prefix, enabling compute engines to pass trace context for correlation |
| `testReportCommitMetrics()` | Verifies that CommitReport metrics are properly extracted and stored, including operation type, sequence number, snapshot ID, and commit metrics data |

### Test Infrastructure Updates

| File | Change |
|------|--------|
| `TestPolarisEventListener.java` | Added `onAfterReportMetrics()` method to capture metrics events in tests |
| `InMemoryBufferEventListenerIntegrationTest.java` | Added `@BeforeEach` cleanup and the `ALLOW_OVERLAPPING_CATALOG_URLS` feature flag for test isolation |

## Event Data Captured

### ScanReport Events

| Property | Description |
|----------|-------------|
| `report_type` | `"scan"` |
| `snapshot_id` | Snapshot being scanned |
| `schema_id` | Schema ID |
| `result_data_files` | Number of data files in the result |
| `total_file_size_bytes` | Total size of the files scanned, in bytes |
| `report.trace-id` | Trace ID from the compute engine (if provided) |
| `otel.trace_id` | OpenTelemetry trace ID from HTTP headers |

### CommitReport Events

| Property | Description |
|----------|-------------|
| `report_type` | `"commit"` |
| `snapshot_id` | New snapshot ID |
| `sequence_number` | Sequence number |
| `operation` | Operation type (append, overwrite, etc.) |
| `added_data_files` | Number of data files added |
| `added_records` | Number of records added |
| `report.trace-id` | Trace ID from the compute engine (if provided) |

## Configuration

No new configuration is required. The feature uses existing infrastructure:

```properties
# Enable event persistence (already required for audit)
polaris.event-listener.type=persistence-in-memory-buffer

# Enable session tags for CloudTrail correlation (optional)
polaris.features."INCLUDE_SESSION_TAGS_IN_SUBSCOPED_CREDENTIAL"=true
```
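The trace correlation described in this PR depends on one trace ID surviving under three differently named keys (`trace_id`, `polaris:trace_id`, `report.trace-id`). The Java sketch below illustrates that naming, assuming each record has already been flattened into a `Map<String, String>`. The class `TraceCorrelationSketch` and its methods are hypothetical illustrations, not Polaris or Iceberg APIs; only the key names and the `report.` prefix come from this description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper, NOT part of Polaris: shows how report metadata could be
 * stored under a "report." prefix and how the three trace-ID keys line up.
 */
public final class TraceCorrelationSketch {

  /** Copies a report's metadata map into event properties with a "report." prefix; null-safe. */
  static Map<String, String> prefixReportMetadata(Map<String, String> metadata) {
    Map<String, String> props = new LinkedHashMap<>();
    if (metadata == null) {
      return props; // a report without metadata contributes no properties
    }
    metadata.forEach((key, value) -> {
      // skip null keys/values instead of storing "null" strings
      if (key != null && value != null) {
        props.put("report." + key, value);
      }
    });
    return props;
  }

  /** Returns the trace ID only if all three sources carry the same value. */
  static Optional<String> sharedTraceId(
      Map<String, String> loadTableEvent,
      Map<String, String> cloudTrailTags,
      Map<String, String> reportMetricsEvent) {
    String fromCatalog = loadTableEvent.get("trace_id");
    String fromCloudTrail = cloudTrailTags.get("polaris:trace_id"); // assumed session-tag key
    String fromReport = reportMetricsEvent.get("report.trace-id");
    if (fromCatalog != null && fromCatalog.equals(fromCloudTrail) && fromCatalog.equals(fromReport)) {
      return Optional.of(fromCatalog);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Map<String, String> reportProps = prefixReportMetadata(Map.of("trace-id", "abc123"));
    Optional<String> traceId = sharedTraceId(
        Map.of("trace_id", "abc123"),
        Map.of("polaris:trace_id", "abc123"),
        reportProps);
    System.out.println(reportProps); // {report.trace-id=abc123}
    System.out.println(traceId.orElse("no match")); // abc123
  }
}
```

A real audit pipeline would more likely join on the trace ID across event and CloudTrail stores than compare individual records; the sketch only shows which key carries the ID in each source.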
