obelix74 opened a new pull request, #3348:
URL: https://github.com/apache/polaris/pull/3348

   
   ## Checklist
   - [x] 🛡️ Don't disclose security issues! (contact [email protected])
   - [x] 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
   - [x] 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
   - [x] 💡 Added comments for complex logic
   - [ ] 🧾 Updated `CHANGELOG.md` (if needed)
   - [x] 📚 Updated documentation in `site/content/in-dev/unreleased` (if needed)
   
   ## Summary
   
   This PR implements the Compute Client Audit Reporting feature as described 
in GitHub issue #3337. It enables end-to-end audit correlation between catalog 
operations, credential vending, and compute engine metrics reports.
   
   ## Motivation
   
   Organizations need to track and audit data access from compute engines 
(Spark, Trino, Flink) through to actual storage access. This feature captures 
metrics reports sent by compute engines via the Iceberg REST Catalog `/metrics` 
endpoint and correlates them with other audit events using OpenTelemetry trace 
IDs.
   
   ## Changes
   
   ### New Event Infrastructure
   
   - **`AfterReportMetricsEvent`**: New event class emitted after the 
`reportMetrics` endpoint processes a request
   - **Event emission in `IcebergRestCatalogApiService.reportMetrics()`**: 
Emits the event with full request context
   
   ### Enhanced Event Listener
   
   - **`PolarisPersistenceEventListener.onAfterReportMetrics()`**: Processes 
metrics events with:
     - Extraction of `trace-id` and other metadata from ScanReport/CommitReport 
`metadata` map
     - Capture of key scan metrics (result_data_files, total_file_size_bytes, 
etc.)
     - Capture of key commit metrics (added_data_files, added_records, 
operation, etc.)
     - OpenTelemetry context from HTTP headers
     - Null-safe handling of namespace and table identifiers
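
The metadata extraction and null-safe handling listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `ReportMetadataSketch` and its methods are hypothetical names, and the only behavior taken from the description is that metadata keys are stored under a `report.` prefix and that missing identifiers must not cause failures.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not the listener class added by this PR. */
final class ReportMetadataSketch {
  // Prefix named in the PR description for metadata copied from a report.
  static final String PREFIX = "report.";

  /** Copies report metadata into event properties under the "report." prefix. */
  static Map<String, String> prefixMetadata(Map<String, String> metadata) {
    Map<String, String> props = new LinkedHashMap<>();
    if (metadata == null) {
      return props; // a report without metadata contributes no extra properties
    }
    for (Map.Entry<String, String> e : metadata.entrySet()) {
      // Skip null keys or values instead of failing the whole event.
      if (e.getKey() != null && e.getValue() != null) {
        props.put(PREFIX + e.getKey(), e.getValue());
      }
    }
    return props;
  }

  /** Null-safe rendering of namespace levels, e.g. ["db", "sales"] becomes "db.sales". */
  static String namespaceOrEmpty(String[] levels) {
    return levels == null ? "" : String.join(".", levels);
  }
}
```

With this sketch, a metadata map containing `trace-id=abc123` would yield the property `report.trace-id=abc123`, matching the property name shown in the tables in this description.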
   
   ### Trace Correlation
   
   When AWS STS session tags are enabled 
(`INCLUDE_SESSION_TAGS_IN_SUBSCOPED_CREDENTIAL=true`), the `trace_id` is 
included in vended credentials and appears in CloudTrail logs, enabling 
correlation:
   
   Polaris Event (loadTable) → AWS CloudTrail (S3 access) → Polaris Event (reportMetrics)
               ↓                            ↓                             ↓
        trace_id=abc123          polaris:trace_id=abc123       report.trace-id=abc123
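
To illustrate the correlation step, here is a rough sketch that groups records from the three sources by their shared trace ID. Everything in it is an assumption made for illustration: `TraceCorrelationSketch` and `AuditRecord` are hypothetical, the three key names are taken from the flow above, and real Polaris events and CloudTrail entries would first need to be loaded and flattened into key/value maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical correlation helper; not part of this PR. */
final class TraceCorrelationSketch {
  // Keys under which the same trace ID appears in each source (assumed from the flow above).
  static final List<String> TRACE_KEYS =
      List.of("trace_id", "polaris:trace_id", "report.trace-id");

  /** A flattened audit record: where it came from plus its key/value properties. */
  record AuditRecord(String source, Map<String, String> properties) {}

  /** Returns the first trace ID found under any known key, or null if none is present. */
  static String traceIdOf(AuditRecord r) {
    for (String key : TRACE_KEYS) {
      String value = r.properties().get(key);
      if (value != null) {
        return value;
      }
    }
    return null;
  }

  /** Groups record sources by trace ID, preserving input order; untraced records are skipped. */
  static Map<String, List<String>> sourcesByTrace(List<AuditRecord> records) {
    Map<String, List<String>> grouped = new LinkedHashMap<>();
    for (AuditRecord r : records) {
      String id = traceIdOf(r);
      if (id != null) {
        grouped.computeIfAbsent(id, k -> new ArrayList<>()).add(r.source());
      }
    }
    return grouped;
  }
}
```

A trace ID that maps to all three sources indicates a complete chain from table load through storage access to the metrics report. An ID that is missing one source points to a gap worth investigating.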
   
   
   ### Documentation
   
   - Added comprehensive documentation in `docs/telemetry.md` covering:
     - Metrics reporting endpoint specification
     - Trace correlation architecture
     - Session tags configuration for AWS
     - Compute engine integration (Spark, Trino, Flink)
     - Example SQL queries for correlating audit events
   
   ## New Tests Added
   
   ### Integration Tests (`InMemoryBufferEventListenerIntegrationTest`)
   
   | Test Method | Description |
   |-------------|-------------|
   | `testReportMetricsEventWithTraceContext()` | Verifies that `AfterReportMetricsEvent` is emitted when a ScanReport is submitted to the metrics endpoint, and that OpenTelemetry trace context from HTTP headers is captured in the event's additional properties |
   | `testReportMetricsWithTraceIdInMetadata()` | Verifies that `trace-id` and other metadata from the ScanReport's `metadata` map are extracted and stored with the `report.` prefix, enabling compute engines to pass trace context for correlation |
   | `testReportCommitMetrics()` | Verifies that CommitReport metrics are properly extracted and stored, including operation type, sequence number, snapshot ID, and commit metrics data |
   
   ### Test Infrastructure Updates
   
   | File | Change |
   |------|--------|
   | `TestPolarisEventListener.java` | Added `onAfterReportMetrics()` method to capture metrics events in tests |
   | `InMemoryBufferEventListenerIntegrationTest.java` | Added `@BeforeEach` cleanup and `ALLOW_OVERLAPPING_CATALOG_URLS` feature flag for test isolation |
   
   ## Event Data Captured
   
   ### ScanReport Events
   | Property | Description |
   |----------|-------------|
   | `report_type` | "scan" |
   | `snapshot_id` | Snapshot being scanned |
   | `schema_id` | Schema ID |
   | `result_data_files` | Number of data files in result |
   | `total_file_size_bytes` | Total size of files scanned |
   | `report.trace-id` | Trace ID from compute engine (if provided) |
   | `otel.trace_id` | OpenTelemetry trace ID from HTTP headers |
   
   ### CommitReport Events
   | Property | Description |
   |----------|-------------|
   | `report_type` | "commit" |
   | `snapshot_id` | New snapshot ID |
   | `sequence_number` | Sequence number |
   | `operation` | Operation type (append, overwrite, etc.) |
   | `added_data_files` | Number of files added |
   | `added_records` | Number of records added |
   | `report.trace-id` | Trace ID from compute engine (if provided) |
   
   ## Configuration
   
   No new configuration required. The feature uses existing infrastructure:
   
   ```properties
   # Enable event persistence (already required for audit)
   polaris.event-listener.type=persistence-in-memory-buffer
   
   # Enable session tags for CloudTrail correlation (optional)
   polaris.features."INCLUDE_SESSION_TAGS_IN_SUBSCOPED_CREDENTIAL"=true
   ```

