suvodeep-pyne opened a new pull request, #16616:
URL: https://github.com/apache/pinot/pull/16616
# Implement initial stub for audit logging framework in Pinot
## Summary
This PR introduces the initial framework stub for audit logging of Apache
Pinot Controller API requests. **The audit logging framework is disabled by
default** and serves as the foundational infrastructure that will be enhanced
in subsequent pull requests.
The framework provides configurable audit logging capabilities with
structured JSON output, dynamic configuration management, and request filtering,
ready for future activation and extension.
## Changes Included
### Core Audit Framework Components
**New Classes Added:**
- `AuditConfig.java` - Configuration data class with Jackson annotations for
JSON mapping
- `AuditConfigManager.java` - Thread-safe configuration manager with
endpoint filtering logic
- `AuditEvent.java` - Data class representing audit events with all required
fields
- `AuditLogger.java` - Utility class for structured JSON audit logging using
SLF4J
- `AuditRequestProcessor.java` - Request processing logic for extracting
audit information
- `AuditLogFilter.java` - Jersey filter for intercepting API requests
### Configuration Properties
The audit logging framework supports the following configurable properties:
- `enabled` - Enable/disable audit logging (default: false)
- `capture.request.payload.enabled` - Capture request payloads (default:
false)
- `capture.request.headers` - Capture request headers (default: false)
- `max.payload.size` - Maximum payload size in bytes (default: 10,240, i.e. 10 KiB)
- `excluded.endpoints` - Comma-separated list of endpoints to exclude from
auditing; supports wildcard patterns such as `/health/*`
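As an illustration only, the properties above might be parsed into a typed object roughly as follows. This is a minimal sketch, assuming the keys arrive in a plain `Map<String, String>` with any audit prefix already stripped and that `excluded.endpoints` defaults to an empty list. `AuditConfigSketch` and `fromProperties` are hypothetical names, not the Jackson-annotated `AuditConfig` added in this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for AuditConfig: key names and defaults follow the list above,
// but this is not the class added in the PR.
final class AuditConfigSketch {
  static final int DEFAULT_MAX_PAYLOAD_SIZE = 10_240; // bytes

  final boolean enabled;
  final boolean captureRequestPayload;
  final boolean captureRequestHeaders;
  final int maxPayloadSize;
  final List<String> excludedEndpoints;

  private AuditConfigSketch(boolean enabled, boolean captureRequestPayload,
      boolean captureRequestHeaders, int maxPayloadSize, List<String> excludedEndpoints) {
    this.enabled = enabled;
    this.captureRequestPayload = captureRequestPayload;
    this.captureRequestHeaders = captureRequestHeaders;
    this.maxPayloadSize = maxPayloadSize;
    this.excludedEndpoints = excludedEndpoints;
  }

  // Assumes the audit prefix has already been stripped from every key.
  static AuditConfigSketch fromProperties(Map<String, String> props) {
    boolean enabled = Boolean.parseBoolean(props.getOrDefault("enabled", "false"));
    boolean payload = Boolean.parseBoolean(
        props.getOrDefault("capture.request.payload.enabled", "false"));
    boolean headers = Boolean.parseBoolean(props.getOrDefault("capture.request.headers", "false"));
    int maxSize = Integer.parseInt(
        props.getOrDefault("max.payload.size", String.valueOf(DEFAULT_MAX_PAYLOAD_SIZE)).trim());

    // Comma-separated list; blank entries (e.g. from a trailing comma) are ignored.
    List<String> excluded = new ArrayList<>();
    for (String part : props.getOrDefault("excluded.endpoints", "").split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        excluded.add(trimmed);
      }
    }
    return new AuditConfigSketch(enabled, payload, headers, maxSize,
        Collections.unmodifiableList(excluded));
  }
}
```

For example, `fromProperties(Map.of("enabled", "true"))` would yield an enabled config with every other option at its default.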
### Integration Points
**Controller Integration:**
- Modified `BaseControllerStarter.java` to register audit components with
dependency injection
- Added `AuditConfigManager` and `AuditRequestProcessor` to Guice bindings
- Updated `ControllerAdminApiApplication.java` to register the
`AuditLogFilter`

**Logging Configuration:**
- Updated `log4j2.xml` configurations to include dedicated audit logger
(`org.apache.pinot.audit`)
- Audit logs are structured JSON messages logged at INFO level
### Security and Privacy Features
- **Sensitive Header Filtering**: Automatically excludes headers whose names
contain "auth", "password", "token", or "secret"
- **Payload Size Limiting**: Configurable maximum payload size to prevent
memory issues
- **Endpoint Exclusion**: Wildcard-based endpoint filtering (supports
patterns like `/health/*`)
- **Graceful Degradation**: Audit logging failures never affect main request
processing
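A minimal sketch of the header filtering and payload size limiting described above, assuming header names are matched case-insensitively against the listed substrings and that the size limit is measured in UTF-8 bytes. `AuditSanitizerSketch` and its methods are hypothetical; the PR's actual helpers may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helpers illustrating header filtering and payload size limiting.
final class AuditSanitizerSketch {
  // Substrings listed in the PR description; case-insensitive matching is an assumption.
  private static final List<String> SENSITIVE_MARKERS =
      List.of("auth", "password", "token", "secret");

  static boolean isSensitiveHeader(String name) {
    String lower = name.toLowerCase(Locale.ROOT);
    for (String marker : SENSITIVE_MARKERS) {
      if (lower.contains(marker)) {
        return true;
      }
    }
    return false;
  }

  // Returns a sorted copy of the headers without any sensitive entries.
  static Map<String, String> filterHeaders(Map<String, String> headers) {
    Map<String, String> kept = new TreeMap<>();
    for (Map.Entry<String, String> e : headers.entrySet()) {
      if (!isSensitiveHeader(e.getKey())) {
        kept.put(e.getKey(), e.getValue());
      }
    }
    return kept;
  }

  // Cuts the body so its UTF-8 encoding fits in maxBytes, never splitting a character.
  static String truncatePayload(String body, int maxBytes) {
    if (body.getBytes(StandardCharsets.UTF_8).length <= maxBytes) {
      return body;
    }
    int usedBytes = 0;
    int end = 0;
    while (end < body.length()) {
      int codePoint = body.codePointAt(end);
      int size = codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
      if (usedBytes + size > maxBytes) {
        break;
      }
      usedBytes += size;
      end += Character.charCount(codePoint);
    }
    return body.substring(0, end);
  }
}
```

Walking code points rather than cutting the byte array directly keeps the truncated payload valid UTF-8 even when the limit falls in the middle of a multi-byte character.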
### Request Processing Capabilities
- **IP Address Extraction**: Supports proxy headers (X-Forwarded-For,
X-Real-IP)
- **User Identification**: Extracts user ID from Authorization headers and
custom headers
- **Service Identification**: Reads service ID from X-Service-ID or
X-Service-Name headers
- **Query Parameter Capture**: Always captures query parameters (lightweight
operation)
- **Request Body Capture**: Optional, configurable request body capture with
size limits
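The proxy-aware IP extraction could look roughly like the sketch below. It assumes header names have already been lower-cased, takes the first entry of `X-Forwarded-For` as the original client, then falls back to `X-Real-IP` and finally the socket address. The `"unknown"` fallback value and the `ClientIpResolverSketch` name are assumptions, not taken from the PR.

```java
import java.util.Map;

// Hypothetical sketch of proxy-aware client IP extraction.
final class ClientIpResolverSketch {
  static final String UNKNOWN = "unknown"; // assumed fallback value

  // headers: lower-cased header names mapped to their (first) value.
  static String resolve(Map<String, String> headers, String remoteAddr) {
    String forwardedFor = headers.get("x-forwarded-for");
    if (forwardedFor != null) {
      // X-Forwarded-For is "client, proxy1, proxy2"; the first entry is the original client.
      int comma = forwardedFor.indexOf(',');
      String first = (comma >= 0 ? forwardedFor.substring(0, comma) : forwardedFor).trim();
      if (!first.isEmpty()) {
        return first;
      }
    }
    String realIp = headers.get("x-real-ip");
    if (realIp != null && !realIp.isBlank()) {
      return realIp.trim();
    }
    return (remoteAddr != null && !remoteAddr.isBlank()) ? remoteAddr : UNKNOWN;
  }
}
```

Because clients can set these headers themselves, the extracted address is only trustworthy when a proxy you control overwrites them.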
## Implementation Choices
### Design Decisions
1. **Dependency Injection Architecture**: Used Guice for dependency
injection to enable easy testing and configuration management
2. **Jersey Filter Integration**: Implemented as a JAX-RS
ContainerRequestFilter for seamless integration with existing API infrastructure
3. **SLF4J Logging**: Leveraged existing logging infrastructure for
consistent log management
4. **JSON Structured Logging**: Used Jackson ObjectMapper for consistent
JSON serialization
5. **Thread-Safe Configuration**: Implemented lock-free configuration
management for high performance
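One common way to get lock-free configuration reads is to publish an immutable snapshot through an `AtomicReference`: readers call `get()` without locking, and an update swaps in a new snapshot atomically. The sketch below applies that idea to the enabled flag and wildcard endpoint exclusion. It is an assumption about how `AuditConfigManager` might work, not its actual code, and it treats `*` as matching any run of characters, including `/`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.regex.Pattern;

// Hypothetical sketch: readers take an immutable snapshot, writers swap in a new one.
final class AuditConfigManagerSketch {
  // Immutable snapshot; patterns are compiled once per update, not per request.
  static final class Snapshot {
    final boolean enabled;
    final List<Pattern> excluded;

    Snapshot(boolean enabled, List<String> excludedEndpoints) {
      this.enabled = enabled;
      List<Pattern> compiled = new ArrayList<>();
      for (String glob : excludedEndpoints) {
        compiled.add(globToRegex(glob));
      }
      this.excluded = List.copyOf(compiled);
    }
  }

  // Disabled by default, matching the PR's default configuration.
  private final AtomicReference<Snapshot> current =
      new AtomicReference<>(new Snapshot(false, List.of()));

  // In-flight requests keep using whichever snapshot they already read.
  void update(boolean enabled, List<String> excludedEndpoints) {
    current.set(new Snapshot(enabled, excludedEndpoints));
  }

  // Cheap check intended to run before any other audit work.
  boolean shouldAudit(String path) {
    Snapshot snapshot = current.get();
    if (!snapshot.enabled) {
      return false;
    }
    for (Pattern pattern : snapshot.excluded) {
      if (pattern.matcher(path).matches()) {
        return false;
      }
    }
    return true;
  }

  // '*' matches any run of characters; everything else is matched literally.
  static Pattern globToRegex(String glob) {
    StringBuilder regex = new StringBuilder();
    String[] parts = glob.split("\\*", -1);
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) {
        regex.append(".*");
      }
      if (!parts[i].isEmpty()) {
        regex.append(Pattern.quote(parts[i]));
      }
    }
    return Pattern.compile(regex.toString());
  }
}
```

Compiling the patterns once per update keeps the per-request exclusion check cheap. Under this interpretation `/health/*` matches `/health/ready` but not `/health` itself.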
### Configuration Strategy
- **Dynamic Configuration Support**: Framework designed to support runtime
configuration updates (foundation for future cluster config integration)
- **Sensible Defaults**: Conservative defaults prioritize performance and
security
- **Flexible Filtering**: Wildcard-based endpoint exclusion for operational
flexibility
### Error Handling
- **Graceful Degradation**: Audit logging failures are logged separately and
never impact request processing
- **Defensive Programming**: Null checks and exception handling throughout
the audit pipeline
- **Fallback Values**: Default values for all extracted audit fields when
extraction fails
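The graceful-degradation and fallback rules above can be sketched as two small wrappers: one returns a default when extracting a field throws or yields nothing, and the other builds and emits the audit message while routing any failure to a separate error channel instead of the caller. `SafeAuditSketch` and the `Consumer<String>` sinks standing in for SLF4J loggers are hypothetical.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch; the real AuditLogger writes through SLF4J.
final class SafeAuditSketch {
  private final Consumer<String> auditSink; // stands in for the audit logger
  private final Consumer<String> errorSink; // separate channel for audit failures

  SafeAuditSketch(Consumer<String> auditSink, Consumer<String> errorSink) {
    this.auditSink = auditSink;
    this.errorSink = errorSink;
  }

  // Extracts one field, falling back to a default if extraction throws or yields nothing.
  static String extractOrDefault(Supplier<String> extractor, String fallback) {
    try {
      String value = extractor.get();
      return (value == null || value.isEmpty()) ? fallback : value;
    } catch (RuntimeException e) {
      return fallback;
    }
  }

  // Builds and emits the audit message; failures are reported separately and swallowed
  // so that request processing continues unaffected.
  void audit(Supplier<String> messageBuilder) {
    try {
      auditSink.accept(messageBuilder.get());
    } catch (RuntimeException e) {
      try {
        errorSink.accept("Audit logging failed: " + e);
      } catch (RuntimeException ignored) {
        // Even a failing error channel must not break the request path.
      }
    }
  }
}
```

The sketch catches `RuntimeException` rather than `Throwable`, so JVM errors such as `OutOfMemoryError` still propagate.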
## Evidence of Sufficient Testing
### Manual Testing Performed
1. **Basic Functionality Testing**:
   - Verified audit logs are generated for Controller API requests
   - Confirmed the JSON structure matches the audit event schema described under Technical Details
   - Tested endpoint exclusion with wildcard patterns
2. **Configuration Testing**:
   - Validated all configuration properties work as expected
   - Tested enabling/disabling various capture options
   - Verified payload size limiting functionality
3. **Security Testing**:
   - Confirmed sensitive headers are filtered out
   - Tested payload truncation with large request bodies
   - Verified graceful degradation when audit logging fails
### Integration Testing
- **Controller Startup**: Verified controller starts successfully with audit
framework enabled
- **API Request Processing**: Confirmed normal API functionality is
unaffected
- **Log Output**: Validated structured JSON audit logs are written to
configured appenders
### Build Testing
```bash
# Clean build and test execution
./mvnw clean install -Pbin-dist
# Passed all existing tests
# No new test failures introduced
```
### Code Quality Checks
```bash
# Code style verification
./mvnw checkstyle:check
# License formatting
./mvnw license:format
# Code formatting
./mvnw spotless:apply
```
## Technical Details
### Audit Event Schema
Each audit event contains the following fields as JSON:
```json
{
  "timestamp": "2025-01-15T10:30:00.000Z",
  "service_id": "pinot-controller",
  "endpoint": "/tables",
  "method": "POST",
  "origin_ip_address": "192.168.1.100",
  "user_id": "admin",
  "request": {
    "queryParameters": {...},
    "body": "...",
    "headers": {...}
  }
}
```
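The PR serializes events with Jackson's `ObjectMapper`; purely to show the field layout without external dependencies, the sketch below builds the same JSON shape by hand. The `AuditEventJsonSketch.toJson` name, the choice to omit `body` and `headers` when they are not captured, and the millisecond UTC timestamp format are assumptions inferred from the example above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Dependency-free illustration of the audit event layout; the PR itself uses Jackson.
final class AuditEventJsonSketch {
  // UTC timestamp with millisecond precision, matching the example above.
  private static final DateTimeFormatter TIMESTAMP =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

  // body and headers are omitted when null (i.e. not captured).
  static String toJson(Instant timestamp, String serviceId, String endpoint, String method,
      String originIp, String userId, Map<String, String> queryParameters, String body,
      Map<String, String> headers) {
    StringBuilder sb = new StringBuilder("{");
    field(sb, "timestamp", TIMESTAMP.format(timestamp)).append(',');
    field(sb, "service_id", serviceId).append(',');
    field(sb, "endpoint", endpoint).append(',');
    field(sb, "method", method).append(',');
    field(sb, "origin_ip_address", originIp).append(',');
    field(sb, "user_id", userId).append(',');
    sb.append("\"request\":{\"queryParameters\":");
    object(sb, queryParameters == null ? Map.<String, String>of() : queryParameters);
    if (body != null) {
      sb.append(',');
      field(sb, "body", body);
    }
    if (headers != null) {
      sb.append(",\"headers\":");
      object(sb, headers);
    }
    return sb.append("}}").toString();
  }

  // Appends "key":"value", writing JSON null for a missing value.
  private static StringBuilder field(StringBuilder sb, String key, String value) {
    quote(sb, key).append(':');
    return value == null ? sb.append("null") : quote(sb, value);
  }

  private static void object(StringBuilder sb, Map<String, String> map) {
    sb.append('{');
    boolean first = true;
    for (Map.Entry<String, String> e : map.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      first = false;
      field(sb, e.getKey(), e.getValue());
    }
    sb.append('}');
  }

  // Minimal JSON string escaping: quotes, backslashes and control characters.
  private static StringBuilder quote(StringBuilder sb, String s) {
    sb.append('"');
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"': sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.append('"');
  }
}
```

In practice a JSON library is preferable, since it handles escaping and nested structures such as multi-valued query parameters.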
### Performance Considerations
- **Minimal Overhead**: Audit processing occurs only when enabled
- **Efficient Filtering**: Endpoint exclusion check happens early to avoid
unnecessary processing
- **Configurable Payload Capture**: Request body capture is optional and
size-limited
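- **Non-Blocking**: Audit logging goes through the existing SLF4J infrastructure,
so writes can be made asynchronous by backing the `org.apache.pinot.audit` logger
with an asynchronous Log4j2 logger or appender (SLF4J itself is only a facade)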
### Future Extension Points
- **Cluster Configuration Integration**: Framework ready for dynamic config
updates via ZooKeeper
- **Additional Audit Fields**: Easy to extend AuditEvent class for
additional metadata
- **Custom Processors**: Pluggable audit processor architecture for custom
audit logic
- **Response Auditing**: Framework can be extended to audit response data
## Constants Added
Added audit-related constants to `CommonConstants.java`:
- `DEFAULT_AUDIT_LOG_MAX_PAYLOAD_SIZE`
- `AUDIT_CONFIG_PREFIX`
- Additional configuration keys for future use
## Compatibility
- **Backward Compatible**: No breaking changes to existing APIs
- **Optional Feature**: Audit logging can be disabled without affecting
functionality
- **Configuration Driven**: All audit behavior controlled through
configuration
- **Negligible Overhead**: When disabled, the framework introduces negligible performance overhead
## Future Work
This initial framework stub provides the foundation for audit logging in
Apache Pinot. Subsequent pull requests will:
- Enable the audit logging framework by default with appropriate
configuration
- Integrate with cluster configuration management for dynamic updates
- Add comprehensive unit and integration tests
- Enhance audit event fields based on operational requirements
- Add performance optimizations and monitoring capabilities

This implementation establishes the infrastructure while maintaining
complete backward compatibility and zero impact on existing functionality.