suvodeep-pyne opened a new pull request, #16616:
URL: https://github.com/apache/pinot/pull/16616

   # Implement initial stub for audit logging framework in Pinot
   
   ## Summary
   
   This PR introduces the initial framework stub for audit logging in Apache 
Pinot Controller API requests. **The audit logging framework is disabled by 
default** and serves as the foundational infrastructure that will be enhanced 
in subsequent pull requests.
   
   The framework provides configurable audit logging capabilities with 
structured JSON output, dynamic configuration management, and request 
filtering, ready for future activation and extension.
   
   ## Changes Included
   
   ### Core Audit Framework Components
   
   **New Classes Added:**
   - `AuditConfig.java` - Configuration data class with Jackson annotations for 
JSON mapping
   - `AuditConfigManager.java` - Thread-safe configuration manager with 
endpoint filtering logic
   - `AuditEvent.java` - Data class representing audit events with all required 
fields
   - `AuditLogger.java` - Utility class for structured JSON audit logging using 
SLF4J
   - `AuditRequestProcessor.java` - Request processing logic for extracting 
audit information
   - `AuditLogFilter.java` - Jersey filter for intercepting API requests
   
   ### Configuration Properties
   
   The audit logging framework supports the following configurable properties:
   - `enabled` - Enable/disable audit logging (default: false)
   - `capture.request.payload.enabled` - Capture request payloads (default: 
false)
   - `capture.request.headers` - Capture request headers (default: false)
   - `max.payload.size` - Maximum payload size in bytes (default: 10240, i.e. 
10 KiB)
   - `excluded.endpoints` - Comma-separated list of endpoints to exclude from 
auditing
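
   As a rough illustration of how these properties and their defaults could be
   read, the sketch below parses them from a `java.util.Properties` object. The
   class `AuditConfigSketch`, its `fromProperties` method, and the use of
   unprefixed keys are assumptions made for this example; the `AuditConfig`
   class in this PR maps configuration with Jackson annotations instead, and
   the value of `AUDIT_CONFIG_PREFIX` is not shown here.

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   import java.util.Properties;

   // Hypothetical stand-in for AuditConfig; not the class added by this PR.
   final class AuditConfigSketch {
     final boolean enabled;
     final boolean capturePayload;
     final boolean captureHeaders;
     final int maxPayloadSize;
     final List<String> excludedEndpoints;

     private AuditConfigSketch(boolean enabled, boolean capturePayload,
         boolean captureHeaders, int maxPayloadSize, List<String> excludedEndpoints) {
       this.enabled = enabled;
       this.capturePayload = capturePayload;
       this.captureHeaders = captureHeaders;
       this.maxPayloadSize = maxPayloadSize;
       this.excludedEndpoints = excludedEndpoints;
     }

     // Reads each property, falling back to the defaults listed above.
     static AuditConfigSketch fromProperties(Properties props) {
       boolean enabled = Boolean.parseBoolean(props.getProperty("enabled", "false"));
       boolean capturePayload =
           Boolean.parseBoolean(props.getProperty("capture.request.payload.enabled", "false"));
       boolean captureHeaders =
           Boolean.parseBoolean(props.getProperty("capture.request.headers", "false"));
       int maxPayloadSize =
           Integer.parseInt(props.getProperty("max.payload.size", "10240").trim());

       // Split the comma-separated list and drop blank entries.
       List<String> excluded = new ArrayList<>();
       for (String entry : props.getProperty("excluded.endpoints", "").split(",")) {
         String trimmed = entry.trim();
         if (!trimmed.isEmpty()) {
           excluded.add(trimmed);
         }
       }
       return new AuditConfigSketch(enabled, capturePayload, captureHeaders,
           maxPayloadSize, Collections.unmodifiableList(excluded));
     }
   }
   ```

   With an empty `Properties` object, every capture option stays off and the
   exclusion list is empty, which matches the disabled-by-default behavior
   described in the summary.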
   
   ### Integration Points
   
   **Controller Integration:**
   - Modified `BaseControllerStarter.java` to register audit components with 
dependency injection
   - Added `AuditConfigManager` and `AuditRequestProcessor` to Guice bindings
   - Updated `ControllerAdminApiApplication.java` to register the 
`AuditLogFilter`
   
   **Logging Configuration:**
   - Updated `log4j2.xml` configurations to include dedicated audit logger 
(`org.apache.pinot.audit`)
   - Audit logs are structured JSON messages logged at INFO level
   
   ### Security and Privacy Features
   
   - **Sensitive Header Filtering**: Automatically excludes headers containing 
"auth", "password", "token", or "secret"
   - **Payload Size Limiting**: Configurable maximum payload size to prevent 
memory issues
   - **Endpoint Exclusion**: Wildcard-based endpoint filtering (supports 
patterns like `/health/*`)
   - **Graceful Degradation**: Audit logging failures never affect main request 
processing
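
   The sensitive header filtering described above can be sketched roughly as
   follows. This is an illustration only: the class `SensitiveHeaderFilterSketch`
   and its methods are hypothetical, and it assumes the match is a
   case-insensitive substring check on the header name, which the description
   does not spell out.

   ```java
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Locale;
   import java.util.Map;

   // Hypothetical helper; the PR's actual filtering code may differ.
   final class SensitiveHeaderFilterSketch {
     // Markers taken from the PR description.
     private static final List<String> SENSITIVE_MARKERS =
         List.of("auth", "password", "token", "secret");

     // Assumption: a header is sensitive if its name contains a marker, ignoring case.
     static boolean isSensitive(String headerName) {
       String lower = headerName.toLowerCase(Locale.ROOT);
       for (String marker : SENSITIVE_MARKERS) {
         if (lower.contains(marker)) {
           return true;
         }
       }
       return false;
     }

     // Returns a copy of the headers with sensitive entries removed, preserving order.
     static Map<String, String> filter(Map<String, String> headers) {
       Map<String, String> safe = new LinkedHashMap<>();
       for (Map.Entry<String, String> entry : headers.entrySet()) {
         if (!isSensitive(entry.getKey())) {
           safe.put(entry.getKey(), entry.getValue());
         }
       }
       return safe;
     }
   }
   ```

   A substring check is deliberately broad: it drops `Authorization` and
   `X-Api-Token`, but it would also drop a harmless header such as `Author`.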
   
   ### Request Processing Capabilities
   
   - **IP Address Extraction**: Supports proxy headers (X-Forwarded-For, 
X-Real-IP)
   - **User Identification**: Extracts user ID from Authorization headers and 
custom headers
   - **Service Identification**: Reads service ID from X-Service-ID or 
X-Service-Name headers
   - **Query Parameter Capture**: Always captures query parameters (lightweight 
operation)
   - **Request Body Capture**: Optional, configurable request body capture with 
size limits
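
   For the IP address extraction above, a minimal sketch might look like the
   following. The class `ClientIpSketch`, the precedence order
   (`X-Forwarded-For` first, then `X-Real-IP`, then the socket address), and
   the `"unknown"` fallback are assumptions for illustration. Headers are passed
   as a plain `Map` here to keep the example self-contained; a real filter would
   read them from the JAX-RS request context.

   ```java
   import java.util.Map;

   // Hypothetical helper; not the AuditRequestProcessor from this PR.
   final class ClientIpSketch {
     static String extractClientIp(Map<String, String> headers, String remoteAddr) {
       // X-Forwarded-For may hold a comma-separated chain; the first entry is
       // conventionally the original client.
       String forwarded = headers.get("X-Forwarded-For");
       if (forwarded != null) {
         String[] hops = forwarded.split(",");
         if (hops.length > 0 && !hops[0].trim().isEmpty()) {
           return hops[0].trim();
         }
       }
       String realIp = headers.get("X-Real-IP");
       if (realIp != null && !realIp.trim().isEmpty()) {
         return realIp.trim();
       }
       // Assumed fallback when no proxy header is present.
       return remoteAddr != null ? remoteAddr : "unknown";
     }
   }
   ```

   Both proxy headers are client-controlled unless a trusted proxy overwrites
   them, so the extracted value should be treated as advisory in audit records.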
   
   ## Implementation Choices
   
   ### Design Decisions
   
   1. **Dependency Injection Architecture**: Used Guice for dependency 
injection to enable easy testing and configuration management
   2. **Jersey Filter Integration**: Implemented as a JAX-RS 
`ContainerRequestFilter` for seamless integration with existing API infrastructure
   3. **SLF4J Logging**: Leveraged existing logging infrastructure for 
consistent log management
   4. **JSON Structured Logging**: Used Jackson ObjectMapper for consistent 
JSON serialization
   5. **Thread-Safe Configuration**: Implemented lock-free configuration 
management for high performance
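
   One common way to achieve lock-free, thread-safe configuration reads is to
   publish an immutable snapshot through an `AtomicReference`. The sketch below
   shows that pattern; whether `AuditConfigManager` uses exactly this approach
   is not stated in this description, and the names `AuditConfigHolderSketch`
   and `Snapshot` are hypothetical.

   ```java
   import java.util.concurrent.atomic.AtomicReference;

   // Hypothetical holder illustrating the immutable-snapshot pattern.
   final class AuditConfigHolderSketch {

     // Immutable: all fields are final, so a published snapshot never changes.
     static final class Snapshot {
       final boolean enabled;
       final int maxPayloadSize;

       Snapshot(boolean enabled, int maxPayloadSize) {
         this.enabled = enabled;
         this.maxPayloadSize = maxPayloadSize;
       }
     }

     // Starts from the documented defaults: disabled, 10240-byte payload limit.
     private final AtomicReference<Snapshot> current =
         new AtomicReference<>(new Snapshot(false, 10240));

     // Readers take no lock and always see a complete, consistent snapshot.
     Snapshot get() {
       return current.get();
     }

     // Writers replace the whole snapshot atomically instead of mutating fields.
     void update(Snapshot next) {
       if (next == null) {
         throw new IllegalArgumentException("snapshot must not be null");
       }
       current.set(next);
     }
   }
   ```

   A request thread should call `get()` once and use that snapshot for the whole
   request, so a concurrent update cannot change behavior midway.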
   
   ### Configuration Strategy
   
   - **Dynamic Configuration Support**: Framework designed to support runtime 
configuration updates (foundation for future cluster config integration)
   - **Sensible Defaults**: Conservative defaults prioritize performance and 
security
   - **Flexible Filtering**: Wildcard-based endpoint exclusion for operational 
flexibility
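
   Wildcard-based endpoint exclusion can be sketched by translating each
   pattern into a regular expression. This is an illustration under stated
   assumptions: the class `EndpointExclusionSketch` is hypothetical, `*` is
   assumed to match any run of characters (including `/`), and a pattern must
   match the whole path.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.regex.Pattern;

   // Hypothetical matcher; the PR's AuditConfigManager may match differently.
   final class EndpointExclusionSketch {
     private final List<Pattern> patterns = new ArrayList<>();

     // Compiles each wildcard pattern once so per-request checks stay cheap.
     EndpointExclusionSketch(List<String> globs) {
       for (String glob : globs) {
         patterns.add(toPattern(glob));
       }
     }

     // Quotes the literal parts and turns every '*' into ".*".
     static Pattern toPattern(String glob) {
       String[] literals = glob.split("\\*", -1);
       StringBuilder regex = new StringBuilder();
       for (int i = 0; i < literals.length; i++) {
         if (i > 0) {
           regex.append(".*");
         }
         regex.append(Pattern.quote(literals[i]));
       }
       return Pattern.compile(regex.toString());
     }

     // True when the full path matches at least one exclusion pattern.
     boolean isExcluded(String path) {
       for (Pattern pattern : patterns) {
         if (pattern.matcher(path).matches()) {
           return true;
         }
       }
       return false;
     }
   }
   ```

   Under these assumptions `/health/*` excludes `/health/live` but not
   `/health` itself, so both forms would need to be listed if both should be
   skipped.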
   
   ### Error Handling
   
   - **Graceful Degradation**: Audit logging failures are logged separately and 
never impact request processing
   - **Defensive Programming**: Null checks and exception handling throughout 
the audit pipeline
   - **Fallback Values**: Default values for all extracted audit fields when 
extraction fails
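
   The graceful degradation rule above amounts to wrapping the audit step so
   that its failures are reported elsewhere and never propagate to the request.
   A minimal sketch, with the hypothetical names `SafeAuditSketch` and
   `auditSafely`, and an error sink standing in for a separate logger:

   ```java
   import java.util.function.Consumer;

   // Hypothetical wrapper; illustrates the rule, not the PR's actual code.
   final class SafeAuditSketch {

     // Runs the audit action and reports whether it succeeded. Any exception is
     // handed to errorSink and swallowed so the caller can continue normally.
     static boolean auditSafely(Runnable auditAction, Consumer<Throwable> errorSink) {
       try {
         auditAction.run();
         return true;
       } catch (Exception e) {
         errorSink.accept(e);
         return false;
       }
     }
   }
   ```

   The sketch catches `Exception` rather than `Throwable`, so fatal errors such
   as `OutOfMemoryError` still surface; that choice is an assumption, not
   something this description specifies.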
   
   ## Evidence of Sufficient Testing
   
   ### Manual Testing Performed
   
   1. **Basic Functionality Testing**:
      - Verified audit logs are generated for Controller API requests
      - Confirmed JSON structure matches specification requirements
      - Tested endpoint exclusion with wildcard patterns
   
   2. **Configuration Testing**:
      - Validated all configuration properties work as expected
      - Tested enabling/disabling various capture options
      - Verified payload size limiting functionality
   
   3. **Security Testing**:
      - Confirmed sensitive headers are filtered out
      - Tested payload truncation with large request bodies
      - Verified graceful degradation when audit logging fails
   
   ### Integration Testing
   
   - **Controller Startup**: Verified controller starts successfully with audit 
framework enabled
   - **API Request Processing**: Confirmed normal API functionality is 
unaffected
   - **Log Output**: Validated structured JSON audit logs are written to 
configured appenders
   
   ### Build Testing
   
   ```bash
   # Clean build and test execution
   ./mvnw clean install -Pbin-dist
   # Passed all existing tests
   # No new test failures introduced
   ```
   
   ### Code Quality Checks
   
   ```bash
   # Code style verification
   ./mvnw checkstyle:check
   # License formatting
   ./mvnw license:format
   # Code formatting
   ./mvnw spotless:apply
   ```
   
   ## Technical Details
   
   ### Audit Event Schema
   
   Each audit event is serialized as a JSON object with the following fields 
(`{...}` and `"..."` are placeholders for captured values):
   ```json
   {
     "timestamp": "2025-01-15T10:30:00.000Z",
     "service_id": "pinot-controller",
     "endpoint": "/tables",
     "method": "POST",
     "origin_ip_address": "192.168.1.100",
     "user_id": "admin",
     "request": {
       "queryParameters": {...},
       "body": "...",
       "headers": {...}
     }
   }
   ```
   
   ### Performance Considerations
   
   - **Minimal Overhead**: Audit processing occurs only when enabled
   - **Efficient Filtering**: Endpoint exclusion check happens early to avoid 
unnecessary processing
   - **Configurable Payload Capture**: Request body capture is optional and 
size-limited
   - **Non-Blocking**: Audit events go through the existing SLF4J logging path, 
which is non-blocking when the Log4j2 backend uses asynchronous appenders or 
loggers
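
   The size limit on captured request bodies can be illustrated with a small
   truncation helper. The class `PayloadTruncationSketch` and its `truncate`
   method are hypothetical, and decoding the body as UTF-8 is an assumption;
   the description only states that capture is limited by `max.payload.size`.

   ```java
   import java.nio.charset.StandardCharsets;

   // Hypothetical helper; the PR's body capture logic may differ.
   final class PayloadTruncationSketch {

     // Keeps at most maxBytes bytes of the body and decodes them as UTF-8.
     // Cutting inside a multi-byte character yields a replacement character
     // at the end of the result.
     static String truncate(byte[] body, int maxBytes) {
       if (body == null || maxBytes <= 0) {
         return "";
       }
       int length = Math.min(body.length, maxBytes);
       return new String(body, 0, length, StandardCharsets.UTF_8);
     }
   }
   ```

   Bounding the number of bytes before decoding keeps memory use proportional
   to the configured limit rather than to the request size.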
   
   ### Future Extension Points
   
   - **Cluster Configuration Integration**: Framework ready for dynamic config 
updates via ZooKeeper
   - **Additional Audit Fields**: Easy to extend AuditEvent class for 
additional metadata
   - **Custom Processors**: Pluggable audit processor architecture for custom 
audit logic
   - **Response Auditing**: Framework can be extended to audit response data
   
   ## Constants Added
   
   Added audit-related constants to `CommonConstants.java`:
   - `DEFAULT_AUDIT_LOG_MAX_PAYLOAD_SIZE`
   - `AUDIT_CONFIG_PREFIX` 
   - Additional configuration keys for future use
   
   ## Compatibility
   
   - **Backward Compatible**: No breaking changes to existing APIs
   - **Optional Feature**: Audit logging can be disabled without affecting 
functionality  
   - **Configuration Driven**: All audit behavior controlled through 
configuration
   - **Minimal Impact**: When disabled, the framework adds negligible 
performance overhead
   
   ## Future Work
   
   This initial framework stub provides the foundation for audit logging in 
Apache Pinot. Subsequent pull requests will:
   
   - Enable the audit logging framework by default with appropriate 
configuration
   - Integrate with cluster configuration management for dynamic updates
   - Add comprehensive unit and integration tests
   - Enhance audit event fields based on operational requirements
   - Add performance optimizations and monitoring capabilities
   
   This implementation establishes the infrastructure while maintaining 
complete backward compatibility and zero impact on existing functionality.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

