vamshikrishnakyatham opened a new pull request, #13804: URL: https://github.com/apache/hudi/pull/13804
### Change Logs

This PR introduces a new, comprehensive `show_timeline` procedure for Hudi Spark SQL that provides detailed timeline information for all table operations. The procedure displays timeline instants, including commits, deltacommits, compactions, clustering, cleaning, and rollback operations, with support for both active and archived timelines and for both completed and pending instants.

Key features implemented:

- **Comprehensive timeline view**: Shows all timeline instants with detailed metadata, including state transitions (REQUESTED, INFLIGHT, COMPLETED)
- **Time-based filtering**: `startTime` and `endTime` parameters restrict results to a specific time range
- **Archive timeline support**: `showArchived` parameter includes archived timeline data for a complete historical view
- **Generic SQL filtering**: `filter` parameter accepts SQL expressions for flexible result filtering
- **Rich metadata output**: Includes formatted timestamps, rollback information, and table type details

The procedure replaces multiple fragmented timeline-related procedures with a single unified interface that provides both pending and completed instant information with partition-specific metadata support.
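As a rough illustration of how the parameters listed above might be combined, here is a minimal sketch that only assembles a `CALL` statement as a string. It assumes the `name => value` named-argument syntax used by existing Hudi Spark SQL procedures and Spark's default backslash escaping in string literals. The helper names `sql_string` and `build_show_timeline_call` are hypothetical and not part of this PR, and the `path` parameter is left out for brevity.

```python
def sql_string(value: str) -> str:
    """Quote a Python string as a Spark SQL string literal.

    Assumes Spark's default parser settings, where backslash escapes
    are honored inside string literals.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_show_timeline_call(table, *, limit=None, show_archived=None,
                             filter_expr=None, start_time=None, end_time=None):
    """Assemble a CALL show_timeline(...) statement (hypothetical helper).

    Parameter names mirror the ones listed in the PR description;
    only arguments that were explicitly passed are emitted.
    """
    args = [f"table => {sql_string(table)}"]
    if limit is not None:
        args.append(f"limit => {int(limit)}")
    if show_archived is not None:
        args.append(f"showArchived => {'true' if show_archived else 'false'}")
    if filter_expr is not None:
        args.append(f"filter => {sql_string(filter_expr)}")
    if start_time is not None:
        args.append(f"startTime => {sql_string(start_time)}")
    if end_time is not None:
        args.append(f"endTime => {sql_string(end_time)}")
    return f"CALL show_timeline({', '.join(args)})"


if __name__ == "__main__":
    # 'hudi_tbl' is a placeholder table name.
    print(build_show_timeline_call("hudi_tbl", limit=10, show_archived=True))
    print(build_show_timeline_call("hudi_tbl", filter_expr="action = 'commit'"))
```

The resulting string could then be passed to `spark.sql(...)` in a session where the Hudi extensions are enabled. The exact value formats accepted by `startTime`, `endTime`, and `filter` are not stated in the description, so they are treated here as opaque strings.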
### Impact

**Public API Changes:**

- **New procedure**: `show_timeline` with parameters `table`, `path`, `limit`, `showArchived`, `filter`, `startTime`, and `endTime`
- **Enhanced schema**: 8-column output: `instant_time`, `action`, `state`, `requested_time`, `inflight_time`, `completed_time`, `timeline_type`, `rollback_info`
- **Backward compatibility**: Existing timeline procedures remain functional (deprecated, with guidance to use the new procedure)

**User-facing Features:**

- **Unified timeline interface**: Single procedure for all timeline operations instead of multiple specialized procedures
- **Advanced filtering capabilities**: Time-based and SQL expression filtering for precise result control
- **Historical data access**: Archive timeline support for complete audit trails

**Performance Impact:**

- **Optimized timeline scanning**: Efficient file system scanning with proper extension filtering
- **Configurable limits**: Defaults to a 20-entry limit when no start and end time are specified; users can override it
- **Selective archive access**: Archive timeline is loaded only when explicitly requested

### Risk level: **Low**

**Verification performed:**

- **Comprehensive test coverage**: 4 focused test cases covering basic functionality, MoR tables, rollback operations, and state transitions
- **Schema validation**: All output fields properly typed and validated
- **Error handling**: Graceful handling of invalid filters, missing tables, and timeline access failures
- **Timeline consistency**: Proper handling of both active and archived timelines with correct state mapping

### Documentation Update

- Add `show_timeline` procedure to Hudi Spark SQL procedures documentation
- Update timeline management examples to use new procedure
- Add advanced filtering examples and use cases

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests
were added if applicable
- [x] CI passed
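The default-limit behavior mentioned under Performance Impact can be read in more than one way. The sketch below shows one possible reading, not the PR's actual implementation: an explicit `limit` always wins, any time bound disables the default cap, and otherwise the cap is 20. The function name `effective_limit` is hypothetical, and treating a single bound (only `startTime` or only `endTime`) as "a range was given" is an assumption.

```python
from typing import Optional

# Default cap stated in the PR description.
DEFAULT_LIMIT = 20

# The eight output columns named in the PR description.
OUTPUT_COLUMNS = (
    "instant_time", "action", "state", "requested_time",
    "inflight_time", "completed_time", "timeline_type", "rollback_info",
)


def effective_limit(limit: Optional[int] = None,
                    start_time: Optional[str] = None,
                    end_time: Optional[str] = None) -> Optional[int]:
    """Return the row cap to apply, or None for no cap.

    Assumed rule (one reading of the description):
    1. a user-supplied limit overrides everything;
    2. if any time bound is given, no default cap applies;
    3. otherwise fall back to DEFAULT_LIMIT.
    """
    if limit is not None:
        return limit
    if start_time is not None or end_time is not None:
        return None
    return DEFAULT_LIMIT


if __name__ == "__main__":
    print(effective_limit())
    print(effective_limit(limit=5))
    # The timestamp below is a placeholder; the accepted format is not stated.
    print(effective_limit(start_time="20240101000000000"))
```

Whether the real procedure caps results when only one of the two bounds is supplied would need to be confirmed against the PR's code.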
