vamshikrishnakyatham opened a new pull request, #13804:
URL: https://github.com/apache/hudi/pull/13804

   ### Change Logs
   
   This PR introduces a new `show_timeline` procedure for Hudi Spark SQL that 
provides detailed timeline information for all table operations. The procedure 
displays timeline instants, including commits, deltacommits, compactions, 
clustering, cleaning, and rollback operations, from both the active and 
archived timelines, and covers both completed and pending instants.
   
   Key features implemented:
   - **Comprehensive timeline view**: Shows all timeline instants with detailed 
metadata including state transitions (REQUESTED, INFLIGHT, COMPLETED)
   - **Time-based filtering**: Support for `startTime` and `endTime` parameters 
to filter results within specific time ranges
   - **Archive timeline support**: `showArchived` parameter to include archived 
timeline data for complete historical view
   - **Generic SQL filtering**: `filter` parameter supporting SQL expressions 
for flexible result filtering
   - **Rich metadata output**: Includes formatted timestamps, rollback 
information, and table type details
   
   The procedure is intended to replace multiple fragmented timeline-related 
procedures (which remain available but are deprecated) with a single unified 
interface that reports both pending and completed instants, with 
partition-specific metadata support.
   
   ### Impact
   
   **Public API Changes:**
   - **New procedure**: `show_timeline` with comprehensive parameter support 
(table, path, limit, showArchived, filter, startTime, endTime)
   - **Enhanced schema**: 8-column output including instant_time, action, 
state, requested_time, inflight_time, completed_time, timeline_type, 
rollback_info
   - **Backward compatibility**: Existing timeline procedures remain functional 
(deprecated with guidance to use new procedure)
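
To illustrate how the parameters listed above might be combined, here is a minimal sketch that assembles a `CALL show_timeline(...)` statement. It assumes the procedure accepts the same named-argument (`name => value`) form used by existing Hudi procedures. The helper names, the table name `hudi_tbl`, the rule that one of `table` or `path` must be given, and the backslash escaping (Spark's default string-literal parsing) are assumptions for illustration, not taken from the PR.

```python
from typing import Union

# Parameter names as listed in the PR description; the ordering is arbitrary.
KNOWN_PARAMS = ("table", "path", "limit", "showArchived", "filter",
                "startTime", "endTime")


def to_sql_literal(value: Union[str, int, bool]) -> str:
    """Render a Python value as a Spark SQL literal (hypothetical helper)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    # Assumes Spark's default parsing, where backslash escapes a quote.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_show_timeline_call(**params: Union[str, int, bool]) -> str:
    """Build a CALL statement from keyword arguments (hypothetical helper)."""
    unknown = set(params) - set(KNOWN_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    # Assumption: the procedure needs either a table name or a base path.
    if "table" not in params and "path" not in params:
        raise ValueError("either 'table' or 'path' is required")
    args = ", ".join(
        f"{name} => {to_sql_literal(params[name])}"
        for name in KNOWN_PARAMS
        if name in params
    )
    return f"CALL show_timeline({args})"


if __name__ == "__main__":
    # Filter on output columns named in the PR's schema (action, state).
    print(build_show_timeline_call(
        table="hudi_tbl",
        showArchived=True,
        filter="action = 'commit' AND state = 'COMPLETED'",
    ))
```

The resulting string could then be passed to `spark.sql(...)` in a session where the Hudi Spark SQL extensions are enabled.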
   
   **User-facing Features:**
   - **Unified timeline interface**: Single procedure for all timeline 
operations instead of multiple specialized procedures
   - **Advanced filtering capabilities**: Time-based and SQL expression 
filtering for precise result control
   - **Historical data access**: Archive timeline support for complete audit 
trails
   
   **Performance Impact:**
   - **Optimized timeline scanning**: Efficient file system scanning with 
proper extension filtering
   - **Configurable limits**: Results default to 20 entries when no start and 
end time are specified; users can override this limit
   - **Selective archive access**: Archive timeline only loaded when explicitly 
requested
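
One possible reading of the default-limit rule above, sketched as a small function. This is an interpretation rather than the PR's actual logic: it assumes the default of 20 applies only when neither `startTime` nor `endTime` is supplied, that an explicit limit always wins, and that a time-bounded query with no explicit limit is unbounded. The function name and the sample instant time are hypothetical.

```python
from typing import Optional

DEFAULT_LIMIT = 20  # default entry count stated in the PR description


def effective_limit(limit: Optional[int],
                    start_time: Optional[str],
                    end_time: Optional[str]) -> Optional[int]:
    """Return the row limit to apply, or None for no limit (assumed rule)."""
    if limit is not None:
        return limit  # explicit user value overrides the default
    if start_time is None and end_time is None:
        return DEFAULT_LIMIT  # no time range given: fall back to the default
    return None  # assumption: a time range alone bounds the result


if __name__ == "__main__":
    print(effective_limit(None, None, None))
    print(effective_limit(100, None, None))
    print(effective_limit(None, "20250101000000000", None))
```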
   
   ### Risk level: **Low**
   
   **Verification performed:**
   - **Comprehensive test coverage**: 4 focused test cases covering basic 
functionality, MoR tables, rollback operations, and state transitions
   - **Schema validation**: All output fields properly typed and validated
   - **Error handling**: Graceful handling of invalid filters, missing tables, 
and timeline access failures
   - **Timeline consistency**: Proper handling of both active and archived 
timelines with correct state mapping
   
   ### Documentation Update
   - Add `show_timeline` procedure to Hudi Spark SQL procedures documentation
   - Update timeline management examples to use new procedure
   - Add advanced filtering examples and use cases
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

