voonhous commented on code in PR #13152: URL: https://github.com/apache/hudi/pull/13152#discussion_r3401316160
##########
rfc/rfc-94/rfc-94.md:
##########
@@ -0,0 +1,515 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+# RFC-94: Hudi Timeline User Interface (UI)
+
+## Proposers
+
+- @voonhous
+
+## Approvers
+
+- @danny0405
+- @rahil-c
+- @yihua
+
+## Status
+
+JIRA: [HUDI-9315](https://issues.apache.org/jira/browse/HUDI-9315)
+
+## Abstract
+
+Hudi Timeline metadata is stored as timestamped files representing state transitions of actions like
+`commit`, `deltacommit` and `compaction`. These files are accessible via the CLI or a file explorer,
+but it's hard to visualize concurrent actions, spot missing transitions, or tell how long each step
+took. Debugging timeline issues by reading filenames is tedious.
+
+This RFC proposes a UI-based timeline visualization tool that parses these metadata files, groups
+related actions, and renders them in a time-ordered, interactive view. Users can track the lifecycle
+of each operation, see concurrency patterns, and spot anomalies or long-running tasks. The
+implementation extends `hudi-timeline-service` with new `/v2/` REST APIs and a static HTML +
+JavaScript frontend powered by [vis-timeline](https://github.com/visjs/vis-timeline), served via
+Javalin's built-in static file serving with zero new Java compile-time dependencies.
+
+## Background
+
+Today, we rely on the CLI or direct filesystem inspection to understand timeline state through
+metadata files. These files represent different actions (e.g., `deltacommit`, `compaction`) and
+their lifecycle states (`requested`, `inflight`, `completed`), encoded in file names like:
+
+```shell
+20250409102118815.deltacommit.inflight
+20250409102118815.deltacommit.requested
+20250409102118815_20250409102124339.deltacommit
+20250409102121593.compaction.inflight
+20250409102121593.compaction.requested
+20250409102121593_20250409102122232.commit
+20250409102124581.deltacommit.inflight
+20250409102124581.deltacommit.requested
+20250409102124581_20250409102125667.deltacommit
+20250409102124612.compaction.inflight
+20250409102124612.compaction.requested
+20250409102124612_20250409102124892.commit
+20250409102127348.deltacommit.inflight
+20250409102127348.deltacommit.requested
+20250409102127348_20250409102128481.deltacommit
+20250409102127500.compaction.inflight
+20250409102127500.compaction.requested
+20250409102127500_20250409102127721.commit
+```
+
+This works, but has a few problems:
+
+1. No visibility into concurrency
+   - Multiple actions (e.g., `deltacommit` and `compaction`) often run concurrently.
+   - The CLI doesn't help correlate or visualize overlapping operations.
+2. Lack of temporal context
+   - Timestamps are embedded in filenames but are hard to compare visually - year, month and
+     day can be quickly determined, but minutes and seconds are harder to parse.
+   - No easy way to tell how long an action took or whether it's stalling unless you
+     manually calculate the difference between requested and completion time.
+3. Hard to spot inconsistencies or missing states
+   - An `inflight` compaction without a corresponding `commit` can indicate a starved/stuck
+     compaction, which usually blocks archiving/cleaning.
+   - These gaps are easy to miss when scanning filenames.
+
+On top of that, all timeline files are now stored as Avro binaries. Inspecting their contents
+requires custom Avro readers to convert the binaries to JSON.
+
+## Scope
+
+This RFC covers visualization of metadata available in Hudi tables. All features are **READ-ONLY** -
+there is no support for starting or spawning jobs that mutate a Hudi table.
+
+The following are **out of scope**:
+
+- **Archived timeline:** Only the active timeline is rendered. Loading instants from LSM-based
+  archive files is left for future work.
+- **Metadata table overlay:** The metadata table's own timeline is not shown alongside the main
+  table timeline.
+- **Write/mutation operations:** The UI cannot trigger compactions, clustering, or any write action.
+- **Authentication/authorization:** No access control is added. The timeline server is assumed to
+  run in a trusted network, same as today.
+
+## Implementation
+
+Keeping the implementation lightweight is a priority - we should add as few dependencies as
+possible. Changes go into the existing `hudi-timeline-service` module, which contains a Javalin
+web-application that caches filesystem metadata of a Hudi table for job executors during
+tagging/writing.
+
+To use the Hudi Timeline UI, users can either start the Timeline Server in **STANDALONE** mode
+(which is already supported) or enable the UI on the **EMBEDDED** timeline server that runs within
+a Spark application's driver process (see [Configuration](#configuration)).
+
+The Hudi Timeline UI has two parts: the frontend and backend.
+
+### Architecture
+
+The timeline server can run standalone or embedded inside a Spark driver. In embedded mode, a tab
+in the Spark UI links directly to the Hudi Timeline UI.
+
+```mermaid
+graph LR
+    Browser["Browser"]
+
+    subgraph Driver["Standalone / Spark Driver"]
+        subgraph TimelineServer["Javalin (Timeline Server)"]
+            Static["/ui/* - Static Files\n(HTML, JS, CSS)"]
+            API["/v2/timeline/* - UiHandler"]
+            FSVM["FileSystemViewManager"]
+            Meta["HoodieTimeline / MetaClient"]
+
+            API --> FSVM --> Meta
+        end
+
+        subgraph SparkUI["Spark UI (:4040) - embedded mode only"]
+            direction TB
+            SparkUIPad[ ] ~~~ Tabs["[Jobs] [Stages] ... [Hudi Timeline]"]
+        end
+
+        style SparkUIPad fill:none,stroke:none,color:none
+
+        Tabs -- "link" --> Static
+    end
+
+    Browser -- "HTTP" --> Static
+    Browser -- "HTTP" --> API
+    Browser -. "HTTP\n(embedded mode)" .-> SparkUI
+```
+
+There are two categories of requests:
+
+1. **Static file requests** (`/ui/*`) - Javalin serves HTML, JavaScript, and CSS files from the
+   classpath (`src/main/resources/public/`). No server-side rendering or template engine is needed.
+2. **REST API requests** (`/v2/timeline/*`) - A new `UiHandler` processes these requests, reading
+   timeline data from the `FileSystemViewManager` and `HoodieTableMetaClient`, then returning JSON
+   responses.
+
+### Frontend
+
+The frontend is static HTML pages with vanilla JavaScript, similar to the Spark Web UI. Javalin's
+built-in static file serving handles files from the classpath - no template engine (e.g.,
+Thymeleaf) is needed and no new Java compile-time dependencies are added.
+
+No frontend build pipeline (npm, webpack, vite) is needed. Contributing to the UI requires only a
+text editor. The only external library is vis-timeline for timeline rendering.
+
+#### File Structure
+
+```
+hudi-timeline-service/src/main/resources/public/
+├── index.html          # Landing page with basepath input form
+├── js/
+│   └── timeline.js     # vis-timeline initialization and REST API calls
+├── css/
+│   └── style.css       # Basic styling
+└── lib/
+    └── vis-timeline/   # Bundled fallback copy of vis-timeline
+        ├── vis-timeline-graph2d.min.js
+        └── vis-timeline-graph2d.min.css
+```
+
+#### JavaScript Delivery: CDN with Bundled Fallback
+

Review Comment:
Agreed, an outbound CDN call is the wrong default for an internal debugging tool. Flipped it: the UI now serves the bundled vis-timeline copy by default and makes no external network calls, so air-gapped and security-conscious setups work out of the box. Dropped the CDN-first strategy; a CDN source can be added later as an opt-in flag if automatic patch updates are ever wanted. Updated the "JavaScript Delivery" section.
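
For illustration only, here is a minimal sketch of how the bundled-by-default behaviour with a later opt-in CDN source could be resolved in the frontend. The bundled paths come from the RFC's file structure; the function name `resolveVisTimelineAssets`, the `cdnBaseUrl` option, and the example CDN host are hypothetical and not part of the PR.

```javascript
// Bundled vis-timeline copy, relative to the UI root (paths from the RFC's file structure).
const BUNDLED_ASSETS = {
  script: "lib/vis-timeline/vis-timeline-graph2d.min.js",
  style: "lib/vis-timeline/vis-timeline-graph2d.min.css",
};

// Returns true only for well-formed https:// URLs.
function isHttpsUrl(value) {
  if (typeof value !== "string") return false;
  try {
    return new URL(value).protocol === "https:";
  } catch (_) {
    return false;
  }
}

// Hypothetical resolver: `options.cdnBaseUrl` is an assumed opt-in setting,
// not an existing Hudi or timeline-server flag.
function resolveVisTimelineAssets(options = {}) {
  const { cdnBaseUrl } = options;
  // Default path: no external network calls, serve the bundled copy.
  if (!isHttpsUrl(cdnBaseUrl)) {
    return { ...BUNDLED_ASSETS, external: false };
  }
  // Opt-in path: strip trailing slashes so the joined URL is well-formed.
  const base = cdnBaseUrl.replace(/\/+$/, "");
  return {
    script: `${base}/vis-timeline-graph2d.min.js`,
    style: `${base}/vis-timeline-graph2d.min.css`,
    external: true,
  };
}

// Usage: the default call never leaves the timeline server.
console.log(resolveVisTimelineAssets());
console.log(resolveVisTimelineAssets({ cdnBaseUrl: "https://cdn.example.org/vis-timeline/" }));
```

In this sketch a missing, malformed, or non-HTTPS `cdnBaseUrl` silently falls back to the bundled copy, which keeps air-gapped setups working even when the option is misconfigured. Whether a real flag should fail loudly instead is an open design choice.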
