voonhous commented on code in PR #13152:
URL: https://github.com/apache/hudi/pull/13152#discussion_r3401249650


##########
rfc/rfc-94/rfc-94.md:
##########
@@ -0,0 +1,515 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+# RFC-94: Hudi Timeline User Interface (UI)
+
+## Proposers
+
+- @voonhous
+
+## Approvers
+
+- @danny0405
+- @rahil-c
+- @yihua
+
+## Status
+
+JIRA: [HUDI-9315](https://issues.apache.org/jira/browse/HUDI-9315)
+
+## Abstract
+
+Hudi Timeline metadata is stored as timestamped files representing state transitions of actions like
+`commit`, `deltacommit` and `compaction`. These files are accessible via the CLI or a file explorer,
+but it's hard to visualize concurrent actions, spot missing transitions, or tell how long each step
+took. Debugging timeline issues by reading filenames is tedious.
+
+This RFC proposes a UI-based timeline visualization tool that parses these metadata files, groups
+related actions, and renders them in a time-ordered, interactive view. Users can track the lifecycle
+of each operation, see concurrency patterns, and spot anomalies or long-running tasks. The
+implementation extends `hudi-timeline-service` with new `/v2/` REST APIs and a static HTML +
+JavaScript frontend powered by [vis-timeline](https://github.com/visjs/vis-timeline), served via
+Javalin's built-in static file serving with zero new Java compile-time dependencies.
+
+## Background
+
+Today, we rely on the CLI or direct filesystem inspection to understand timeline state through
+metadata files. These files represent different actions (e.g., `deltacommit`, `compaction`) and
+their lifecycle states (`requested`, `inflight`, `completed`), encoded in file names like:
+
+```shell
+20250409102118815.deltacommit.inflight
+20250409102118815.deltacommit.requested
+20250409102118815_20250409102124339.deltacommit
+20250409102121593.compaction.inflight
+20250409102121593.compaction.requested
+20250409102121593_20250409102122232.commit
+20250409102124581.deltacommit.inflight
+20250409102124581.deltacommit.requested
+20250409102124581_20250409102125667.deltacommit
+20250409102124612.compaction.inflight
+20250409102124612.compaction.requested
+20250409102124612_20250409102124892.commit
+20250409102127348.deltacommit.inflight
+20250409102127348.deltacommit.requested
+20250409102127348_20250409102128481.deltacommit
+20250409102127500.compaction.inflight
+20250409102127500.compaction.requested
+20250409102127500_20250409102127721.commit
+```
+
+This works, but has a few problems:
+
+1. No visibility into concurrency
+    - Multiple actions (e.g., `deltacommit` and `compaction`) often run concurrently.
+    - The CLI doesn't help correlate or visualize overlapping operations.
+2. Lack of temporal context
+    - Timestamps are embedded in filenames but are hard to compare visually - year, month and
+      day can be quickly determined, but minutes and seconds are harder to parse.
+    - No easy way to tell how long an action took or whether it's stalling unless you
+      manually calculate the difference between requested and completion time.
+3. Hard to spot inconsistencies or missing states
+    - An `inflight` compaction without a corresponding `commit` can indicate a starved/stuck
+      compaction, which usually blocks archiving/cleaning.
+    - These gaps are easy to miss when scanning filenames.
+
+On top of that, all timeline files are now stored as Avro binaries. Inspecting their contents
+requires custom Avro readers to convert the binaries to JSON.
+
+## Scope
+
+This RFC covers visualization of metadata available in Hudi tables. All features are **READ-ONLY** -
+there is no support for starting or spawning jobs that mutate a Hudi table.
+
+The following are **out of scope**:
+
+- **Archived timeline:** Only the active timeline is rendered. Loading instants from LSM-based
+  archive files is left for future work.
+- **Metadata table overlay:** The metadata table's own timeline is not shown alongside the main
+  table timeline.
+- **Write/mutation operations:** The UI cannot trigger compactions, clustering, or any write action.
+- **Authentication/authorization:** No access control is added. The timeline server is assumed to
+  run in a trusted network, same as today.
+

Review Comment:
   Good point. Added a threat-model note to the Scope section. Summary:
   - The `/v2/` UI endpoints expose the same timeline/filesystem metadata that the existing `/v1/` APIs already serve on the same interface, so the UI does not widen the exposure surface.
   - The UI is opt-in and off by default (`--enable-ui` / `hoodie.embed.timeline.server.ui.enable=false`).
   - For untrusted networks, operators should front the UI with a reverse proxy or restrict it to a private interface.
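
As a rough illustration of the last point, the sketch below classifies a bind address as restricted (loopback or private range) or exposed. It uses only Python's standard `ipaddress` module. The helper name `is_restricted_bind` is hypothetical and not part of Hudi, and how the timeline server's bind address is actually configured is not covered here.

```python
import ipaddress


def is_restricted_bind(addr: str) -> bool:
    """Return True if addr is a loopback or private-range IP address.

    Hypothetical operator-side check, not a Hudi API.
    """
    ip = ipaddress.ip_address(addr)
    # 0.0.0.0 and :: mean "listen on every interface", so treat them as
    # exposed even though ipaddress places 0.0.0.0/8 in its private ranges.
    if ip.is_unspecified:
        return False
    return ip.is_loopback or ip.is_private


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "10.0.0.5", "0.0.0.0", "8.8.8.8"):
        status = "restricted" if is_restricted_bind(candidate) else "exposed"
        print(f"{candidate}: {status}")
```

A wildcard address such as `0.0.0.0` is reported as exposed because it listens on every interface, which is the case where a reverse proxy or firewall rule would be needed.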
   


