zhangshenghang commented on issue #10227: URL: https://github.com/apache/seatunnel/issues/10227#issuecomment-3742247835
<!-- seatunnel-pr-reviewer -->
Thanks for the feature request! You're correct that Hudi Source is still not supported. Based on the current codebase:

**Current Status:**
- The Hudi connector only implements **Sink** functionality (HudiSink, HudiSinkFactory)
- No **Source** implementation exists (no HudiSource or HudiSourceFactory classes)
- The connector currently uses Hudi 0.15.0 with `hudi-java-client`

**Why It Was Removed:**
The original concern was valid: Hudi's reading capabilities historically required engine-specific implementations (Spark DataSource, Flink Source). However, this has evolved with newer Hudi versions.

**Possible Approaches:**
1. **Use the Hudi Java client** (`hudi-hadoop-mr`): Hudi 0.14+ provides better engine-agnostic read APIs via the Hadoop FileSystem interface
2. **Implement a Source connector**: similar to how the Iceberg and Paimon sources work, using their table APIs
3. **Use HudiInputFormat/FileSystem**: read base files directly through the Hadoop FileSystem API

**Questions for the Community:**
- Which Hudi table types do you need to read? (COPY_ON_WRITE, MERGE_ON_READ)
- Do you need incremental reading (CDC), or just snapshot/batch queries?
- Are you using Hudi's Hadoop sync or the Hive metastore for table metadata?

Contributions are welcome! If anyone is interested in implementing this, please comment on this issue.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
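To make the source-connector approach concrete: existing SeaTunnel sources such as Iceberg and Paimon are configured as a named block in the job config. The sketch below is purely hypothetical — no `Hudi` source block exists today, and every option name in it (`table.path`, `table.type`, `read.type`) is an assumption illustrating what such a connector might expose, not a real API:

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  # Hypothetical block: HudiSource is not implemented yet.
  # All option names below are illustrative assumptions.
  Hudi {
    table.path = "hdfs://namenode:8020/warehouse/hudi_table"
    table.type = "COPY_ON_WRITE"  # or MERGE_ON_READ
    read.type  = "snapshot"       # snapshot vs. incremental (CDC) reading
  }
}

sink {
  Console {}
}
```

The option set mirrors the community questions above (table type, snapshot vs. incremental reads), since those answers would directly shape the connector's configuration surface.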
