GitHub user rahil-c edited a discussion: RFC-100: Lance File Format support in Hudi
## ✅ Lance File Format Integration Tasks

See the following feature for more context: https://github.com/apache/hudi/issues/14127

This discussion tracks the new feature for supporting unstructured data in Hudi via formats like Lance that are focused on AI/ML use cases.

Here is the initial scope of what we are targeting. (Note: this list will continue to grow as we get deeper into the integration; for now it aims to first support the Hudi Spark Client.)

- [ ] Add base `HoodieFileWriter` for Lance with a Spark implementation, [PR](https://github.com/apache/hudi/pull/14131)
- [ ] Add base `HoodieFileReader` for Lance with a Spark implementation, [PR](https://github.com/apache/hudi/pull/14132)
- [ ] Add basic Avro → Arrow schema conversion
- [ ] Add `SparkColumnarFileReader` implementation for Lance
- [ ] Implement append-only validation (bulk insert)
- [ ] Integrate Lance as a log file format
- [ ] Implement insert / upsert / delete validation
- [ ] Add predicate (filter) push-down
- [ ] Support `ColumnarBatch` vectorized reading

Changes will be made on the following open source feature branch: https://github.com/apache/hudi/tree/feature-branch-rfc100-unstructured-data

GitHub link: https://github.com/apache/hudi/discussions/14128
