geserdugarov commented on code in PR #18276:
URL: https://github.com/apache/hudi/pull/18276#discussion_r2888622795
##########
rfc/rfc-98/rfc-98.md:
##########
@@ -52,25 +53,240 @@ The current implementation of Spark Datasource V2 integration is presented in th
## Implementation
-<!-- -->
+The approach is hybrid: DSv2 for reads, DSv1 fallback for writes (`V2TableWithV1Fallback`).
+
+The overall architecture proposed for this hybrid approach is shown in the following schema:
+
+
+
+### DataFrame API
+
+A new SPI short name `"hudi_v2"` activates the DSv2 path for reads via the Spark DataFrame API.
+The existing `"hudi"` path remains unchanged.
+
+<table>
+<tr>
+<th>Operation</th>
+<th>Current implementation</th>
+<th>Additional functionality proposed in this RFC</th>
+</tr>
+<tr>
+<td>Write</td>
+<td>
+<pre>
+df.write.format("hudi").mode(...).save(path)
+ v
+BaseDefaultSource (V1) -> DefaultSource
+ v
+CreatableRelationProvider.createRelation(...)
+ v
+HoodieSparkSqlWriter.write(...)
+ v
+SparkRDDWriteClient -> upsert/insert/bulk_insert
+</pre>
+</td>
+<td>
+<pre>
+df.write.format("hudi_v2").mode(...).save(path)
Review Comment:
Agree with you, but I don't see any alternatives. I want to add this to unblock incremental development of these huge changes, and to make it the default and only way in the end. I've added this to the "Future Work" chapter in 41997c495dd69359309d34d8f8c7a261b02b5a7b
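
For readers following this thread, a minimal Python sketch of the dispatch being discussed: a second short name resolving to a table that reads through DSv2 but falls back to the DSv1 writer, in the spirit of Spark's `V2TableWithV1Fallback` contract. All class and function names below are hypothetical illustrations, not Hudi's or Spark's actual classes.

```python
# Hypothetical sketch of the hybrid approach from the RFC: "hudi_v2" resolves
# to a table whose reads go through the DSv2 path while writes are handed back
# to the existing DSv1 write path. Names are illustrative only.

class V1WriteRelation:
    """Stand-in for the existing DSv1 write path."""
    def write(self, rows):
        return f"v1-write:{len(rows)} rows"

class V2Scan:
    """Stand-in for a DSv2 read (Scan/Batch) implementation."""
    def read(self):
        return ["row-1", "row-2"]

class V2TableWithV1FallbackSketch:
    """Reads via DSv2; writes fall back to the DSv1 relation."""
    def __init__(self):
        self._scan = V2Scan()
        self._v1 = V1WriteRelation()

    def read(self):
        # DSv2 read path.
        return self._scan.read()

    def write(self, rows):
        # DSv1 fallback: the write is delegated to the V1 path.
        return self._v1.write(rows)

# Short-name registry: the new name activates the hybrid table; the existing
# "hudi" name would keep resolving to the unchanged V1 source.
REGISTRY = {"hudi_v2": V2TableWithV1FallbackSketch}

def resolve(short_name):
    return REGISTRY[short_name]()

table = resolve("hudi_v2")
rows = table.read()             # served by the DSv2 scan
status = table.write(["a", "b"])  # delegated to the V1 writer
```

This mirrors the incremental strategy in the comment above: both short names can coexist until the V2 path covers writes as well.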
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]