geserdugarov commented on code in PR #18276:
URL: https://github.com/apache/hudi/pull/18276#discussion_r2888622795
##########
rfc/rfc-98/rfc-98.md:
##########
@@ -52,25 +53,240 @@ The current implementation of Spark Datasource V2 integration is presented in th
## Implementation
-<!-- -->
+The approach is hybrid: DSv2 for reads, DSv1 fallback for writes (`V2TableWithV1Fallback`).
+
+The overall architecture proposed for this hybrid approach is shown in the following schema:
+
+
+
+### DataFrame API
+
+A new SPI short name `"hudi_v2"` activates the DSv2 path for reads via the Spark DataFrame API.
+The existing `"hudi"` path remains unchanged.
+
+<table>
+<tr>
+<th>Operation</th>
+<th>Current implementation</th>
+<th>Additional functionality proposed in this RFC</th>
+</tr>
+<tr>
+<td>Write</td>
+<td>
+<pre>
+df.write.format("hudi").mode(...).save(path)
+ v
+BaseDefaultSource (V1) -> DefaultSource
+ v
+CreatableRelationProvider.createRelation(...)
+ v
+HoodieSparkSqlWriter.write(...)
+ v
+SparkRDDWriteClient -> upsert/insert/bulk_insert
+</pre>
+</td>
+<td>
+<pre>
+df.write.format("hudi_v2").mode(...).save(path)
Review Comment:
Agree with you, but I don't see any alternatives. I want to add this to unblock incremental development of these huge changes, and to make it the default and only way in the end. I've added this to the "Future Work" chapter in 41997c495dd69359309d34d8f8c7a261b02b5a7b
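
For readers following this thread, a minimal Python sketch of the dispatch being discussed: a second short name resolving to a table that reads through DSv2 but falls back to the DSv1 writer, in the spirit of Spark's `V2TableWithV1Fallback` contract. All class and function names below are hypothetical illustrations, not Hudi's or Spark's actual classes.

```python
# Hypothetical sketch of the hybrid approach from the RFC: "hudi_v2" resolves
# to a table whose reads go through the DSv2 path while writes are handed back
# to the existing DSv1 write path. Names are illustrative only.

class V1WriteRelation:
    """Stand-in for the existing DSv1 write path."""
    def write(self, rows):
        return f"v1-write:{len(rows)} rows"

class V2Scan:
    """Stand-in for a DSv2 read (Scan/Batch) implementation."""
    def read(self):
        return ["row-1", "row-2"]

class V2TableWithV1FallbackSketch:
    """Reads via DSv2; writes fall back to the DSv1 relation."""
    def __init__(self):
        self._scan = V2Scan()
        self._v1 = V1WriteRelation()

    def read(self):
        # DSv2 read path.
        return self._scan.read()

    def write(self, rows):
        # DSv1 fallback: the write is delegated to the V1 path.
        return self._v1.write(rows)

# Short-name registry: the new name activates the hybrid table; the existing
# "hudi" name would keep resolving to the unchanged V1 source.
REGISTRY = {"hudi_v2": V2TableWithV1FallbackSketch}

def resolve(short_name):
    return REGISTRY[short_name]()

table = resolve("hudi_v2")
rows = table.read()             # served by the DSv2 scan
status = table.write(["a", "b"])  # delegated to the V1 writer
```

This mirrors the incremental strategy in the comment above: both short names can coexist until the V2 path covers writes as well.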
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]