voonhous commented on code in PR #18839:
URL: https://github.com/apache/hudi/pull/18839#discussion_r3302511048


##########
website/docs/variant_type.md:
##########
@@ -254,12 +257,94 @@ binary `value` field.
 | Engine | VARIANT Support |
 |:-------|:---------------|
 | **Spark 4.0+** | Native `VariantType` — full read/write/query |
-| **Spark 3.x** | Reads as `STRUCT<value: BINARY, metadata: BINARY>` — backward compatible |
+| **Spark 3.x** | Reads as `STRUCT<value: BINARY, metadata: BINARY>` — see [Reading from Spark 3.5](#reading-from-spark-35-backward-compatibility) |

Review Comment:
   Addressed



##########
website/docs/variant_type.md:
##########
@@ -254,12 +257,94 @@ binary `value` field.
 | Engine | VARIANT Support |
 |:-------|:---------------|
 | **Spark 4.0+** | Native `VariantType` — full read/write/query |
-| **Spark 3.x** | Reads as `STRUCT<value: BINARY, metadata: BINARY>` — backward compatible |
+| **Spark 3.x** | Reads as `STRUCT<value: BINARY, metadata: BINARY>` — see [Reading from Spark 3.5](#reading-from-spark-35-backward-compatibility) |
 | **Flink** | Reads as `ROW<metadata BYTES, value BYTES>` — cross-engine compatible |
 
 A VARIANT table written by Spark 4.0 can be read by Spark 3.x or Flink, and vice versa. The
 binary encoding is engine-independent.
 
+## Reading from Spark 3.5 (Backward Compatibility)
+
+Spark 3.5 cannot construct a native `VariantType`, so it cannot consume a VARIANT-typed Hudi
+table the same way Spark 4.0+ does. This section covers the supported read path: how to point
+a Spark 3.5 job at a 1.2.0 VARIANT table, what you get back, and what you give up.
+
+### Why a special path is needed
+
+If you let Hudi resolve the table schema from commit metadata on Spark 3.5, the read fails
+fast because the Spark 3.x adapter rejects the `VariantType` conversion:
+
+```python
+spark.read.format("hudi").load("/path/to/events").show()
+# org.apache.hudi.exception.HoodieSchemaException: ...
+# Caused by: java.lang.UnsupportedOperationException:
+#   VARIANT type is only supported in Spark 4.0+
+```
+
+The table data on disk is fine — Hudi just needs to be told the column's *physical* shape
+on the reader side so it can skip the unsupported logical conversion.

Review Comment:
   Addressed


