Re: [PR] feat: Add Unshredded Variant read & write support [hudi]

via GitHub Tue, 24 Feb 2026 16:57:47 -0800


the-other-tim-brown commented on code in PR #17833:
URL: https://github.com/apache/hudi/pull/17833#discussion_r2850265163



##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/hudi/SparkAdapter.scala:
##########
@@ -374,4 +377,69 @@ trait SparkAdapter extends Serializable {
    * @return A streaming [[DataFrame]]
    */
   def createStreamingDataFrame(sqlContext: SQLContext, relation: HadoopFsRelation, requiredSchema: StructType): DataFrame
+
+  /**
+   * Gets the VariantType DataType if supported by this Spark version.
+   * Spark 3.x returns None (VariantType not supported).
+   * Spark 4.x returns Some(VariantType).
+   *
+   * @return Option[DataType] - Some(VariantType) for Spark 4.x, None for Spark 3.x
+   */
+  def getVariantDataType: Option[DataType]
+
+  /**
+   * Checks if two data types are equal for Parquet file format purposes.
+   * This handles version-specific types like VariantType (Spark 4.0+).
+   *
+   * Returns Some(true) if types are equal, Some(false) if not equal, or None if
+   * this adapter doesn't handle this specific type comparison (fallback to default logic).
+   *
+   * @param requiredType The required/expected data type
+   * @param fileType The data type from the file
+   * @return Option[Boolean] - Some(result) if handled by adapter, None otherwise
+   */
+  def isDataTypeEqualForParquet(requiredType: DataType, fileType: DataType): Option[Boolean]

Review Comment:
   Is this limited to parquet or can it apply to other formats like ORC?
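For readers following the thread, the Option-based contract in the doc comments above (the adapter answers Some(true)/Some(false) for types it knows about and None otherwise, letting the caller fall back to its default comparison) can be sketched roughly as follows. This is a minimal, self-contained sketch, not the PR's implementation: the DataType hierarchy, the `Spark3LikeAdapter`/`Spark4LikeAdapter` objects, the `TypeCompat.typesMatch` helper, and the specific Variant matching rules are all hypothetical stand-ins so the example runs without Spark on the classpath.

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy (not the real Spark classes).
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case object VariantType extends DataType // stands in for Spark 4.x's VariantType

// Mirrors the two adapter methods from the diff, reduced to what the sketch needs.
trait VariantAdapter {
  def getVariantDataType: Option[DataType]
  def isDataTypeEqualForParquet(requiredType: DataType, fileType: DataType): Option[Boolean]
}

// Spark 3.x-like adapter: no VariantType, never handles a comparison itself.
object Spark3LikeAdapter extends VariantAdapter {
  def getVariantDataType: Option[DataType] = None
  def isDataTypeEqualForParquet(requiredType: DataType, fileType: DataType): Option[Boolean] = None
}

// Spark 4.x-like adapter: handles only comparisons involving VariantType
// (hypothetical rules), and returns None for everything else.
object Spark4LikeAdapter extends VariantAdapter {
  def getVariantDataType: Option[DataType] = Some(VariantType)
  def isDataTypeEqualForParquet(requiredType: DataType, fileType: DataType): Option[Boolean] =
    (requiredType, fileType) match {
      case (VariantType, VariantType)          => Some(true)
      case (VariantType, _) | (_, VariantType) => Some(false)
      case _                                   => None // not an adapter-specific case
    }
}

object TypeCompat {
  // Hypothetical caller: use the adapter's answer if it has one,
  // otherwise fall back to plain structural equality.
  def typesMatch(adapter: VariantAdapter, required: DataType, file: DataType): Boolean =
    adapter.isDataTypeEqualForParquet(required, file).getOrElse(required == file)
}
```

Returning Option[Boolean] rather than Boolean is what lets a version-specific adapter stay silent on types it does not own, so the shared comparison logic only needs a single `getOrElse` fallback.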


