the-other-tim-brown commented on code in PR #8574:
URL: https://github.com/apache/hudi/pull/8574#discussion_r1190425359


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/Transformer.java:
##########
@@ -45,4 +47,9 @@ public interface Transformer {
    */
   @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
   Dataset<Row> apply(JavaSparkContext jsc, SparkSession sparkSession, Dataset<Row> rowDataset, TypedProperties properties);
+
+  @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
+  default Option<Schema> transformedSchema(JavaSparkContext jsc, SparkSession sparkSession, Schema incomingSchema, TypedProperties properties) {

Review Comment:
   I don't think it makes sense for this to return an Option. Every row has 
a schema of some sort, so this Option would never be empty in practice.



##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/Transformer.java:
##########
@@ -45,4 +47,9 @@ public interface Transformer {
    */
   @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
   Dataset<Row> apply(JavaSparkContext jsc, SparkSession sparkSession, Dataset<Row> rowDataset, TypedProperties properties);
+
+  @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
+  default Option<Schema> transformedSchema(JavaSparkContext jsc, SparkSession sparkSession, Schema incomingSchema, TypedProperties properties) {
+    return Option.empty();

Review Comment:
   In my opinion, the default here should create an empty dataset with the 
`incomingSchema`, apply the transformer to it, call `.schema()` on the 
resulting dataset to get the struct type, and convert that back to Avro. 
   
   Another note: since transformers deal with Rows, would it make more sense 
to track the schema as a StructType?
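
   To make the suggested default concrete, here is a minimal sketch of the 
pattern. It deliberately uses toy stand-in types (`ToySchema`, `ToyDataset`, 
`ToyTransformer`, `AddColumnTransformer` are all hypothetical names), not 
Spark's `Dataset<Row>`/`StructType` or Avro's `Schema`, so it only shows the 
shape of the idea: build an empty dataset from the incoming schema, apply the 
transformer, and read the schema back. It also returns the schema directly 
rather than wrapping it in an Option.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a schema (NOT Spark's StructType or Avro's Schema):
// an ordered map of column name -> type name.
final class ToySchema {
  final Map<String, String> fields;

  ToySchema(Map<String, String> fields) {
    this.fields = Collections.unmodifiableMap(new LinkedHashMap<>(fields));
  }
}

// Toy stand-in for Dataset<Row>: a schema plus zero or more rows.
final class ToyDataset {
  final ToySchema schema;
  final List<Map<String, Object>> rows;

  ToyDataset(ToySchema schema, List<Map<String, Object>> rows) {
    this.schema = schema;
    this.rows = Collections.unmodifiableList(new ArrayList<>(rows));
  }

  ToySchema schema() {
    return schema;
  }
}

// Hypothetical, simplified counterpart of the Transformer interface.
interface ToyTransformer {
  ToyDataset apply(ToyDataset input);

  // Default: run the transformer on an EMPTY dataset that carries the
  // incoming schema, then read the schema off the result. No rows are
  // processed, and the result is never absent, so no Option is needed.
  default ToySchema transformedSchema(ToySchema incomingSchema) {
    ToyDataset empty =
        new ToyDataset(incomingSchema, Collections.<Map<String, Object>>emptyList());
    return apply(empty).schema();
  }
}

// Example transformer: appends a constant string column to every row.
final class AddColumnTransformer implements ToyTransformer {
  private final String columnName;
  private final String value;

  AddColumnTransformer(String columnName, String value) {
    this.columnName = columnName;
    this.value = value;
  }

  @Override
  public ToyDataset apply(ToyDataset input) {
    Map<String, String> outFields = new LinkedHashMap<>(input.schema.fields);
    outFields.put(columnName, "string");
    List<Map<String, Object>> outRows = new ArrayList<>();
    for (Map<String, Object> row : input.rows) {
      Map<String, Object> copy = new LinkedHashMap<>(row);
      copy.put(columnName, value);
      outRows.add(copy);
    }
    return new ToyDataset(new ToySchema(outFields), outRows);
  }
}
```

   With this shape, a transformer that only implements `apply` gets a correct 
`transformedSchema` for free, and implementations that can compute the schema 
more cheaply can still override the default.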


