Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


yihua merged PR #10615:
URL: https://github.com/apache/hudi/pull/10615


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2052767220

   
   ## CI report:
   
   * 805ba35b65afbb1daccbcf00291fd520a69c5584 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23232)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2052703726

   
   ## CI report:
   
   * dfab8e1285bf0241eea2e71f9d85607c647446d7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23212)
 
   * 805ba35b65afbb1daccbcf00291fd520a69c5584 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23232)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2052699814

   
   ## CI report:
   
   * dfab8e1285bf0241eea2e71f9d85607c647446d7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23212)
 
   * 805ba35b65afbb1daccbcf00291fd520a69c5584 UNKNOWN
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


yihua commented on code in PR #10615:
URL: https://github.com/apache/hudi/pull/10615#discussion_r1563323590


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -530,6 +539,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in 
the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without 
key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance 
to fetch table properties.
+   * @return the write config value to set for 
"hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: 
String,
+   partitionFieldNamesWithoutKeyGenType: 
String,
+   catalogTable: HoodieCatalogTable): 
String = {
+if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+  partitionFieldNamesWithoutKeyGenType
+} else {
+  val writeConfigPartitionField = 
catalogTable.catalogProperties.get(PARTITIONPATH_FIELD.key())
+  val keyGenClass = 
ReflectionUtils.getClass(tableConfigKeyGeneratorClassName)
+  if (classOf[CustomKeyGenerator].equals(keyGenClass)

Review Comment:
   The assumption is that these key generators should not be extended.  We 
should keep it this way for now.
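
The difference between the exact-class check used in the diff and a check that would also accept subclasses can be sketched as follows. This is a minimal illustration only: `BaseKeyGen`, `CustomKeyGen`, and `MyCustomKeyGen` are hypothetical stand-ins, not Hudi classes.

```scala
// Hypothetical stand-ins for key generator classes; these are not Hudi's classes.
class BaseKeyGen
class CustomKeyGen extends BaseKeyGen
class MyCustomKeyGen extends CustomKeyGen

object KeyGenClassCheck {
  // Exact match, in the style of the diff: only the class itself qualifies.
  def isExactlyCustom(clazz: Class[_]): Boolean =
    classOf[CustomKeyGen].equals(clazz)

  // Alternative that would also accept any subclass of CustomKeyGen.
  def isCustomOrSubclass(clazz: Class[_]): Boolean =
    classOf[CustomKeyGen].isAssignableFrom(clazz)
}
```

With the exact match, `MyCustomKeyGen` is not treated as a custom key generator, which is consistent with the assumption that these key generators are not extended.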






Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


yihua commented on code in PR #10615:
URL: https://github.com/apache/hudi/pull/10615#discussion_r1563245298


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSparkSqlWithCustomKeyGenerator.scala:
##
@@ -0,0 +1,571 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.HoodieSparkUtils
+import org.apache.hudi.common.config.TypedProperties
+import org.apache.hudi.common.table.HoodieTableMetaClient
+import org.apache.hudi.common.util.StringUtils
+import org.apache.hudi.exception.HoodieException
+import org.apache.hudi.functional.TestSparkSqlWithCustomKeyGenerator._
+import org.apache.hudi.util.SparkKeyGenUtils
+import org.apache.spark.sql.SaveMode
+import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase
+import org.joda.time.DateTime
+import org.joda.time.format.DateTimeFormat
+import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertTrue}
+import org.slf4j.LoggerFactory
+
+import java.io.IOException
+
+/**
+ * Tests Spark SQL DML with custom key generator and write configs.
+ */
+class TestSparkSqlWithCustomKeyGenerator extends HoodieSparkSqlTestBase {
+  private val LOG = LoggerFactory.getLogger(getClass)
+
+  test("Test Spark SQL DML with custom key generator") {
+withTempDir { tmp =>
+  Seq(
+Seq("COPY_ON_WRITE", "ts:timestamp,segment:simple",
+  "(ts=202401, segment='cat2')", "202401/cat2",
+  Seq("202312/cat2", "202312/cat4", "202401/cat1", "202401/cat3", 
"202402/cat1", "202402/cat3", "202402/cat5"),
+  TS_FORMATTER_FUNC,
+  (ts: Integer, segment: String) => TS_FORMATTER_FUNC.apply(ts) + "/" 
+ segment),
+Seq("MERGE_ON_READ", "segment:simple",
+  "(segment='cat3')", "cat3",
+  Seq("cat1", "cat2", "cat4", "cat5"),
+  TS_TO_STRING_FUNC,
+  (_: Integer, segment: String) => segment),
+Seq("MERGE_ON_READ", "ts:timestamp",
+  "(ts=202312)", "202312",
+  Seq("202401", "202402"),
+  TS_FORMATTER_FUNC,
+  (ts: Integer, _: String) => TS_FORMATTER_FUNC.apply(ts)),
+Seq("MERGE_ON_READ", "ts:timestamp,segment:simple",
+  "(ts=202401, segment='cat2')", "202401/cat2",
+  Seq("202312/cat2", "202312/cat4", "202401/cat1", "202401/cat3", 
"202402/cat1", "202402/cat3", "202402/cat5"),
+  TS_FORMATTER_FUNC,
+  (ts: Integer, segment: String) => TS_FORMATTER_FUNC.apply(ts) + "/" 
+ segment)
+  ).foreach { testParams =>
+withTable(generateTableName) { tableName =>
+  LOG.warn("Testing with parameters: " + testParams)
+  val tableType = testParams(0).asInstanceOf[String]
+  val writePartitionFields = testParams(1).asInstanceOf[String]
+  val dropPartitionStatement = testParams(2).asInstanceOf[String]
+  val droppedPartition = testParams(3).asInstanceOf[String]
+  val expectedPartitions = testParams(4).asInstanceOf[Seq[String]]
+  val tsGenFunc = testParams(5).asInstanceOf[Integer => String]
+  val partitionGenFunc = testParams(6).asInstanceOf[(Integer, String) 
=> String]
+  val tablePath = tmp.getCanonicalPath + "/" + tableName
+  val timestampKeyGeneratorConfig = if 
(writePartitionFields.contains("timestamp")) {
+TS_KEY_GEN_CONFIGS
+  } else {
+Map[String, String]()
+  }
+  val timestampKeyGenProps = if (timestampKeyGeneratorConfig.nonEmpty) 
{
+", " + timestampKeyGeneratorConfig.map(e => e._1 + " = '" + e._2 + 
"'").mkString(", ")
+  } else {
+""
+  }
+
+  prepareTableWithKeyGenerator(
+tableName, tablePath, tableType,
+CUSTOM_KEY_GEN_CLASS_NAME, writePartitionFields, 
timestampKeyGeneratorConfig)
+
+  // SQL CTAS with table properties containing key generator write 
configs
+  createTableWithSql(tableName, tablePath,
+s"hoodie.datasource.write.partitionpath.field = 
'$writePartitionFields'" + timestampKeyGenProps)
+
+  // Prepare source and test SQL INSERT INTO
+  val sourceTableName = tableName + "_source"
+ 

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


yihua commented on code in PR #10615:
URL: https://github.com/apache/hudi/pull/10615#discussion_r1563198254


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -530,6 +539,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in 
the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without 
key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance 
to fetch table properties.
+   * @return the write config value to set for 
"hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: 
String,
+   partitionFieldNamesWithoutKeyGenType: 
String,
+   catalogTable: HoodieCatalogTable): 
String = {
+if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+  partitionFieldNamesWithoutKeyGenType
+} else {
+  val writeConfigPartitionField = 
catalogTable.catalogProperties.get(PARTITIONPATH_FIELD.key())

Review Comment:
   As an example, the table looks like this in Spark catalog:
   ```
   spark-sql (default)> DESCRIBE TABLE formatted h0;
   24/04/12 13:59:53 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
   _hoodie_commit_time     string
   _hoodie_commit_seqno    string
   _hoodie_record_key      string
   _hoodie_partition_path  string
   _hoodie_file_name       string
   id                      int
   name                    string
   price                   decimal(5,1)
   ts                      int
   segment                 string
   # Partition Information
   # col_name              data_type       comment
   ts                      int
   segment                 string

   # Detailed Table Information
   Catalog                 spark_catalog
   Database                default
   Table                   h0
   Owner                   ethan
   Created Time            Fri Apr 12 13:58:05 PDT 2024
   Last Access             UNKNOWN
   Created By              Spark 3.5.1
   Type                    EXTERNAL
   Provider                hudi
   Table Properties        [hoodie.datasource.write.partitionpath.field=ts:timestamp,segment:simple, preCombineField=name, primaryKey=id, provider=hudi, type=cow]
   Location                file:/private/var/folders/60/wk8qzx310fd32b2dp7mhzvdcgn/T/spark-4ac6fb47-e20b-4679-a668-e28238ec3e05/h0
   Serde Library           org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
   InputFormat             org.apache.hudi.hadoop.HoodieParquetInputFormat
   OutputFormat            org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
   Time taken: 1.694 seconds, Fetched 30 row(s)
   ```






Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


yihua commented on code in PR #10615:
URL: https://github.com/apache/hudi/pull/10615#discussion_r1563196323


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -530,6 +539,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in 
the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without 
key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance 
to fetch table properties.
+   * @return the write config value to set for 
"hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: 
String,
+   partitionFieldNamesWithoutKeyGenType: 
String,
+   catalogTable: HoodieCatalogTable): 
String = {
+if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+  partitionFieldNamesWithoutKeyGenType
+} else {
+  val writeConfigPartitionField = 
catalogTable.catalogProperties.get(PARTITIONPATH_FIELD.key())

Review Comment:
   Yes, the table properties associated with `HoodieCatalogTable` are persisted 
across Spark sessions.  The persisted partition field write config 
`hoodie.datasource.write.partitionpath.field` is a custom config outside Spark, 
which is used by Hudi logic only.






Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


yihua commented on code in PR #10615:
URL: https://github.com/apache/hudi/pull/10615#discussion_r1563151868


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -528,6 +536,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in 
the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without 
key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance 
to fetch table properties.
+   * @return the write config value to set for 
"hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: 
String,
+   partitionFieldNamesWithoutKeyGenType: 
String,
+   catalogTable: HoodieCatalogTable): 
String = {
+if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+  partitionFieldNamesWithoutKeyGenType
+} else {

Review Comment:
   Flink writer should provide the correct partition field write config.  The 
query side may have some gaps.
   
   Created [HUDI-7613](https://issues.apache.org/jira/browse/HUDI-7613) as a 
follow-up.



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala:
##
@@ -201,8 +201,26 @@ object HoodieWriterUtils {
   
diffConfigs.append(s"KeyGenerator:\t$datasourceKeyGen\t$tableConfigKeyGen\n")
 }
 
+// Please note that the validation of partition path fields needs the 
key generator class
+// for the table, since the custom key generator expects a different 
format of
+// the value of the write config 
"hoodie.datasource.write.partitionpath.field"
+// e.g., "col:simple,ts:timestamp", whereas the table config 
"hoodie.table.partition.fields"
+// in hoodie.properties stores "col,ts".
+// The "params" here may only contain the write config of partition 
path field,
+// so we need to pass in the validated key generator class name.
+val validatedKeyGenClassName = if (tableConfigKeyGen != null) {

Review Comment:
   Only `hoodie.datasource.write.partitionpath.field` takes effect in the 
writer path.  Before the fix, the SQL writer set this write config automatically 
from the value of the table config `hoodie.table.partition.fields`.
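
To make the two formats concrete, here is a small sketch of how a typed write config value such as `col:simple,ts:timestamp` relates to the untyped value `col,ts` stored in the table config. `PartitionFieldFormat` and `stripKeyGenTypes` are hypothetical names used only for illustration; they are not part of Hudi.

```scala
object PartitionFieldFormat {
  // Hypothetical helper: drops the ":<key generator type>" suffix from each
  // comma-separated entry, e.g. "col:simple,ts:timestamp" becomes "col,ts".
  def stripKeyGenTypes(writeConfigValue: String): String =
    writeConfigValue
      .split(",")
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(entry => entry.split(":")(0))
      .mkString(",")
}
```

The conversion is lossy: the untyped value cannot be turned back into the typed one, which is why deriving the write config from the table config alone does not work for the custom key generator.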



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -528,6 +536,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in 
the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without 
key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance 
to fetch table properties.
+   * @return the write config value to set for 
"hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: 
String,
+   partitionFieldNamesWithoutKeyGenType: 
String,
+   catalogTable: HoodieCatalogTable): 
String = {
+if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+  partitionFieldNamesWithoutKeyGenType
+} else {
+  val writeConfigPartitionField = 
catalogTable.catalogProperties.get(PARTITIONPATH_FIELD.key())
+  val keyGenClass = 
ReflectionUtils.getClass(tableConfigKeyGeneratorClassName)
+  if (classOf[CustomKeyGenerator].equals(keyGenClass)
+|| classOf[CustomAvroKeyGenerator].equals(keyGenClass)) {
+// For custom key generator, we have to take the write config value 
from
+// "hoodie.datasource.write.partitionpath.field" which contains the 
key generator
+// type, whereas the table config only contains the partition field 
names without
+// key generator types.
+if (writeConfigPartitionField.isDefined) {
+  writeConfigPartitionField.get
+} else {
+  log.warn("Write config 
\"hoodie.datasource.write.partitionpath.field\" is not set for "
++ "custom key generator. This may fail the write operation.")
+  partitionFieldNamesWithoutKeyGenType

Review Comment:
   It fails with the error message `Unable to find field names for partition 
path in proper format` in the `CustomKeyGenerator` indicating that the config 
is not set properly.
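
As a rough model of why the fallback value fails, the sketch below validates that every entry carries a `field:type` pair. It is a simplified assumption about the expected format, not the actual `CustomKeyGenerator` code; `PartitionFieldCheck` and `parse` are hypothetical names, and the error text only mirrors the message quoted above.

```scala
object PartitionFieldCheck {
  // Hypothetical validator: returns the (field, type) pairs, or an error when
  // the value is empty or any entry lacks the ":<type>" part.
  def parse(value: String): Either[String, Seq[(String, String)]] = {
    val entries = value.split(",").map(_.trim).filter(_.nonEmpty).toSeq
    val parts = entries.map(_.split(":"))
    if (entries.nonEmpty && parts.forall(_.length == 2)) {
      Right(parts.map(p => (p(0), p(1))))
    } else {
      Left(s"Unable to find field names for partition path in proper format: $value")
    }
  }
}
```

Under this model, `ts:timestamp,segment:simple` parses cleanly, while the untyped fallback `ts,segment` is rejected.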




Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


yihua commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2052277245

   > I like that this has the benefit of not breaking tables with their 
existing hoodie.table.recordkey.fields, but I am curious about any other 
approaches you thought about. From your test code, it looks like we can't use 
`partitioned by (dt:int,idk:string)` when creating the table. I don't think 
that should block this pr from landing, but in the documentation for SQL: 
https://hudi.apache.org/docs/sql_ddl#create-partitioned-table I think we should 
add an example
   
   Good point.  I tried the `partitioned by` statement but it did not work either, 
due to the same issue with the write config of the partition fields.  But you're right 
that adding a new table config indicating the partition field types should 
solve the problem fundamentally.  We should update the SQL docs on any gaps 
here.
   
   > 
   > Also, I think this change will help us to fix partition pruning 
which currently does not work with timestamp keygen: 
https://issues.apache.org/jira/browse/HUDI-6614
   
   Right.
   
   
   





Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


jonvex commented on code in PR #10615:
URL: https://github.com/apache/hudi/pull/10615#discussion_r1562569055


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala:
##
@@ -201,8 +201,26 @@ object HoodieWriterUtils {
   
diffConfigs.append(s"KeyGenerator:\t$datasourceKeyGen\t$tableConfigKeyGen\n")
 }
 
+// Please note that the validation of partition path fields needs the 
key generator class
+// for the table, since the custom key generator expects a different 
format of
+// the value of the write config 
"hoodie.datasource.write.partitionpath.field"
+// e.g., "col:simple,ts:timestamp", whereas the table config 
"hoodie.table.partition.fields"
+// in hoodie.properties stores "col,ts".
+// The "params" here may only contain the write config of partition 
path field,
+// so we need to pass in the validated key generator class name.
+val validatedKeyGenClassName = if (tableConfigKeyGen != null) {

Review Comment:
   So when `hoodie.datasource.write.partitionpath.field` is set, we don't set 
`hoodie.table.partition.fields`?



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -530,6 +539,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in 
the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without 
key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance 
to fetch table properties.
+   * @return the write config value to set for 
"hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: 
String,
+   partitionFieldNamesWithoutKeyGenType: 
String,
+   catalogTable: HoodieCatalogTable): 
String = {
+if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+  partitionFieldNamesWithoutKeyGenType
+} else {
+  val writeConfigPartitionField = 
catalogTable.catalogProperties.get(PARTITIONPATH_FIELD.key())
+  val keyGenClass = 
ReflectionUtils.getClass(tableConfigKeyGeneratorClassName)
+  if (classOf[CustomKeyGenerator].equals(keyGenClass)

Review Comment:
   Do we want to make this cover any classes that extend `CustomKeyGenerator` as well?



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -528,6 +536,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in 
the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without 
key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance 
to fetch table properties.
+   * @return the write config value to set for 
"hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: 
String,
+   partitionFieldNamesWithoutKeyGenType: 
String,
+   catalogTable: HoodieCatalogTable): 
String = {
+if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+  partitionFieldNamesWithoutKeyGenType
+} else {

Review Comment:
   So does this mean that it's still an issue for Flink, Hive, etc.?



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSparkSqlWithCustomKeyGenerator.scala:
##
@@ -0,0 +1,571 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.HoodieSparkUtils
+import org.apache.hudi.common.config.TypedProperties
+import 

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-12 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2051092856

   
   ## CI report:
   
   * dfab8e1285bf0241eea2e71f9d85607c647446d7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23212)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2050971324

   
   ## CI report:
   
   * 989ffd5220e4f5ae666a05afdd0e7de3c6543972 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23202)
 
   * dfab8e1285bf0241eea2e71f9d85607c647446d7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23212)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2050965097

   
   ## CI report:
   
   * 989ffd5220e4f5ae666a05afdd0e7de3c6543972 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23202)
 
   * dfab8e1285bf0241eea2e71f9d85607c647446d7 UNKNOWN
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2050776752

   
   ## CI report:
   
   * 989ffd5220e4f5ae666a05afdd0e7de3c6543972 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23202)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2050723314

   
   ## CI report:
   
   * 50b27846bf118909f3fd69f20cf5d7654d8a87c7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23189)
 
   * 989ffd5220e4f5ae666a05afdd0e7de3c6543972 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23202)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2050717174

   
   ## CI report:
   
   * 50b27846bf118909f3fd69f20cf5d7654d8a87c7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23189)
 
   * 989ffd5220e4f5ae666a05afdd0e7de3c6543972 UNKNOWN
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub


codope commented on code in PR #10615:
URL: https://github.com/apache/hudi/pull/10615#discussion_r1561326209


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -528,6 +536,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in 
the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without 
key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance 
to fetch table properties.
+   * @return the write config value to set for 
"hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: 
String,
+   partitionFieldNamesWithoutKeyGenType: 
String,
+   catalogTable: HoodieCatalogTable): 
String = {
+if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+  partitionFieldNamesWithoutKeyGenType
+} else {

Review Comment:
   Got it.






Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub


yihua commented on code in PR #10615:
URL: https://github.com/apache/hudi/pull/10615#discussion_r1561319630


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -528,6 +536,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance to fetch table properties.
+   * @return the write config value to set for "hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: String,
+                                       partitionFieldNamesWithoutKeyGenType: String,
+                                       catalogTable: HoodieCatalogTable): String = {
+    if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+      partitionFieldNamesWithoutKeyGenType
+    } else {

Review Comment:
   For the custom key generator, we have to take the partition path field from 
the properties stored in the Spark catalog table.  
`partitionFieldNamesWithoutKeyGenType` is derived from the existing table 
configs, which do not carry the key generator types and can therefore be wrong.  
Also, in some code paths, `tableConfigKeyGeneratorClassName` is not passed in.
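
The lookup order described in this comment can be sketched without any Hudi dependency. This is only an illustration, not the code in the PR: the object name `PartitionPathFieldSketch`, the `resolve` helper, the plain `Map` standing in for the catalog properties, and the fully qualified class names are assumptions, as is the fallback to the table config for non-custom key generators.

```scala
object PartitionPathFieldSketch {
  // Write config key named in the discussion above.
  val PartitionPathFieldKey = "hoodie.datasource.write.partitionpath.field"

  // Assumed fully qualified names of the custom key generators.
  val CustomKeyGenerators: Set[String] = Set(
    "org.apache.hudi.keygen.CustomKeyGenerator",
    "org.apache.hudi.keygen.CustomAvroKeyGenerator")

  // Hypothetical stand-in for getPartitionPathFieldWriteConfig: strings and a
  // Map replace the real table config and HoodieCatalogTable.
  def resolve(keyGenClassName: String,
              tableConfigPartitionFields: String,
              catalogProperties: Map[String, String]): String =
    Option(keyGenClassName).filter(_.nonEmpty) match {
      // Key generator class not passed in: keep the table config value.
      case None => tableConfigPartitionFields
      // Custom key generator: prefer the typed value from the catalog properties.
      case Some(name) if CustomKeyGenerators.contains(name) =>
        catalogProperties.getOrElse(PartitionPathFieldKey, tableConfigPartitionFields)
      // Any other key generator: assumed to need field names only.
      case Some(_) => tableConfigPartitionFields
    }
}
```

With a custom key generator and the catalog property set, the sketch returns the typed value (`segment:simple,ts:timestamp`); in every other case it falls back to the field names from the table config.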



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -528,6 +536,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance to fetch table properties.
+   * @return the write config value to set for "hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: String,
+                                       partitionFieldNamesWithoutKeyGenType: String,
+                                       catalogTable: HoodieCatalogTable): String = {
+    if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+      partitionFieldNamesWithoutKeyGenType
+    } else {
+      val writeConfigPartitionField = catalogTable.catalogProperties.get(PARTITIONPATH_FIELD.key())
+      val keyGenClass = ReflectionUtils.getClass(tableConfigKeyGeneratorClassName)
+      if (classOf[CustomKeyGenerator].equals(keyGenClass)
+        || classOf[CustomAvroKeyGenerator].equals(keyGenClass)) {
+        // For custom key generator, we have to take the write config value from
+        // "hoodie.datasource.write.partitionpath.field" which contains the key generator
+        // type, whereas the table config only contains the partition field names without
+        // key generator types.
+        if (writeConfigPartitionField.isDefined) {
+          writeConfigPartitionField.get
+        } else {
+          log.warn("Write config \"hoodie.datasource.write.partitionpath.field\" is not set for "
+            + "custom key generator. This may fail the write operation.")
+          partitionFieldNamesWithoutKeyGenType

Review Comment:
   The write fails in the overall validation method.  There is no need to fail 
in this util method again.






Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub


yihua commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2050092146

   > 1. Is there any change to partitions in `hoodie.properties`? Do we now 
write it as `field1:type,field2:type2` when using CustomKeyGenerator?
   
   There is no change to the table configs in `hoodie.properties`, i.e., 
`hoodie.table.partition.fields` still contains the comma-separated list of 
partition field names like `"segment,ts"` (no type for custom key generator).  
This PR makes it possible to override 
`hoodie.datasource.write.partitionpath.field` with `SET TBLPROPERTIES` at the 
table level in the Spark catalog, so that SQL DML can derive the correct write 
config of the partition fields (e.g., `"segment:simple,ts:timestamp"` instead 
of `"segment,ts"`).
   
   > 2. Thanks for adding extensive tests. Can you please look into the 
failures? They seem related to the patch.
   
   Failures for Spark 3.2 and above are fixed.  I'm looking into failures for 
older Spark versions.
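
To make the two formats from the first answer concrete, here is a small dependency-free sketch. The object `PartitionFieldFormats`, its helpers, and the table name `hudi_tbl` used in the tests are hypothetical. The generated statement uses the standard Spark SQL `ALTER TABLE ... SET TBLPROPERTIES` form and only builds a string, so nothing is executed.

```scala
object PartitionFieldFormats {
  val PartitionPathFieldKey = "hoodie.datasource.write.partitionpath.field"

  // Splits a typed write config value such as "segment:simple,ts:timestamp"
  // into (field name, key generator type) pairs; a missing type becomes "".
  def parseTyped(writeConfigValue: String): Seq[(String, String)] =
    writeConfigValue.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { entry =>
      entry.split(":", 2) match {
        case Array(field, keyGenType) => (field, keyGenType)
        case other                    => (other.head, "")
      }
    }

  // Drops the types, giving the shape kept in hoodie.table.partition.fields.
  def toTableConfigValue(writeConfigValue: String): String =
    parseTyped(writeConfigValue).map(_._1).mkString(",")

  // Builds (but does not run) the table-level override statement.
  def setTblPropertiesSql(table: String, typedPartitionPath: String): String =
    s"ALTER TABLE $table SET TBLPROPERTIES ('$PartitionPathFieldKey' = '$typedPartitionPath')"
}
```

For example, `toTableConfigValue("segment:simple,ts:timestamp")` yields `segment,ts`, which matches the value that stays in `hoodie.properties`.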





Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2049197186

   
   ## CI report:
   
   * 50b27846bf118909f3fd69f20cf5d7654d8a87c7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23189)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2049032647

   
   ## CI report:
   
   * c376900f104a979535fe7b4b9bb7e9a2d236a2b9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23182)
 
   * 50b27846bf118909f3fd69f20cf5d7654d8a87c7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23189)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2049023818

   
   ## CI report:
   
   * c376900f104a979535fe7b4b9bb7e9a2d236a2b9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23182)
 
   * 50b27846bf118909f3fd69f20cf5d7654d8a87c7 UNKNOWN
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-10 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2048869318

   
   ## CI report:
   
   * c376900f104a979535fe7b4b9bb7e9a2d236a2b9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23182)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-10 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2048834962

   
   ## CI report:
   
   * 185d0fc1b26344563514603f9f5e600972feaaac Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23050)
 
   * c376900f104a979535fe7b4b9bb7e9a2d236a2b9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23182)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-10 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2048829426

   
   ## CI report:
   
   * 185d0fc1b26344563514603f9f5e600972feaaac Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23050)
 
   * c376900f104a979535fe7b4b9bb7e9a2d236a2b9 UNKNOWN
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-04 Thread via GitHub


codope commented on code in PR #10615:
URL: https://github.com/apache/hudi/pull/10615#discussion_r1551075779


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -528,6 +536,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance to fetch table properties.
+   * @return the write config value to set for "hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: String,
+                                       partitionFieldNamesWithoutKeyGenType: String,
+                                       catalogTable: HoodieCatalogTable): String = {
+    if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+      partitionFieldNamesWithoutKeyGenType
+    } else {

Review Comment:
   Should we instead directly infer from the passed string 
`tableConfigKeyGeneratorClassName`? I mean if the string has no `:` then return 
`partitionFieldNamesWithoutKeyGenType`. I am not following why 
`tableConfigKeyGeneratorClassName` being null or empty means partition field 
names are without keygen type. Suppose, in a future release we drop the keygen 
config from table properties, then will this hold true?
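
One possible reading of this suggestion is a check on the partition field value itself rather than on the key generator class name. The sketch below assumes that reading. `KeyGenTypeInferenceSketch` and `hasKeyGenTypes` are hypothetical names and not part of the PR.

```scala
object KeyGenTypeInferenceSketch {
  // True when at least one comma-separated entry carries a ":type" suffix,
  // e.g. "segment:simple,ts:timestamp"; false for "segment,ts" or null.
  def hasKeyGenTypes(partitionFields: String): Boolean =
    Option(partitionFields).exists(_.split(",").exists(_.contains(":")))
}
```

A check like this would not depend on the key generator class being present in the table properties, which is the concern raised about a future release dropping that config.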



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -528,6 +536,40 @@ object ProvidesHoodieConfig {
   filterNullValues(overridingOpts)
   }
 
+  /**
+   * @param tableConfigKeyGeneratorClassName key generator class name in the table config.
+   * @param partitionFieldNamesWithoutKeyGenType partition field names without key generator types
+   * from the table config.
+   * @param catalogTable HoodieCatalogTable instance to fetch table properties.
+   * @return the write config value to set for "hoodie.datasource.write.partitionpath.field".
+   */
+  def getPartitionPathFieldWriteConfig(tableConfigKeyGeneratorClassName: String,
+                                       partitionFieldNamesWithoutKeyGenType: String,
+                                       catalogTable: HoodieCatalogTable): String = {
+    if (StringUtils.isNullOrEmpty(tableConfigKeyGeneratorClassName)) {
+      partitionFieldNamesWithoutKeyGenType
+    } else {
+      val writeConfigPartitionField = catalogTable.catalogProperties.get(PARTITIONPATH_FIELD.key())
+      val keyGenClass = ReflectionUtils.getClass(tableConfigKeyGeneratorClassName)
+      if (classOf[CustomKeyGenerator].equals(keyGenClass)
+        || classOf[CustomAvroKeyGenerator].equals(keyGenClass)) {
+        // For custom key generator, we have to take the write config value from
+        // "hoodie.datasource.write.partitionpath.field" which contains the key generator
+        // type, whereas the table config only contains the partition field names without
+        // key generator types.
+        if (writeConfigPartitionField.isDefined) {
+          writeConfigPartitionField.get
+        } else {
+          log.warn("Write config \"hoodie.datasource.write.partitionpath.field\" is not set for "
+            + "custom key generator. This may fail the write operation.")
+          partitionFieldNamesWithoutKeyGenType

Review Comment:
   Should we then fail early if the write is going to fail? Maybe make it a 
validation?
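
As a rough illustration of what such a fail-early check might look like, the sketch below returns an error instead of logging a warning. It is not the validation method that exists in Hudi. `PartitionPathValidationSketch` and `requireTypedPartitionPath` are hypothetical, and the rule that every entry must carry a `:type` suffix for a custom key generator is an assumption.

```scala
object PartitionPathValidationSketch {
  val PartitionPathFieldKey = "hoodie.datasource.write.partitionpath.field"

  // Right(value) when a typed partition path is configured in the (stand-in)
  // catalog properties; Left(message) when the write should be rejected up front.
  def requireTypedPartitionPath(catalogProperties: Map[String, String]): Either[String, String] =
    catalogProperties.get(PartitionPathFieldKey) match {
      case Some(value) if value.split(",").forall(_.contains(":")) =>
        Right(value)
      case Some(value) =>
        Left(s"$PartitionPathFieldKey is '$value' but a custom key generator expects field:type entries")
      case None =>
        Left(s"$PartitionPathFieldKey is not set for the custom key generator")
    }
}
```

Returning `Either` keeps the check free of side effects, so a caller can decide whether to throw, log, or fall back.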






Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-03-28 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2024589288

   
   ## CI report:
   
   * 185d0fc1b26344563514603f9f5e600972feaaac Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23050)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-03-28 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2024516329

   
   ## CI report:
   
   * afc107a681bb6df8e1b856239a811ccac6b3b3db Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22314)
 
   * 185d0fc1b26344563514603f9f5e600972feaaac Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23050)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-03-28 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-2024506879

   
   ## CI report:
   
   * afc107a681bb6df8e1b856239a811ccac6b3b3db Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22314)
 
   * 185d0fc1b26344563514603f9f5e600972feaaac UNKNOWN
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-02-03 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-1925553485

   
   ## CI report:
   
   * afc107a681bb6df8e1b856239a811ccac6b3b3db Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22314)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-02-03 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-1925509035

   
   ## CI report:
   
   * afc107a681bb6df8e1b856239a811ccac6b3b3db Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22314)
 
   
   



Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-02-03 Thread via GitHub


hudi-bot commented on PR #10615:
URL: https://github.com/apache/hudi/pull/10615#issuecomment-1925507573

   
   ## CI report:
   
   * afc107a681bb6df8e1b856239a811ccac6b3b3db UNKNOWN
   
   



[PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-02-03 Thread via GitHub


yihua opened a new pull request, #10615:
URL: https://github.com/apache/hudi/pull/10615

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low, medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs is changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

