Re: [I] Don't understand the result [hudi]

via GitHub Mon, 29 Dec 2025 23:07:21 -0800


deepakpanda93 commented on issue #17734:
URL: https://github.com/apache/hudi/issues/17734#issuecomment-3698483887


   Hello @bithw1, I tried the same thing with the DataFrame API in spark-shell 
and it worked: all three rows were written.
   
   ```
   scala> import org.apache.spark.sql.functions._
   import org.apache.spark.sql.functions._
   
   scala> val df = Seq(
        |   (1, 2, 3),
        |   (1, 4, 7),
        |   (1, 3, 6)
        | ).toDF("a", "b", "c")
   25/12/30 06:44:31 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
   df: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]
   
   scala> df.write
     .format("hudi")
     .option("hoodie.table.name", "hudi_cow_20251229_07")
     .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
     .option("hoodie.datasource.write.recordkey.field", "a")
     .option("hoodie.datasource.write.precombine.field", "c")
     .option("hoodie.datasource.write.operation", "insert")
     .option("hoodie.datasource.write.insert.drop.duplicates", "false")
     .option("hoodie.datasource.write.insert.dup.policy", "none")
     .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
     .mode("append")
     .save("file:///tmp/hudi_cow_20251229_07")
   
   25/12/30 06:46:15 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
   
   scala> val basePath = "file:///tmp/hudi_cow_20251229_07"
   basePath: String = file:///tmp/hudi_cow_20251229_07
   
   scala> spark.sql(s"""CREATE TABLE IF NOT EXISTS hudi_cow_20251229_07 USING hudi LOCATION '$basePath'""".stripMargin)
   25/12/30 06:49:34 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
   res19: org.apache.spark.sql.DataFrame = []
   
   scala> spark.sql("select * from hudi_cow_20251229_07").show()
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---+---+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|  a|  b|  c|
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---+---+
   |  20251230064613766|20251230064613766...|                 1|                      |aa196a68-02a8-4f0...|  1|  2|  3|
   |  20251230064613766|20251230064613766...|                 1|                      |aa196a68-02a8-4f0...|  1|  4|  7|
   |  20251230064613766|20251230064613766...|                 1|                      |aa196a68-02a8-4f0...|  1|  3|  6|
   +-------------------+--------------------+------------------+----------------------+--------------------+---+---+---+
   ```
   
   With spark-sql, `hoodie.datasource.write.insert.dup.policy` is the setting 
that applies. It is `none` by default, so no action is taken on duplicates in 
this case.
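   
   If it helps, one way to double-check that the duplicates were kept is to 
count rows per record key after reading the table back. This is only a sketch: 
it assumes the same spark-shell session as above (so `spark` is defined and the 
Hudi Spark bundle is on the classpath) and the same table path. The `written` 
and `perKey` names are just illustrative.
   
   ```scala
   // Assumption: run inside spark-shell with the Hudi Spark bundle available.
   import org.apache.spark.sql.functions.col

   // Same location the DataFrame was saved to in the transcript above.
   val basePath = "file:///tmp/hudi_cow_20251229_07"

   // Read the table back through the Hudi datasource.
   val written = spark.read.format("hudi").load(basePath)

   // Count rows per Hudi record key; groupBy(...).count() adds a "count" column.
   val perKey = written.groupBy("_hoodie_record_key").count()

   // Any key listed here has more than one row, i.e. duplicates were not dropped.
   perKey.filter(col("count") > 1).show(false)
   ```
   
   Given the three rows shown above, record key `1` should be listed here.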


