deepakpanda93 commented on issue #17734:
URL: https://github.com/apache/hudi/issues/17734#issuecomment-3698483887
Hello @bithw1, I tried the same with the DataFrame approach in spark-shell and
it worked:
```
scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._
scala> val df = Seq(
| (1, 2, 3),
| (1, 4, 7),
| (1, 3, 6)
| ).toDF("a", "b", "c")
25/12/30 06:44:31 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
df: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]
scala> df.write
.format("hudi")
.option("hoodie.table.name", "hudi_cow_20251229_07")
.option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
.option("hoodie.datasource.write.recordkey.field", "a")
.option("hoodie.datasource.write.precombine.field", "c")
.option("hoodie.datasource.write.operation", "insert")
.option("hoodie.datasource.write.insert.drop.duplicates", "false")
.option("hoodie.datasource.write.insert.dup.policy", "none")
.option("hoodie.datasource.write.keygenerator.class",
"org.apache.hudi.keygen.NonpartitionedKeyGenerator")
.mode("append")
.save("file:///tmp/hudi_cow_20251229_07")
25/12/30 06:46:15 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
scala> val basePath = "file:///tmp/hudi_cow_20251229_07"
basePath: String = file:///tmp/hudi_cow_20251229_07
scala> spark.sql(s"""CREATE TABLE IF NOT EXISTS hudi_cow_20251229_07 USING
hudi LOCATION '$basePath'""".stripMargin)
25/12/30 06:49:34 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
res19: org.apache.spark.sql.DataFrame = []
scala> spark.sql("select * from hudi_cow_20251229_07").show()
+-------------------+--------------------+------------------+----------------------+--------------------+---+---+---+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|  a|  b|  c|
+-------------------+--------------------+------------------+----------------------+--------------------+---+---+---+
|  20251230064613766|20251230064613766...|                 1|                      |aa196a68-02a8-4f0...|  1|  2|  3|
|  20251230064613766|20251230064613766...|                 1|                      |aa196a68-02a8-4f0...|  1|  4|  7|
|  20251230064613766|20251230064613766...|                 1|                      |aa196a68-02a8-4f0...|  1|  3|  6|
+-------------------+--------------------+------------------+----------------------+--------------------+---+---+---+
```
With spark-sql, `hoodie.datasource.write.insert.dup.policy` is used, which is
`none` by default, so no action is taken on duplicates in this case.
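
To double-check that the duplicates were kept, one option is to count rows per
record key on the table written above. This is only a sketch for the same
spark-shell session: it assumes the `spark` session that spark-shell provides
and the same base path as in the transcript, and the names `snapshot` and
`dupKeys` are made up for illustration.

```scala
import org.apache.spark.sql.functions.col

// Same location the table was saved to in the transcript above.
val basePath = "file:///tmp/hudi_cow_20251229_07"

// Snapshot read of the Hudi table through the DataFrame reader.
val snapshot = spark.read.format("hudi").load(basePath)

// Count rows per Hudi record key and keep only the keys that occur more than once.
val dupKeys = snapshot
  .groupBy("_hoodie_record_key")
  .count()
  .filter(col("count") > 1)

dupKeys.show(false)
```

For the three rows shown above, this should list record key `1` with a count
of 3. An empty result would mean the duplicates were dropped or merged.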