[I] Don't understand the result [hudi]

via GitHub Sun, 28 Dec 2025 23:51:44 -0800


bithw1 opened a new issue, #17734:
URL: https://github.com/apache/hudi/issues/17734


   ### Describe the problem you faced
   
   In Spark SQL, I ran the simple statements shown in the second code block
   below.
   
   When I then run `select * from hudi_cow_20251229_07`, the result is as
   follows. I wonder why the rows (1,2,3) and (1,3,6) are gone. I am using the
   insert operation, so no duplicates should be dropped.
   
   ```
   spark-sql> select * from hudi_cow_20251229_07;
   _hoodie_commit_time     _hoodie_commit_seqno    _hoodie_record_key      _hoodie_partition_path  _hoodie_file_name       a       b       c
   20251229154740370       20251229154740370_0_0   1  
   ```
   
   
   ```
   set hoodie.spark.sql.insert.into.operation=insert;
   set hoodie.datasource.write.insert.drop.duplicates=false;
   set hoodie.datasource.write.insert.dup.policy=none;
   
   CREATE TABLE IF NOT EXISTS hudi_cow_20251229_07 (
     a INT,
     b INT,
     c INT
   )
   USING hudi
   TBLPROPERTIES (
     type = 'cow',
     primaryKey = 'a',
     hoodie.datasource.write.precombine.field = 'c'
   );
   
   INSERT INTO hudi_cow_20251229_07 (a, b, c) VALUES (1, 2, 3), (1, 4, 7), (1, 3, 6);
   ```
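   
   For reference, the observed result matches what would come out if rows
   sharing the same `primaryKey` value were collapsed to the one with the
   largest value of the precombine field. The query below is only a sketch of
   that pattern in plain Spark SQL over inline values. It does not read the Hudi
   table, the names `t`, `ranked`, and `rn` are made up for illustration, and
   whether Hudi actually applies this logic to an `insert` with these settings
   is an assumption, not something confirmed here.
   
   ```sql
   -- Hypothetical model of "keep one row per key, highest precombine value wins".
   -- Uses the same three rows as the INSERT above; key = a, ordering field = c.
   SELECT a, b, c
   FROM (
     SELECT a, b, c,
            row_number() OVER (PARTITION BY a ORDER BY c DESC) AS rn
     FROM VALUES (1, 2, 3), (1, 4, 7), (1, 3, 6) AS t(a, b, c)
   ) ranked
   WHERE rn = 1;
   ```
   
   With these three rows the sketch keeps only (1, 4, 7), which is the one row
   not reported as missing above.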
   
   ### To Reproduce
   
   1.
   2.
   3.
   4.
   
   
   ### Expected behavior
   
   1
   
   ### Environment Description
   
   * Hudi version:1
   * Spark version:
   * Flink version:
   * Hive version:
   * Hadoop version:
   * Storage (HDFS/S3/GCS..):
   * Running on Docker? (yes/no):
   
   
   ### Additional context
   
   1
   
   ### Stacktrace
   
   ```shell
   
   ```


