watermelon12138 commented on PR #10143:
URL: https://github.com/apache/hudi/pull/10143#issuecomment-1837074125

   > @danny0405 Hello Danny, I would like to ask why data with the same 
primary key is written to different log files (with the same FileId but 
different timestamps) in upsert mode. As a result, I cannot write a unit test 
for the LogIndex capability. My test code is as follows:
   > 
   > ```
   > public void testHoodiePipelineBuilderSource() throws Exception {
   >   // create a StreamExecutionEnvironment instance.
   >   StreamExecutionEnvironment execEnv = StreamExecutionEnvironment.getExecutionEnvironment();
   >   execEnv.getConfig().disableObjectReuse();
   >   execEnv.setParallelism(1);
   >   // set up checkpoint interval
   >   execEnv.enableCheckpointing(4000, CheckpointingMode.EXACTLY_ONCE);
   >   execEnv.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
   >
   >   Configuration conf = TestConfigurations.getDefaultConf(tempFile.toURI().toString());
   >   conf.setString(FlinkOptions.TABLE_NAME, "t1");
   >   conf.setString(FlinkOptions.TABLE_TYPE, "MERGE_ON_READ");
   >   conf.setString(FlinkOptions.INDEX_TYPE, "BUCKET");
   >   conf.setInteger(FlinkOptions.BUCKET_INDEX_NUM_BUCKETS, 1);
   >   conf.setBoolean(FlinkOptions.LOG_INDEX_ENABLE, true);
   >   conf.setString(FlinkOptions.PRECOMBINE_FIELD, "ts");
   >   conf.setString(FlinkOptions.RECORD_KEY_FIELD, "uuid");
   >   conf.setBoolean(FlinkOptions.PRE_COMBINE, true);
   >   conf.setString(FlinkOptions.OPERATION, "upsert");
   >
   >   // write 3 batches of data set
   >   TestData.writeData(TestData.dataSetInsert(1), conf);
   >   TestData.writeData(TestData.dataSetInsert(1), conf);
   > }
   > ```
   
   @ad1happy2go 
   Hi, great man!
   Can you help me resolve this? Thank you very much.


