watermelon12138 commented on PR #10143:
URL: https://github.com/apache/hudi/pull/10143#issuecomment-1837074125
> @danny0405 Hello Danny, I would like to ask why data with the same primary key is written to different log files (with the same FileId and different timestamps) in upsert mode. As a result, I cannot write a unit test for the LogIndex capability. My test code is as follows:
>
> ```
> public void testHoodiePipelineBuilderSource() throws Exception {
>   // create a StreamExecutionEnvironment instance.
>   StreamExecutionEnvironment execEnv = StreamExecutionEnvironment.getExecutionEnvironment();
>   execEnv.getConfig().disableObjectReuse();
>   execEnv.setParallelism(1);
>   // set up checkpoint interval
>   execEnv.enableCheckpointing(4000, CheckpointingMode.EXACTLY_ONCE);
>   execEnv.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
>
>   // MERGE_ON_READ table with a single-bucket BUCKET index and the log index enabled
>   Configuration conf = TestConfigurations.getDefaultConf(tempFile.toURI().toString());
>   conf.setString(FlinkOptions.TABLE_NAME, "t1");
>   conf.setString(FlinkOptions.TABLE_TYPE, "MERGE_ON_READ");
>   conf.setString(FlinkOptions.INDEX_TYPE, "BUCKET");
>   conf.setInteger(FlinkOptions.BUCKET_INDEX_NUM_BUCKETS, 1);
>   conf.setBoolean(FlinkOptions.LOG_INDEX_ENABLE, true);
>   conf.setString(FlinkOptions.PRECOMBINE_FIELD, "ts");
>   conf.setString(FlinkOptions.RECORD_KEY_FIELD, "uuid");
>   conf.setBoolean(FlinkOptions.PRE_COMBINE, true);
>   conf.setString(FlinkOptions.OPERATION, "upsert");
>
>   // write 3 batches of data set
>   TestData.writeData(TestData.dataSetInsert(1), conf);
>   TestData.writeData(TestData.dataSetInsert(1), conf);
> }
> ```

@ad1happy2go Hi, great man! Can you help me resolve this? Thank you very much.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org