xiarixiaoyao commented on code in PR #5830: URL: https://github.com/apache/hudi/pull/5830#discussion_r1022504653
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java:
##########
@@ -130,4 +145,48 @@ protected Void getResult() {
       return null;
     }
   }
+
+  protected Iterator<GenericRecord> getRecordIterator(
+      HoodieTable<T, ?, ?, ?> table,
+      HoodieMergeHandle<T, ?, ?, ?> mergeHandle,
+      HoodieBaseFile baseFile,
+      HoodieFileReader<GenericRecord> reader,
+      Schema readSchema) throws IOException {
+    Option<InternalSchema> querySchemaOpt = SerDeHelper.fromJson(table.getConfig().getInternalSchema());
+    if (!querySchemaOpt.isPresent()) {
+      querySchemaOpt = new TableSchemaResolver(table.getMetaClient()).getTableInternalSchemaFromCommitMetadata();
+    }
+    boolean needToReWriteRecord = false;
+    Map<String, String> renameCols = new HashMap<>();
+    // TODO support bootstrap
+    if (querySchemaOpt.isPresent() && !baseFile.getBootstrapBaseFile().isPresent()) {

Review Comment:
   @trushev Can we avoid moving this code snippet? I do not think Flink schema evolution needs to modify this code. https://github.com/apache/hudi/pull/6358 and https://github.com/apache/hudi/pull/7183 will optimize this code.

   @danny0405 We need to check schema evolution for each base file. Once we have made multiple column changes, different base files may have different schemas, so we cannot use the current table schema to read these files directly; doing so throws an exception.

   For example, tableA has the schema `a int, b string, c double` and contains three files: f1, f2, f3. We drop column c from tableA and add a new column d, and then we update tableA, but the update only touches f2 and f3; f1 is not touched. The schemas are now:

   ```
   schema1 from tableA: a int, b string, d long
   schema2 from f2, f3: a int, b string, d long
   schema3 from f1:     a int, b string, c double
   ```

   We should not use schema1 to read f1.
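   The per-file check described above can be sketched roughly as follows. This is only an illustration of the idea, not Hudi code: `PerFileSchemaSketch` and `rewriteToQuerySchema` are hypothetical names, records are modeled as plain maps, and columns are matched by name, which is a simplifying assumption (renamed columns, cf. `renameCols` in the diff, would need extra handling).

   ```java
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical illustration only; none of these names are Hudi APIs.
   class PerFileSchemaSketch {

       // Project a record that was read with its base file's own schema onto
       // the current query schema: columns that were dropped from the table
       // are discarded, and columns the file never had are filled with null.
       static Map<String, Object> rewriteToQuerySchema(Map<String, Object> fileRecord,
                                                       List<String> querySchema) {
           Map<String, Object> out = new LinkedHashMap<>();
           for (String column : querySchema) {
               // get() returns null when the file's schema lacks the column
               out.put(column, fileRecord.get(column));
           }
           return out;
       }

       public static void main(String[] args) {
           // schema1: the current table schema (a, b, d)
           List<String> schema1 = List.of("a", "b", "d");

           // A record from f1, which was written with schema3 (a, b, c)
           Map<String, Object> f1Record = new LinkedHashMap<>();
           f1Record.put("a", 1);
           f1Record.put("b", "x");
           f1Record.put("c", 2.5);

           // f1 is read with schema3 first and only then rewritten to schema1
           System.out.println(rewriteToQuerySchema(f1Record, schema1));
       }
   }
   ```

   The point of the sketch is the ordering: each base file is decoded with the schema it was written with, and only the decoded record is reconciled with the table schema, so f1 is never read with schema1 directly.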