[ https://issues.apache.org/jira/browse/HIVE-25690?focusedWorklogId=681396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681396 ]
ASF GitHub Bot logged work on HIVE-25690: ----------------------------------------- Author: ASF GitHub Bot Created on: 15/Nov/21 10:09 Start Date: 15/Nov/21 10:09 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2779: URL: https://github.com/apache/hive/pull/2779#discussion_r749174577 ########## File path: iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java ########## @@ -178,28 +179,57 @@ public static SchemaDifference getSchemaDiff(Collection<FieldSchema> minuendColl } /** - * Compares a list of columns to another list, by name, to find an out of order column. - * It iterates through updated one by one, and compares the name of the column to the name of the column in the old - * list, in the same position. It returns the first mismatch it finds in updated, if any. + * Compares two lists of columns to each other, by name and index, to find the column that was moved by the + * schema evolution update (i.e. a column which was either moved to the first position, or moved after some specified + * column). Review comment: That's a good point. Will do ########## File path: iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java ########## @@ -178,28 +179,57 @@ public static SchemaDifference getSchemaDiff(Collection<FieldSchema> minuendColl } /** - * Compares a list of columns to another list, by name, to find an out of order column. - * It iterates through updated one by one, and compares the name of the column to the name of the column in the old - * list, in the same position. It returns the first mismatch it finds in updated, if any. + * Compares two lists of columns to each other, by name and index, to find the column that was moved by the + * schema evolution update (i.e. a column which was either moved to the first position, or moved after some specified + * column). * - * @param updated The list of the columns after some updates have taken place + * @param updated The list of the columns after some updates have taken place (if any) * @param old The list of the original columns * @param renameMapping A map of name aliases for the updated columns (e.g. if a column rename occurred) - * @return A pair consisting of the first out of order column name, and its preceding column name (if any). + * @return A pair consisting of the reordered column's name, and its preceding column's name (if any). * Returns a null in case there are no out of order columns. */ - public static Pair<String, Optional<String>> getFirstOutOfOrderColumn(List<FieldSchema> updated, + public static Pair<String, Optional<String>> getReorderedColumn(List<FieldSchema> updated, List<FieldSchema> old, Map<String, String> renameMapping) { - for (int i = 0; i < updated.size() && i < old.size(); ++i) { + // first collect the updated index for each column + Map<String, Integer> nameToNewIndex = Maps.newHashMap(); + for (int i = 0; i < updated.size(); ++i) { String updatedCol = renameMapping.getOrDefault(updated.get(i).getName(), updated.get(i).getName()); - String oldCol = old.get(i).getName(); - if (!oldCol.equals(updatedCol)) { - Optional<String> previousCol = i > 0 ? Optional.of(updated.get(i - 1).getName()) : Optional.empty(); - return Pair.of(updatedCol, previousCol); + nameToNewIndex.put(updatedCol, i); + } + + // find the column which has the highest index difference between its position in the old vs the updated list + String reorderedColName = null; + int maxIndexDiff = 0; + for (int oldIndex = 0; oldIndex < old.size(); ++oldIndex) { + String oldName = old.get(oldIndex).getName(); + Integer newIndex = nameToNewIndex.get(oldName); + if (newIndex != null) { + if (maxIndexDiff < Math.abs(newIndex - oldIndex)) { Review comment: Sure, will extract -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 681396) Time Spent: 40m (was: 0.5h) > Fix column reorder detection for Iceberg schema evolution > --------------------------------------------------------- > > Key: HIVE-25690 > URL: https://issues.apache.org/jira/browse/HIVE-25690 > Project: Hive > Issue Type: Bug > Reporter: Marton Bod > Assignee: Marton Bod > Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Current algorithm for detecting schema differences between HMS and Iceberg > schema is broken when it comes to column reorders. This patch should fix that > up and add more extensive testing. -- This message was sent by Atlassian Jira (v8.20.1#820001)