[ 
https://issues.apache.org/jira/browse/HIVE-25690?focusedWorklogId=681396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681396
 ]

ASF GitHub Bot logged work on HIVE-25690:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Nov/21 10:09
            Start Date: 15/Nov/21 10:09
    Worklog Time Spent: 10m 
      Work Description: marton-bod commented on a change in pull request #2779:
URL: https://github.com/apache/hive/pull/2779#discussion_r749174577



##########
File path: 
iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java
##########
@@ -178,28 +179,57 @@ public static SchemaDifference 
getSchemaDiff(Collection<FieldSchema> minuendColl
   }
 
   /**
-   * Compares a list of columns to another list, by name, to find an out of 
order column.
-   * It iterates through updated one by one, and compares the name of the 
column to the name of the column in the old
-   * list, in the same position. It returns the first mismatch it finds in 
updated, if any.
+   * Compares two lists of columns to each other, by name and index, to find 
the column that was moved by the
+   * schema evolution update (i.e. a column which was either moved to the 
first position, or moved after some specified
+   * column).

Review comment:
       That's a good point. Will do

##########
File path: 
iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java
##########
@@ -178,28 +179,57 @@ public static SchemaDifference 
getSchemaDiff(Collection<FieldSchema> minuendColl
   }
 
   /**
-   * Compares a list of columns to another list, by name, to find an out of 
order column.
-   * It iterates through updated one by one, and compares the name of the 
column to the name of the column in the old
-   * list, in the same position. It returns the first mismatch it finds in 
updated, if any.
+   * Compares two lists of columns to each other, by name and index, to find 
the column that was moved by the
+   * schema evolution update (i.e. a column which was either moved to the 
first position, or moved after some specified
+   * column).
    *
-   * @param updated The list of the columns after some updates have taken place
+   * @param updated The list of the columns after some updates have taken 
place (if any)
    * @param old The list of the original columns
    * @param renameMapping A map of name aliases for the updated columns (e.g. 
if a column rename occurred)
-   * @return A pair consisting of the first out of order column name, and its 
preceding column name (if any).
+   * @return A pair consisting of the reordered column's name, and its 
preceding column's name (if any).
    *         Returns a null in case there are no out of order columns.
    */
-  public static Pair<String, Optional<String>> 
getFirstOutOfOrderColumn(List<FieldSchema> updated,
+  public static Pair<String, Optional<String>> 
getReorderedColumn(List<FieldSchema> updated,
                                                                         
List<FieldSchema> old,
                                                                         
Map<String, String> renameMapping) {
-    for (int i = 0; i < updated.size() && i < old.size(); ++i) {
+    // first collect the updated index for each column
+    Map<String, Integer> nameToNewIndex = Maps.newHashMap();
+    for (int i = 0; i < updated.size(); ++i) {
       String updatedCol = renameMapping.getOrDefault(updated.get(i).getName(), 
updated.get(i).getName());
-      String oldCol = old.get(i).getName();
-      if (!oldCol.equals(updatedCol)) {
-        Optional<String> previousCol = i > 0 ? Optional.of(updated.get(i - 
1).getName()) : Optional.empty();
-        return Pair.of(updatedCol, previousCol);
+      nameToNewIndex.put(updatedCol, i);
+    }
+
+    // find the column which has the highest index difference between its 
position in the old vs the updated list
+    String reorderedColName = null;
+    int maxIndexDiff = 0;
+    for (int oldIndex = 0; oldIndex < old.size(); ++oldIndex) {
+      String oldName = old.get(oldIndex).getName();
+      Integer newIndex = nameToNewIndex.get(oldName);
+      if (newIndex != null) {
+        if (maxIndexDiff < Math.abs(newIndex - oldIndex)) {

Review comment:
       Sure, will extract




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 681396)
    Time Spent: 40m  (was: 0.5h)

> Fix column reorder detection for Iceberg schema evolution
> ---------------------------------------------------------
>
>                 Key: HIVE-25690
>                 URL: https://issues.apache.org/jira/browse/HIVE-25690
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Current algorithm for detecting schema differences between HMS and Iceberg 
> schema is broken when it comes to column reorders. This patch should fix that 
> up and add more extensive testing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to