-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65716/#review198753
-----------------------------------------------------------




standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 2887 (original), 2889 (patched)
<https://reviews.apache.org/r/65716/#comment278975>

    Here you can specify the length:
    
    `List<Partition> partitionsToAdd = new ArrayList<>(parts.size());
    List<PartValEqWrapper> partValWrappers = new ArrayList<>(parts.size());
    `
    
    But also why do we needd this list at all? We can just use partValWrappers 
as a collection of partitions we care about.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 2888 (original), 2890 (patched)
<https://reviews.apache.org/r/65716/#comment278976>

    You use this to check for duplicates and list is pretty bad structure for 
this - please use set instead.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 2891 (original), 2897 (patched)
<https://reviews.apache.org/r/65716/#comment278972>

    In cases like this it is also quite useful to know actual table and dbname 
that was supplied - it could help to figure out what wrong partition ended up 
here.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 2898 (original), 2904 (patched)
<https://reviews.apache.org/r/65716/#comment278974>

    Can you fix this as well to use
    
    `LOG.info("Not adding partition {} as it already exists", part)`



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Lines 2922 (patched)
<https://reviews.apache.org/r/65716/#comment278979>

    Please use new ArrayList<>(partitionsToAdd.size())



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 2902 (original), 2925 (patched)
<https://reviews.apache.org/r/65716/#comment278981>

    ugi doesn't change in the loop, so it can be moved outside. Same goes for 
currentUser - it can be done just once outside the loop.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 2906 (original), 2929 (patched)
<https://reviews.apache.org/r/65716/#comment278983>

    Why are we converting IO exception to RuntimeException? This doesn't look 
right.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 2909 (original), 2932 (patched)
<https://reviews.apache.org/r/65716/#comment279007>

    I am wondering what is the motivation for doing this concurrently. I guess 
that if the list of partitions is huge, it may be useful, but for smaller lists 
it is probably just an overhead. This is putside your scope but definitely 
worth investigating.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 2913 (original), 2936 (patched)
<https://reviews.apache.org/r/65716/#comment278987>

    Since we are using Java 8 we can use lambdas now, so this can become:
    
              partFutures.add(threadPool.submit(() -> {
                if (failureOccurred.get()) {
                  return null;
                }
                ugi.doAs((PrivilegedExceptionAction<Object>) () -> {
                  try {
                    boolean madeDir = createLocationForAddedPartition(table, 
part);
                    addedPartitions.put(partWrapper, madeDir);
                    initializeAddedPartition(table, part, madeDir);
                  } catch (MetaException e) {
                    throw new IOException(e.getMessage(), e);
                  }
                  return null;
                });
                return part;
              }));



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 2917 (original), 2941 (patched)
<https://reviews.apache.org/r/65716/#comment278985>

    If we iterate over PartValEqWrapper objects, then we do not need to create 
a new one here.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 2950 (original), 2984 (patched)
<https://reviews.apache.org/r/65716/#comment279009>

    From the code below it looks equivalent to just
    
        if (!newParts.isEmpty()) {
              ms.addPartitions(dbName, tblName, newParts);
        }



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 2957 (original), 2991 (patched)
<https://reviews.apache.org/r/65716/#comment279008>

    I would suggest using different variable for this and for the success 
above. But anyway, setting of success above seems useless.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Line 3095 (original), 3129 (patched)
<https://reviews.apache.org/r/65716/#comment279010>

    Similar comments apply to this function.


- Alexander Kolbasov


On March 6, 2018, 5:30 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65716/
> -----------------------------------------------------------
> 
> (Updated March 6, 2018, 5:30 p.m.)
> 
> 
> Review request for hive, Alexander Kolbasov, Peter Vary, and Adam Szita.
> 
> 
> Bugs: HIVE-18696
>     https://issues.apache.org/jira/browse/HIVE-18696
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The idea behind the patch is
> 
> 1) Separate the partition validation from starting the tasks which create the 
> partition folders. 
> Instead of doing the checks on the partitions and submit the tasks in one 
> loop, separated the validation into a different loop. So first iterate 
> through the partitions, validate the table/db names, and check for 
> duplicates. Then if all partitions were correct, in the second loop submit 
> the tasks to create the partition folders. This way if one of the partitions 
> is incorrect, the exception will be thrown in the first loop, before the 
> tasks are submitted. So we can be sure that no partition folder will be 
> created if the list contains an invalid partition.
> 
> 2) Handle the exceptions which occur during the execution of the tasks 
> differently.
> Previously if an exception occured in one task, the remaining tasks were 
> canceled, and the newly created partition folders were cleaned up in the 
> finally part. The problem was that it could happen that some tasks were still 
> not finished with the folder creation when cleaning up the others, so there 
> could have been leftover folders. After doing some testing it turned out that 
> this use case cannot be avoided completely when canceling the tasks.
> The idea of this patch is to set a flag if an exception is thrown in one of 
> the tasks. This flag is visible in the tasks and if its value is true, the 
> partition folders won't be created. Then iterate through the remaining tasks 
> and wait for them to finish. The tasks which are started before the flag got 
> set will then finish creating the partition folders. The tasks which are 
> started after the flag got set, won't create the partition folders, to avoid 
> unnecessary work. This way it is sure that all tasks are finished, when 
> entering the finally part where the partition folders are cleaned up.
> 
> 
> Diffs
> -----
> 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  2be018b 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
>  4d9cb1b 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
>  1122057 
> 
> 
> Diff: https://reviews.apache.org/r/65716/diff/2/
> 
> 
> Testing
> -------
> 
> Added some new tests cases to the TestAddPartitions and 
> TestAddPartitionsFromPartSpec tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>

Reply via email to