[jira] [Commented] (HIVE-22964) MM table split computation is very slow

Peter Vary (Jira) Wed, 11 Mar 2020 03:20:24 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056824#comment-17056824
 ]


Peter Vary commented on HIVE-22964:
-----------------------------------

Hi Aditya Shah,
 * Yetus errors to fix:
{code:java}
./ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:490:            , finalDirs = Collections.synchronizedList( new ArrayList<>());:12: warning: ',' is preceded with whitespace.
./ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:490:            , finalDirs = Collections.synchronizedList( new ArrayList<>());:56: warning: '(' is followed by whitespace.
./ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:602:            processForWriteIdsForMmRead(dir, conf, validWriteIdList, allowOriginals, finalPaths, pathsWithFileOriginals);: warning: Line is longer than 120 characters (found 121).
{code}

 * The ASF related errors are not yours.
 * The way you handled the config deprecation seems ok to me.
 * Still would like to see more localized error handling:
{code:java}
try {
  [..]
  try {
    for (Future<Void> pathFuture : pathFutures) {
      pathFuture.get();
    }
  } catch (InterruptedException | ExecutionException e) {
    for (Future<Void> future : pathFutures) {
      future.cancel(true);
    }
    throw new IOException(e);
  }
} finally {
  [..]
}
{code}
Do you strongly disagree, or did you just forget?
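
For reference, here is a minimal, self-contained sketch of the pattern, not the actual patch. The class name PathTaskRunner, the method runAll, and the way the tasks are submitted are hypothetical and only there to make the snippet compile on its own. Restoring the interrupt flag is an extra step that is not in the snippet above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical helper (not part of Hive): runs per-path tasks and cancels the rest on failure. */
class PathTaskRunner {

  static void runAll(ExecutorService pool, List<Callable<Void>> tasks) throws IOException {
    // Submit every task first so they can run concurrently.
    List<Future<Void>> pathFutures = new ArrayList<>();
    for (Callable<Void> task : tasks) {
      pathFutures.add(pool.submit(task));
    }

    // Localized handling: only the wait loop sits inside this try block.
    try {
      for (Future<Void> pathFuture : pathFutures) {
        pathFuture.get();
      }
    } catch (InterruptedException | ExecutionException e) {
      // One task failed or the wait was interrupted: stop the remaining work.
      for (Future<Void> future : pathFutures) {
        future.cancel(true);
      }
      if (e instanceof InterruptedException) {
        // Assumption: the caller wants the interrupt status preserved.
        Thread.currentThread().interrupt();
      }
      throw new IOException(e);
    }
  }
}
```

Keeping the inner try/catch around the wait loop only means the outer finally block can stay responsible for cleanup such as shutting down the executor.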

Keep submitting the fixed patch until we have a green run.

Thanks,
 Peter

> MM table split computation is very slow
> ---------------------------------------
>
>                 Key: HIVE-22964
>                 URL: https://issues.apache.org/jira/browse/HIVE-22964
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Aditya Shah
>            Assignee: Aditya Shah
>            Priority: Major
>         Attachments: HIVE-22964.1.patch, HIVE-22964.patch
>
>
> For MM tables we process the paths before calling inputFormat.getSplits(), so
> we end up listing the whole table at once. This could be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
