[
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008549#comment-13008549
]
Joydeep Sen Sarma commented on HIVE-2051:
-----------------------------------------
Siying - I think we shouldn't ignore ExecutionException. The main benefit of
checking each task's status is that we can find out whether any of them
failed (a failure is indicated by ExecutionException). We can also remove the
executor.awaitTermination() call (same feedback as the comments above).
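
To illustrate what I mean, here is a rough sketch, not the actual patch. The
class ParallelSummarySketch, the LengthLookup interface (a hypothetical
stand-in for FileSystem.getContentSummary(path).getLength()) and totalLength()
are made-up names. Future.get() blocks until that task has finished, so once
every future has been read there is nothing left for awaitTermination() to
wait for, and a failed task surfaces as an ExecutionException.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSummarySketch {

  /** Hypothetical stand-in for FileSystem.getContentSummary(path).getLength(). */
  interface LengthLookup {
    long lengthOf(String path) throws IOException;
  }

  /** Sums the lengths of all paths, looking them up in parallel. */
  static long totalLength(List<String> paths, final LengthLookup lookup, int numThreads)
      throws IOException {
    ExecutorService executor = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit one task per input path and keep the futures.
      List<Future<Long>> results = new ArrayList<Future<Long>>();
      for (final String path : paths) {
        results.add(executor.submit(new Callable<Long>() {
          @Override
          public Long call() throws IOException {
            return lookup.lengthOf(path);
          }
        }));
      }

      long total = 0;
      for (Future<Long> result : results) {
        try {
          // get() blocks until this task is done, so no awaitTermination() is needed.
          total += result.get();
        } catch (ExecutionException e) {
          // The task itself failed: report it instead of ignoring it.
          throw new IOException("content summary lookup failed", e.getCause());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IOException("interrupted while waiting for content summaries", e);
        }
      }
      return total;
    } finally {
      // On success every task has already finished; on failure this cancels the rest.
      executor.shutdownNow();
    }
  }
}
```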
Also - do you want to make the core of this routine synchronized (perhaps on
the context object, which is one per query)? There is really no point in running
more than one of these per query at a time. We could move this whole routine to
the Context object if that seems like a better place, or at least make the call
from the Context object, where it can be marked as a synchronized method.
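
Roughly, the two options would look like the sketch below. QueryContextSketch,
UtilitiesSketch and SummaryTask are hypothetical names used only for
illustration, not the real Hive classes. Both variants lock the same
per-query context object, so at most one summary computation runs per query
at a time.

```java
class QueryContextSketch {

  /** Hypothetical stand-in for the body of the input summary routine. */
  interface SummaryTask {
    long compute();
  }

  // Option 1: the routine (or the call to it) lives on the context object
  // and is marked synchronized, which locks on this context instance.
  synchronized long getInputSummary(SummaryTask task) {
    return task.compute();
  }
}

class UtilitiesSketch {

  // Option 2: the routine stays where it is and synchronizes its core
  // on the context object, which is one per query.
  static long getInputSummary(QueryContextSketch ctx, QueryContextSketch.SummaryTask task) {
    synchronized (ctx) {
      return task.compute();
    }
  }
}
```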
Otherwise looks good. Please upload a new patch and I will test and commit.
> getInputSummary() to call FileSystem.getContentSummary() in parallel
> --------------------------------------------------------------------
>
> Key: HIVE-2051
> URL: https://issues.apache.org/jira/browse/HIVE-2051
> Project: Hive
> Issue Type: Improvement
> Reporter: Siying Dong
> Assignee: Siying Dong
> Priority: Minor
> Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch,
> HIVE-2051.4.patch
>
>
> getInputSummary() now calls FileSystem.getContentSummary() one by one, which
> can be extremely slow when the number of input paths is huge. By calling
> those functions in parallel, we can cut latency in most cases.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira