[ 
https://issues.apache.org/jira/browse/HIVE-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152248#comment-14152248
 ] 

Gopal V commented on HIVE-8294:
-------------------------------

Even setting aside the speed issue, that line produces ~1.25 GB of log data 
for a query that shuffles ~30 MB of data at the 1000 scale.
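
One possible mitigation, sketched here under assumptions rather than as the 
actual patch: demote the per-lookup message to debug level and guard it, so 
the string is not even built when debug logging is off. The class below is a 
hypothetical stand-in for the loop quoted in the description; 
AliasLookupSketch, findAlias and getMessagesBuilt are illustrative names, 
and java.util.logging is used only to keep the example self-contained 
(it is not the logging API Hive uses).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the alias lookup; this is not Hive code.
class AliasLookupSketch {
  private static final Logger LOG =
      Logger.getLogger(AliasLookupSketch.class.getName());

  // path -> aliases, mirroring the shape of conf.getPathToAliases()
  private final Map<String, List<String>> pathToAliases;

  // Counts how many log messages were actually constructed.
  private int messagesBuilt = 0;

  AliasLookupSketch(Map<String, List<String>> pathToAliases) {
    this.pathToAliases = pathToAliases;
  }

  /** Returns the first alias registered for the given file, or null. */
  String findAlias(String file) {
    for (Map.Entry<String, List<String>> entry : pathToAliases.entrySet()) {
      if (!entry.getKey().equals(file)) {
        continue;
      }
      for (String alias : entry.getValue()) {
        // Guarded debug-level message: skipped entirely unless FINE is on.
        if (LOG.isLoggable(Level.FINE)) {
          messagesBuilt++;
          LOG.fine("Processing alias " + alias + " for file " + file);
        }
        return alias;
      }
    }
    return null;
  }

  int getMessagesBuilt() {
    return messagesBuilt;
  }

  public static void main(String[] args) {
    Map<String, List<String>> paths = new LinkedHashMap<>();
    paths.put("/warehouse/lineitem", Arrays.asList("lineitem"));

    AliasLookupSketch sketch = new AliasLookupSketch(paths);
    // Simulate many lookups, as would happen once per input file change.
    for (int i = 0; i < 1000; i++) {
      sketch.findAlias("/warehouse/lineitem");
    }
    System.out.println("messages built: " + sketch.getMessagesBuilt());
  }
}
```

This only addresses the log volume; the cost of walking the path list on 
every call would need a separate change.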

> Hive-on-Tez is going into MapOperator::cleanUpInputFileChangedOp()
> ------------------------------------------------------------------
>
>                 Key: HIVE-8294
>                 URL: https://issues.apache.org/jira/browse/HIVE-8294
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>
> Hive-on-Tez spends a significant amount of time in small queries going over 
> the list of files in the plan to check for input context.
> {code}
> for (String onefile : conf.getPathToAliases().keySet()) {
>   Path onepath = normalizePath(onefile);
>   ....
>   for (String onealias : conf.getPathToAliases().get(onefile)) {
>     Operator<? extends OperatorDesc> op = conf.getAliasToWork().get(onealias);
>     MapInputPath inp = new MapInputPath(onefile, onealias, op, partDesc);
>     MapOpCtx context = opCtxMap.get(inp);
>     if (context != null) {
>       current = context;
>       LOG.info("Processing alias " + onealias + " for file " + onefile);
>       return;
>     }
>   }
> }
> {code}
> For a simple test like {{select distinct l_shipdate from lineitem;}} it 
> produced 5861039 log lines with "Processing alias ".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
