[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892974#action_12892974 ]
Ning Zhang commented on HIVE-1408:
----------------------------------
Some questions:
1) The local file system support added to the shims uses the same file name
(class name) for each Hadoop version and is compiled conditionally, depending
on the Hadoop version at compile time. This may cause problems when the same
Hive jar file is deployed to clusters running different Hadoop versions. The
current shims are implemented by naming the classes differently and using
ShimsLoader to pick the correct class at execution time. This allows the same
Hive jar files to be deployed to different Hadoop clusters.
2) data/conf/hive-site.xml fs.pfile.impl is not needed if ShimsLoader is used
as described above.
3) The hive.exec.mode.local.auto default values differ between HiveConf.java
and conf/hive-default.xml. They should be the same to avoid confusion.
4) ctas.q.out: do you know why the GlobalTableID was changed?
5) MapRedTask.java:149: the plan file name is no longer randomized as it was
before. This may cause problems when parallel execution is enabled and multiple
MapRedTasks run at the same time (e.g., parallel multi-table inserts).
6) If there are two MapRed tasks, MR2 depends on MR1, and MR1 is chosen to
run locally, it seems MR2 has to run locally as well, since the intermediate
files are stored in the local file system? What about parallel execution, when
MR1 and MR2 run in parallel and only one of them is local? It seems the
information about whether a task is "local" is stored in Context (and
HiveConf), which is shared among parallel MR tasks?
7) ExecDriver.localizeMRTmpFileImpl changes FileSinkDesc.dirName after the MR
tasks have been generated. This breaks the dynamic partition code, which runs
when the FileSinkOperator is generated. In particular, the DynamicPartitionCtx
also stores the dirName, so it has to be changed in localizeMRTmpFileImpl as
well.
8) MoveTask previously moved the intermediate directory in HDFS to the final
directory, also in HDFS. In local mode, should we change the MoveTask
execution as well?
9) Driver.java:100 the two functions are made static. Should they be moved to
Utilities?
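
To illustrate the concern in 5), here is a minimal sketch of one way to keep
plan file names unique when several tasks share a scratch directory. This is
not the code from the patch or from MapRedTask; the class name PlanFiles, the
method newPlanFile, and the "plan"/".xml" naming are assumptions made only for
this example. It relies on the standard java.nio.file.Files.createTempFile
call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hive: gives each task its own plan file.
final class PlanFiles {
    private PlanFiles() {}

    // Files.createTempFile creates a new, previously non-existent file whose
    // name has a random component between the prefix and the suffix, so two
    // tasks using the same scratch directory do not receive the same path.
    static Path newPlanFile(Path scratchDir) throws IOException {
        Files.createDirectories(scratchDir);
        return Files.createTempFile(scratchDir, "plan", ".xml");
    }
}
```

Each MapRedTask would call such a helper once and pass the returned path to its
child process, so parallel tasks never overwrite each other's plan.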
> add option to let hive automatically run in local mode based on tunable
> heuristics
> ----------------------------------------------------------------------------------
>
> Key: HIVE-1408
> URL: https://issues.apache.org/jira/browse/HIVE-1408
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Joydeep Sen Sarma
> Assignee: Joydeep Sen Sarma
> Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch,
> hive-1408.6.patch
>
>
> as a followup to HIVE-543 - we should have a simple option (enabled by
> default) to let hive run in local mode if possible.
> two levels of options are desirable:
> 1. hive.exec.mode.local.auto=true/false // control whether local mode is
> automatically chosen
> 2. Options to control different heuristics, some naive examples:
> hive.exec.mode.local.auto.input.size.max=1G // don't choose local mode
> if data > 1G
> hive.exec.mode.local.auto.script.enable=true/false // choose if local
> mode is enabled for queries with user scripts
> this can be implemented as a pre/post execution hook. It makes sense to
> provide this as a standard hook in the hive codebase since it's likely to
> improve response time for many users (especially for test queries).
> the initial proposal is to choose this at a query level and not at per
> hive-task (ie. hadoop job) level. per job-level requires more changes to
> compilation (to not pre-commit to hdfs or local scratch directories at
> compile time).
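
The query-level decision described in the issue can be sketched as below. This
is only an illustration under stated assumptions: the class
LocalModeHeuristics and the method shouldRunLocal are hypothetical names, and
the caller is assumed to have already read the three option values and
computed the total input size. It is not the logic in the attached patches.

```java
// Hypothetical sketch, not Hive code: decides whether a query may run locally.
final class LocalModeHeuristics {
    private LocalModeHeuristics() {}

    static boolean shouldRunLocal(boolean autoEnabled,
                                  long inputBytes,
                                  long maxInputBytes,
                                  boolean usesScript,
                                  boolean scriptEnabled) {
        // hive.exec.mode.local.auto=false: never pick local mode automatically.
        if (!autoEnabled) {
            return false;
        }
        // hive.exec.mode.local.auto.input.size.max: too much data for local mode.
        if (inputBytes > maxInputBytes) {
            return false;
        }
        // hive.exec.mode.local.auto.script.enable=false: queries with user
        // scripts stay on the cluster.
        if (usesScript && !scriptEnabled) {
            return false;
        }
        return true;
    }
}
```

For example, with a 1G limit (1L << 30 bytes), a 10 MB query without user
scripts would be allowed to run locally, while a 2 GB query would not.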