[jira] [Commented] (HIVE-9153) Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]

Rui Li (JIRA) Thu, 18 Dec 2014 17:50:12 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252762#comment-14252762
 ]


Rui Li commented on HIVE-9153:
------------------------------

Hi [~xuefuz] - if the Spark cluster is co-located with the Hadoop cluster, i.e. 
each executor also runs on a datanode, the Spark task scheduler usually does a 
good job of giving every mapper some locality (provided, of course, that the 
mappers specify a preferred location). In that case, having more mappers won't 
hurt data locality.
bq. Is there a way to disable Spark's delayed schedule to try out?
The Spark task scheduler divides tasks into multiple lists by locality level 
and, when an executor offers resources, tries to launch the tasks with the 
highest locality level first. It may also wait for some time before scheduling 
tasks at a lower level. I don't think there's a switch to turn it off. 
Actually, I'm not 100% sure that delay scheduling is what's causing the issue. 
If none of our tasks has a preferred location, the delay may happen at start-up 
(while waiting for the allowed locality level to drop) but not during 
execution. I'll look into this further.
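
To make the scheduling behaviour described above concrete, here is a minimal 
toy sketch in Python. It is not Spark's actual code: the class name, the two 
locality levels, and the `wait` parameter are all assumptions made for 
illustration, and the real scheduler has more levels and more bookkeeping. It 
only shows the idea that an offer from a non-preferred host is declined until 
the wait since the last local launch has expired.

```python
class ToyDelayScheduler:
    """Toy model of delay scheduling with two locality levels (illustrative only)."""

    def __init__(self, preferred_hosts, wait):
        self.pending = dict(preferred_hosts)  # task id -> preferred host
        self.wait = wait                      # hypothetical wait before dropping a level
        self.last_local_launch = 0.0          # the clock starts at time 0

    def offer(self, host, now):
        """An executor on `host` offers one slot at time `now`.

        Returns (task id, locality level), or None if the slot is left idle.
        """
        # First choice: a pending task that prefers the offering host.
        for task, preferred in self.pending.items():
            if preferred == host:
                del self.pending[task]
                self.last_local_launch = now
                return task, "NODE_LOCAL"
        # No local task: fall back to any task only once the wait has expired.
        if self.pending and now - self.last_local_launch >= self.wait:
            task = next(iter(self.pending))
            del self.pending[task]
            return task, "ANY"
        return None
```

In this model, a larger `wait` leaves slots on non-preferred hosts idle for 
longer, while `wait=0` launches tasks on whichever host offers first.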

> Evaluate CombineHiveInputFormat versus HiveInputFormat [Spark Branch]
> ---------------------------------------------------------------------
>
>                 Key: HIVE-9153
>                 URL: https://issues.apache.org/jira/browse/HIVE-9153
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Brock Noland
>            Assignee: Rui Li
>         Attachments: screenshot.PNG
>
>
> The default InputFormat is {{CombineHiveInputFormat}} and thus HOS uses this. 
> However, Tez uses {{HiveInputFormat}}. Since tasks are relatively cheap in 
> Spark, it might make sense for us to use {{HiveInputFormat}} as well. We 
> should evaluate this on a query which has many input splits such as {{select 
> count(\*) from store_sales where something is not null}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
