[jira] [Commented] (HIVE-9339) Optimize split grouping for CombineHiveInputFormat [Spark Branch]

Rui Li (JIRA) Sun, 11 Jan 2015 19:38:34 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273191#comment-14273191
 ]


Rui Li commented on HIVE-9339:
------------------------------

Using a listener is fine. We currently use listeners to collect metrics as well.

> Optimize split grouping for CombineHiveInputFormat [Spark Branch]
> -----------------------------------------------------------------
>
>                 Key: HIVE-9339
>                 URL: https://issues.apache.org/jira/browse/HIVE-9339
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> It seems that split generation, especially in terms of grouping inputs, needs 
> to be improved. For this, we may need cluster information. Because of this, 
> we will first try to solve the problem for Spark.
> As to cluster information, Spark doesn't provide an API (SPARK-5080). 
> However, Spark does have a listener API, with which the Spark driver can get 
> notifications about executors going up/down, tasks starting/finishing, etc. 
> With this information, the Spark client should be able to maintain a view of 
> the current cluster image.
> Spark developers mentioned that the listener can only be created after 
> SparkContext is started, by which time some executions may have already 
> started, so the listener will miss some information. This can be fixed; 
> we can file a JIRA with the Spark project if necessary.
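
The idea in the description above, building a view of the cluster from listener notifications, can be sketched roughly as follows. This is only an illustration and not code from Hive or Spark: the class ClusterView and its methods executorAdded and executorRemoved are hypothetical names, and the sketch assumes that some listener callback would forward executor up/down events (executor id, host, core count) to them. It deliberately avoids any Spark dependency so that it runs on its own.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: tracks live executors from up/down notifications.
// In a real integration, a Spark listener would call executorAdded and
// executorRemoved; that wiring is an assumption and is not shown here.
class ClusterView {

    // Immutable record of what is known about one executor.
    static final class ExecutorInfo {
        final String host;
        final int cores;

        ExecutorInfo(String host, int cores) {
            this.host = host;
            this.cores = cores;
        }
    }

    // Executor id -> info. Concurrent because listener events may arrive
    // on a different thread than the one that plans splits.
    private final Map<String, ExecutorInfo> executors = new ConcurrentHashMap<>();

    // Called when an executor comes up.
    void executorAdded(String executorId, String host, int cores) {
        executors.put(executorId, new ExecutorInfo(host, cores));
    }

    // Called when an executor goes down; unknown ids are ignored.
    void executorRemoved(String executorId) {
        executors.remove(executorId);
    }

    int executorCount() {
        return executors.size();
    }

    // Total cores across all executors currently believed to be alive.
    int totalCores() {
        int sum = 0;
        for (ExecutorInfo info : executors.values()) {
            sum += info.cores;
        }
        return sum;
    }

    // Cores available per host, which split grouping could use as a hint.
    Map<String, Integer> coresByHost() {
        Map<String, Integer> result = new TreeMap<>();
        for (ExecutorInfo info : executors.values()) {
            result.merge(info.host, info.cores, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        ClusterView view = new ClusterView();
        view.executorAdded("1", "hostA", 4);
        view.executorAdded("2", "hostB", 8);
        view.executorRemoved("1");
        System.out.println(view.executorCount() + " executor(s), "
                + view.totalCores() + " core(s), by host: " + view.coresByHost());
    }
}
```

As the description notes, events delivered before the listener is registered would be missed, so a view built this way may be incomplete until that is addressed on the Spark side.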



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
