[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

Sahil Takiar (JIRA) Mon, 08 Jan 2018 13:12:39 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317066#comment-16317066
 ]


Sahil Takiar commented on HIVE-16484:
-------------------------------------

[~xuefuz] thanks for voicing your concern. I see a few benefits to doing this:

* The main benefit is the ability to use {{InProcessLauncher}}, which was 
added in SPARK-11035
** I didn't add the integration with {{InProcessLauncher}} to this patch mainly 
because I didn't want the diff to get too big; I plan to add integration with 
{{InProcessLauncher}} in another JIRA
** The {{InProcessLauncher}} avoids running {{bin/spark-submit}}; it calls 
{{SparkSubmit#main}} directly, which decreases the time it takes to start a 
HoS session because no separate process needs to be launched to start the 
Spark app
** It also makes HoS easier to debug because everything runs in a single 
process, so we don't have to rely on redirecting the stdout / stderr output 
streams, etc.
* The API is much cleaner than building up command line arguments for 
{{bin/spark-submit}}
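
For illustration only (this is not code from the patch or from {{SparkClientImpl}}): a minimal sketch of the hand-built argument list that a launcher API would replace. The class name, helper name, paths, and main class below are hypothetical; only the {{--master}}, {{--class}}, and {{--conf}} flags are real {{bin/spark-submit}} options.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SparkSubmitArgs {
    // Hypothetical helper: assembles the argv for bin/spark-submit by hand.
    // Every option is a positional pair of strings, so ordering and quoting
    // mistakes only show up when the child process is actually started.
    static List<String> build(String sparkHome, String master, String mainClass,
                              Map<String, String> conf, String appJar) {
        List<String> argv = new ArrayList<>();
        argv.add(sparkHome + "/bin/spark-submit");
        argv.add("--master");
        argv.add(master);
        argv.add("--class");
        argv.add(mainClass);
        // Each Spark property becomes its own "--conf key=value" pair.
        for (Map.Entry<String, String> e : conf.entrySet()) {
            argv.add("--conf");
            argv.add(e.getKey() + "=" + e.getValue());
        }
        // The application jar comes after all options.
        argv.add(appJar);
        return argv;
    }

    public static void main(String[] args) {
        // Assumed example values, not taken from Hive.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.executor.memory", "2g");
        List<String> argv = build("/opt/spark", "yarn", "com.example.Driver",
                                  conf, "/tmp/app.jar");
        System.out.println(String.join(" ", argv));
    }
}
```

With {{SparkLauncher}}, the same settings would instead be passed through builder methods such as {{setSparkHome}}, {{setMaster}}, {{setMainClass}}, {{setConf}}, and {{setAppResource}}, followed by {{startApplication}}.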

Some other thoughts:

{quote}Moreover, security related stuff will need more testing at least.{quote}
I'm not that familiar with the security aspects of HoS, but I can add some 
tests with {{MiniHiveKdc}} / doAs to check that things still work.

{quote}I'd feel nervous in completely different code path which is so 
critical{quote}
Valid point, but the code path isn't that different; at the end of the day 
everything goes through {{SparkSubmit.scala}}.

{quote}we can make a switch in later releases{quote}
I don't think we have plans to release Hive 3.0.0 anytime soon, so we can fix 
any issues with {{SparkLauncher}} before the release.

Let me know your thoughts.

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> --------------------------------------------------------------------
>
>                 Key: HIVE-16484
>                 URL: https://issues.apache.org/jira/browse/HIVE-16484
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-16484.1.patch, HIVE-16484.10.patch, 
> HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch, 
> HIVE-16484.5.patch, HIVE-16484.6.patch, HIVE-16484.7.patch, 
> HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to 
> launch Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS session --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
