[ https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317066#comment-16317066 ]
Sahil Takiar commented on HIVE-16484:
-------------------------------------

[~xuefuz] thanks for voicing your concern. I see a few benefits to doing this:
* The main benefit is the usage of {{InProcessLauncher}}, which was added in SPARK-11035
** I didn't add the integration with {{InProcessLauncher}} to this patch, mainly because I didn't want the diff to get too big; I plan to add it in another JIRA
** The {{InProcessLauncher}} avoids running {{bin/spark-submit}}; it calls {{SparkSubmit#main}} directly, which decreases the time it takes to start a HoS session because no separate process has to be launched to start the Spark app
** It also makes HoS easier to debug because everything runs in a single process; we don't have to rely on redirecting the stdout / stderr output streams, etc.
* The API is much cleaner than building up command line arguments for {{bin/spark-submit}}

Some other thoughts:

{quote}
Moreover, security related stuff will need more testing at least.
{quote}

I'm not that familiar with the security aspects of HoS, but I can add some tests with {{MiniHiveKdc}} / doAs to check that things still work.

{quote}
I'd feel nervous in completely different code path which is so critical
{quote}

Valid point, but the code path isn't that different; at the end of the day everything goes through {{SparkSubmit.scala}}.

{quote}
we can make a switch in later releases
{quote}

I don't think we have plans to release Hive 3.0.0 anytime soon, so we can fix any issues with {{SparkLauncher}} before the release.

Let me know your thoughts.
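
To illustrate the point about building up command line arguments, here is a minimal sketch of the hand-assembled {{bin/spark-submit}} invocation. It is not code from {{SparkClientImpl}} or from the patch: the class {{SubmitCommandSketch}}, its methods, and the example paths are hypothetical, and only the {{--master}}, {{--conf}} and {{--class}} flags are real {{spark-submit}} options. With {{SparkLauncher}}, the same settings would be builder calls ({{setMaster}}, {{setConf}}, {{setMainClass}}, {{setAppResource}}) rather than strings in a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hive or Spark.
class SubmitCommandSketch {

    // Assembles the argument vector for bin/spark-submit by hand.
    static List<String> buildSubmitCommand(String sparkHome, String master,
            Map<String, String> conf, String mainClass, String appJar) {
        List<String> argv = new ArrayList<>();
        argv.add(sparkHome + "/bin/spark-submit");
        argv.add("--master");
        argv.add(master);
        // Each Spark property becomes its own "--conf key=value" pair.
        for (Map.Entry<String, String> e : conf.entrySet()) {
            argv.add("--conf");
            argv.add(e.getKey() + "=" + e.getValue());
        }
        argv.add("--class");
        argv.add(mainClass);
        argv.add(appJar);
        return argv;
    }

    // Wraps the argv in a ProcessBuilder and merges the child's stderr into
    // its stdout, so the parent reads one stream. The process is not started.
    static ProcessBuilder toProcessBuilder(List<String> argv) {
        return new ProcessBuilder(argv).redirectErrorStream(true);
    }

    public static void main(String[] args) {
        // Example values are assumptions, not taken from the issue.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.executor.memory", "2g");
        List<String> argv = buildSubmitCommand(
                "/opt/spark", "yarn", conf, "org.example.Driver", "/tmp/app.jar");
        System.out.println(String.join(" ", toProcessBuilder(argv).command()));
    }
}
```

Every option here is an untyped string, and the caller still has to start the child process and consume its output, which is the overhead the launcher API is meant to remove.
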
> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> --------------------------------------------------------------------
>
>                 Key: HIVE-16484
>                 URL: https://issues.apache.org/jira/browse/HIVE-16484
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-16484.1.patch, HIVE-16484.10.patch,
> HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch,
> HIVE-16484.5.patch, HIVE-16484.6.patch, HIVE-16484.7.patch,
> HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}}
> directory and invokes the {{bin/spark-submit}} script, which spawns a
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS session --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)