[ 
https://issues.apache.org/jira/browse/HIVE-18916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511191#comment-16511191
 ] 

Sahil Takiar commented on HIVE-18916:
-------------------------------------

[~aihuaxu] can you take a look? Here is a brief summary of the changes:
* {{SparkClientImpl}} has been modified so that if the thread monitoring the 
{{bin/spark-submit}} process detects that {{bin/spark-submit}} has failed, it 
parses the stdout / stderr of {{bin/spark-submit}}, collects any log lines 
that contain "Error", and includes those lines in the exception that gets 
thrown
** {{SparkClientImpl}} was actually already doing this, but the information 
wasn't getting propagated all the way to the end user
* A few changes to {{RpcServer}} were necessary to make sure the exception 
thrown by the "Driver" thread gets propagated to the user
* A few other minor changes to classes like {{RemoteSparkJobMonitor}}, 
{{SparkTask}} and the constructor of {{SparkClientImpl}} to prevent double 
logging of exceptions
* Added a few unit tests for this, which required masking a few additional 
patterns in .q files
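
To illustrate the first bullet, here is a minimal, self-contained sketch of the general idea: run a child process, keep the output lines that contain "Error", and throw an exception that carries them if the exit code is non-zero. This is not the code from the patch. The class and method names ({{SubmitMonitorSketch}}, {{collectErrorLines}}, {{failureMessage}}, {{runAndCheck}}) are hypothetical, and merging stderr into stdout is a simplification made for the sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual SparkClientImpl code.
final class SubmitMonitorSketch {

    // Keep every output line that mentions "Error".
    static List<String> collectErrorLines(BufferedReader reader) throws IOException {
        List<String> errors = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.contains("Error")) {
                errors.add(line);
            }
        }
        return errors;
    }

    // Build a message that carries the exit code plus any collected error lines.
    static String failureMessage(int exitCode, List<String> errorLines) {
        StringBuilder sb = new StringBuilder("spark-submit exited with code " + exitCode);
        if (!errorLines.isEmpty()) {
            sb.append("; errors from its output:");
            for (String l : errorLines) {
                sb.append("\n").append(l);
            }
        }
        return sb.toString();
    }

    // Run the command, scan its output, and fail loudly on a non-zero exit code
    // instead of only logging it.
    static void runAndCheck(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // assumption: merge stderr into stdout for simplicity
        Process process = pb.start();
        List<String> errorLines;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            errorLines = collectErrorLines(reader);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException(failureMessage(exitCode, errorLines));
        }
    }
}
```

In the sketch the check runs on the calling thread. In the scenario described above, the failure is detected on a separate monitoring thread, so the exception additionally has to be handed back to the thread that is waiting on the client connection.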

The motivation for this is that {{bin/spark-submit}} errors out when certain 
parameters are misconfigured. With this patch, Hive on Spark (HoS) propagates 
those error messages to the end user, which should improve debuggability. The 
added .q files are a good example of this.

> SparkClientImpl doesn't error out if spark-submit fails
> -------------------------------------------------------
>
>                 Key: HIVE-18916
>                 URL: https://issues.apache.org/jira/browse/HIVE-18916
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-18916.1.WIP.patch, HIVE-18916.2.patch, 
> HIVE-18916.3.patch
>
>
> If {{spark-submit}} returns a non-zero exit code, {{SparkClientImpl}} will 
> simply log the exit code, but won't throw an error. Eventually, the 
> connection timeout will get triggered and an exception like {{Timed out 
> waiting for client connection}} will be logged, which is pretty misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
