[ https://issues.apache.org/jira/browse/HIVE-16854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045762#comment-16045762 ]

Xuefu Zhang commented on HIVE-16854:
------------------------------------

Thanks [~lirui] for enhancing the fix. Removing the limit of one backlog
connection is great.
+1
The patch looks good. The reported test failures don't seem related to it.

> SparkClientFactory is locked too aggressively
> ---------------------------------------------
>
>                 Key: HIVE-16854
>                 URL: https://issues.apache.org/jira/browse/HIVE-16854
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>         Attachments: 15763.jstack, HIVE-16854.2.patch, HIVE-16854.patch
>
>
> Most methods in SparkClientFactory are synchronized on the SparkClientFactory
> class object. However, some of these methods are very expensive. For example,
> createClient() returns a SparkClientImpl instance, and creating a
> SparkClientImpl instance requires starting a remote driver that connects back
> to RPCServer. This can take a long time, for example when the YARN queue is
> busy. While that happens, all other pending calls on SparkClientFactory have
> to wait.
> In our case, hive.spark.client.server.connect.timeout is set to 1 hour, so
> some queries wait for hours before starting.
> The current implementation effectively serializes all remote driver launches:
> if one of them takes a long time, all the following ones have to wait.
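> The HS2 stack trace (15763.jstack) is attached for reference. It was taken
> from an earlier version of Hive, so the line numbers might be slightly off.
> The following excerpt shows the locking effect: one thread holds the class
> lock at line 80 while the others wait to acquire it at line 79: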
> {code}
> xuefu@hadoopservice20-sjc1:~$ grep org.apache.hive.spark.client.SparkClientFactory 15763.jstack
>       at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
>       - waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
>       at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
>       - waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
>       at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
>       - locked <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
>       at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
>       - waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
>       at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
>       - waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
>       at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
>       - waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
> {code}

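The serialization described in the issue can be reproduced outside Hive with a small sketch. All classes below (DriverLauncher, CoarseFactory, NarrowFactory, ContentionDemo) are hypothetical stand-ins, not Hive's SparkClientFactory, and Thread.sleep only simulates waiting for a remote driver to connect back. NarrowFactory shows one generic way to shrink the critical section; it is an assumption for illustration, not the HIVE-16854 patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the slow part of createClient(): launching a remote
// driver and waiting for it to connect back. Records the peak number of
// launches that were in progress at the same time.
final class DriverLauncher {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    void launch(long millis) throws InterruptedException {
        peak.accumulateAndGet(active.incrementAndGet(), Math::max);
        try {
            Thread.sleep(millis); // simulated wait, e.g. on a busy YARN queue
        } finally {
            active.decrementAndGet();
        }
    }

    int peak() { return peak.get(); }
}

// Coarse locking: the whole slow launch runs while holding the class lock,
// so concurrent callers queue up one behind another.
final class CoarseFactory {
    static synchronized void createClient(DriverLauncher launcher, long millis)
            throws InterruptedException {
        launcher.launch(millis);
    }
}

// Narrower locking: only the cheap shared-state check holds the class lock;
// the slow launch runs outside it.
final class NarrowFactory {
    private static boolean initialized = false;

    static synchronized void initialize() { initialized = true; }

    static void createClient(DriverLauncher launcher, long millis)
            throws InterruptedException {
        synchronized (NarrowFactory.class) {
            if (!initialized) {
                throw new IllegalStateException("factory not initialized");
            }
        }
        launcher.launch(millis);
    }
}

final class ContentionDemo {
    interface Call { void run() throws InterruptedException; }

    // Releases `threads` callers at the same moment and waits for all of them.
    static void runConcurrently(int threads, Call call) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CountDownLatch start = new CountDownLatch(1);
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(() -> {
                    start.await(); // park until every caller is submitted
                    call.run();
                    return null;
                }));
            }
            start.countDown();
            for (Future<?> f : futures) {
                f.get(); // rethrows any failure from a caller
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        DriverLauncher coarse = new DriverLauncher();
        runConcurrently(4, () -> CoarseFactory.createClient(coarse, 100));
        System.out.println("coarse peak = " + coarse.peak()); // always 1

        NarrowFactory.initialize();
        DriverLauncher narrow = new DriverLauncher();
        runConcurrently(4, () -> NarrowFactory.createClient(narrow, 300));
        System.out.println("narrow peak = " + narrow.peak()); // usually 4
    }
}
```

With the coarse lock, at most one launch is ever in progress, so N queued launches take roughly N times as long as one, which is how a 1 hour connect timeout turns into multi-hour waits. With the narrower lock the launches overlap; the trade-off is that anything the slow path touches must then be thread-safe without the factory lock.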

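The "waiting to lock <...> (a java.lang.Class for ...)" lines in the jstack excerpt can also be observed in-process through the standard java.lang.management API, which may help when checking whether a change removes the contention. This is a sketch under the assumption that the contended monitor is a Class object, as in the excerpt; HypotheticalFactory, LockContention, blockedOn and demo are illustrative names, not Hive code. It matches each BLOCKED thread's ThreadInfo.getLockInfo() against the monitor's class name and identity hash code.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical class whose Class object serves as the monitor, mirroring
// static synchronized methods that lock on the factory class.
final class HypotheticalFactory {}

final class LockContention {
    // Describes every thread currently BLOCKED trying to enter `monitor`.
    static List<String> blockedOn(Object monitor) {
        List<String> result = new ArrayList<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            LockInfo lock = info.getLockInfo();
            if (info.getThreadState() == Thread.State.BLOCKED
                    && lock != null
                    && lock.getClassName().equals(monitor.getClass().getName())
                    && lock.getIdentityHashCode() == System.identityHashCode(monitor)) {
                result.add(info.getThreadName() + " waiting to lock " + lock
                        + " held by " + info.getLockOwnerName());
            }
        }
        return result;
    }

    // In-process reproduction: "holder" keeps the class lock (as a slow
    // createClient() would) while "waiter" blocks trying to acquire it.
    static List<String> demo() {
        Object monitor = HypotheticalFactory.class;
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (monitor) {
                held.countDown();
                try {
                    release.await(); // still owns the monitor while waiting here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (monitor) { /* runs only after holder releases */ }
        }, "waiter");
        holder.start();
        try {
            held.await();
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);
            }
            List<String> blocked = blockedOn(monitor);
            release.countDown();
            holder.join();
            waiter.join();
            return blocked;
        } catch (InterruptedException e) {
            release.countDown();
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints one line of the form
        // "waiter waiting to lock java.lang.Class@<hash> held by holder".
        demo().forEach(System.out::println);
    }
}
```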

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
