[ https://issues.apache.org/jira/browse/HIVE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078056#comment-14078056 ]
Chao commented on HIVE-7525:
----------------------------

I modified SparkClient to make it submit rdd4 via a separate thread, which simply does the "foreach" in its "run" method. However, I keep getting this error about the plan file not being found:

14/07/29 10:01:37 INFO exec.Utilities: local path = hdfs://localhost:8020/tmp/hive-chao/6ab5877a-ba1a-4761-971e-45d9b46cd3c6/hive_2014-07-29_10-01-28_749_8375059517503664847-1/-mr-10003/1a80d789-63d8-43bb-b3f4-4ad74a66b0af/map.xml
14/07/29 10:01:37 INFO exec.Utilities: Open file to read in plan: hdfs://localhost:8020/tmp/hive-chao/6ab5877a-ba1a-4761-971e-45d9b46cd3c6/hive_2014-07-29_10-01-28_749_8375059517503664847-1/-mr-10003/1a80d789-63d8-43bb-b3f4-4ad74a66b0af/map.xml
14/07/29 10:01:37 INFO exec.Utilities: File not found: File does not exist: /tmp/hive-chao/6ab5877a-ba1a-4761-971e-45d9b46cd3c6/hive_2014-07-29_10-01-28_749_8375059517503664847-1/-mr-10003/1a80d789-63d8-43bb-b3f4-4ad74a66b0af/map.xml
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:482)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)

On the other hand, if I trigger the "foreach" in the current thread, everything works fine. Maybe the Hadoop FileSystem doesn't allow accessing the same file from different threads? I'm not sure why.

> Research to find out if it's possible to submit Spark jobs concurrently using
> shared SparkContext
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7525
>                 URL: https://issues.apache.org/jira/browse/HIVE-7525
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>
> Refer to HIVE-7503 and SPARK-2688. Find out if it's possible to submit
> multiple Spark jobs concurrently using a shared SparkContext. SparkClient's
> code can be manipulated for this test. Here is the process:
> 1. Transform rdd1 to rdd2 using some transformation.
> 2. Call rdd2.cache() to persist it in memory.
> 3. In two threads, call accordingly:
>    Thread a. rdd2 -> rdd3; rdd3.foreach()
>    Thread b. rdd2 -> rdd4; rdd4.foreach()
> It would be nice to find out about the monitoring and error reporting aspects.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
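One possible explanation for the comment's plan-file failure (an assumption, not confirmed in this thread) is per-thread state: if plan-related context is held in a `ThreadLocal`, a value set by the submitting thread is simply invisible to a freshly spawned thread, and a lookup there falls back to a wrong or missing path. A minimal, self-contained Java sketch of that pitfall (the `PLAN_PATH` name and the path are hypothetical, not Hive's actual internals):

```java
// Hypothetical sketch: a ThreadLocal set in one thread is not visible in another.
public class ThreadLocalDemo {
    // Stands in for per-thread state such as a cached plan path (made-up name).
    private static final ThreadLocal<String> PLAN_PATH = new ThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        PLAN_PATH.set("/tmp/plan/map.xml");                  // set in the main thread
        System.out.println("main sees: " + PLAN_PATH.get()); // prints the path

        Thread worker = new Thread(() ->
            // The worker thread gets null: ThreadLocal values do not propagate.
            System.out.println("worker sees: " + PLAN_PATH.get()));
        worker.start();
        worker.join();

        // InheritableThreadLocal is one possible workaround: child threads copy
        // the parent's value when they are constructed.
        InheritableThreadLocal<String> inheritable = new InheritableThreadLocal<>();
        inheritable.set("/tmp/plan/map.xml");
        Thread child = new Thread(() ->
            System.out.println("child sees: " + inheritable.get()));
        child.start();
        child.join();
    }
}
```

If something like this is the cause, the symptom matches the comment: the same lookup succeeds when "foreach" runs in the current thread and fails when it runs in a separate one.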
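The three-step test process in the issue description can be sketched with plain Java threads. This is not Spark code: the lists stand in for RDDs, the class name and transformations are invented for illustration, and only the concurrency shape (one shared cached dataset, two threads each deriving and consuming their own result) mirrors the proposed experiment:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConcurrentDeriveDemo {
    public static void main(String[] args) throws InterruptedException {
        // Steps 1-2: rdd1 -> rdd2 via a transformation, then "cache" rdd2
        // (here, simply materialize it as a list).
        List<Integer> rdd2 = IntStream.rangeClosed(1, 5)
                .map(x -> x * 2)       // the transformation
                .boxed()
                .collect(Collectors.toList());

        // Thread-safe sink so both threads can record results concurrently.
        List<Integer> results = new CopyOnWriteArrayList<>();

        // Step 3: two threads each derive from the shared rdd2 and "foreach".
        Thread a = new Thread(() ->
            rdd2.stream().map(x -> x + 1).forEach(results::add));  // rdd2 -> rdd3
        Thread b = new Thread(() ->
            rdd2.stream().map(x -> x * 10).forEach(results::add)); // rdd2 -> rdd4
        a.start();
        b.start();
        a.join();
        b.join();

        System.out.println(results.size()); // prints 10: five elements per thread
    }
}
```

In the real experiment the interesting questions are the ones the issue raises: whether the shared SparkContext accepts both submissions, and how job monitoring and error reporting behave when the jobs run concurrently.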