[ https://issues.apache.org/jira/browse/HIVE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078056#comment-14078056 ]
Chao commented on HIVE-7525:
----------------------------

I modified SparkClient to make it submit rdd4 via a separate thread, which simply does the "foreach" in its "run" method. However, I keep getting this error about the plan file not being found:

14/07/29 10:01:37 INFO exec.Utilities: local path = hdfs://localhost:8020/tmp/hive-chao/6ab5877a-ba1a-4761-971e-45d9b46cd3c6/hive_2014-07-29_10-01-28_749_8375059517503664847-1/-mr-10003/1a80d789-63d8-43bb-b3f4-4ad74a66b0af/map.xml
14/07/29 10:01:37 INFO exec.Utilities: Open file to read in plan: hdfs://localhost:8020/tmp/hive-chao/6ab5877a-ba1a-4761-971e-45d9b46cd3c6/hive_2014-07-29_10-01-28_749_8375059517503664847-1/-mr-10003/1a80d789-63d8-43bb-b3f4-4ad74a66b0af/map.xml
14/07/29 10:01:37 INFO exec.Utilities: File not found: File does not exist: /tmp/hive-chao/6ab5877a-ba1a-4761-971e-45d9b46cd3c6/hive_2014-07-29_10-01-28_749_8375059517503664847-1/-mr-10003/1a80d789-63d8-43bb-b3f4-4ad74a66b0af/map.xml
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:482)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)

On the other hand, if I trigger the "foreach" in the current thread, everything works fine. Maybe the Hadoop FileSystem doesn't allow accessing the same file from different threads? I'm not sure why.

> Research to find out if it's possible to submit Spark jobs concurrently using
> shared SparkContext
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7525
>                 URL: https://issues.apache.org/jira/browse/HIVE-7525
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>
> Refer to HIVE-7503 and SPARK-2688. Find out if it's possible to submit
> multiple Spark jobs concurrently using a shared SparkContext. SparkClient's
> code can be manipulated for this test. Here is the process:
> 1. Transform rdd1 to rdd2 using some transformation.
> 2. Call rdd2.cache() to persist it in memory.
> 3. In two threads, call accordingly:
>    Thread a. rdd2 -> rdd3; rdd3.foreach()
>    Thread b. rdd2 -> rdd4; rdd4.foreach()
> It would be nice to find out about the monitoring and error reporting aspects.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
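One possible explanation for the comment's plan-file failure (an assumption, not confirmed in this thread) is per-thread state: if plan-related context is held in a `ThreadLocal`, a value set by the submitting thread is simply invisible to a freshly spawned thread, and a lookup there falls back to a wrong or missing path. A minimal, self-contained Java sketch of that pitfall (the `PLAN_PATH` name and the path are hypothetical, not Hive's actual internals):

```java
// Hypothetical sketch: a ThreadLocal set in one thread is not visible in another.
public class ThreadLocalDemo {
    // Stands in for per-thread state such as a cached plan path (made-up name).
    private static final ThreadLocal<String> PLAN_PATH = new ThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        PLAN_PATH.set("/tmp/plan/map.xml");                  // set in the main thread
        System.out.println("main sees: " + PLAN_PATH.get()); // prints the path

        Thread worker = new Thread(() ->
            // The worker thread gets null: ThreadLocal values do not propagate.
            System.out.println("worker sees: " + PLAN_PATH.get()));
        worker.start();
        worker.join();

        // InheritableThreadLocal is one possible workaround: child threads copy
        // the parent's value when they are constructed.
        InheritableThreadLocal<String> inheritable = new InheritableThreadLocal<>();
        inheritable.set("/tmp/plan/map.xml");
        Thread child = new Thread(() ->
            System.out.println("child sees: " + inheritable.get()));
        child.start();
        child.join();
    }
}
```

If something like this is the cause, the symptom matches the comment: the same lookup succeeds when "foreach" runs in the current thread and fails when it runs in a separate one.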
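The three-step test process in the issue description can be sketched with plain Java threads. This is not Spark code: the lists stand in for RDDs, the class name and transformations are invented for illustration, and only the concurrency shape (one shared cached dataset, two threads each deriving and consuming their own result) mirrors the proposed experiment:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConcurrentDeriveDemo {
    public static void main(String[] args) throws InterruptedException {
        // Steps 1-2: rdd1 -> rdd2 via a transformation, then "cache" rdd2
        // (here, simply materialize it as a list).
        List<Integer> rdd2 = IntStream.rangeClosed(1, 5)
                .map(x -> x * 2)       // the transformation
                .boxed()
                .collect(Collectors.toList());

        // Thread-safe sink so both threads can record results concurrently.
        List<Integer> results = new CopyOnWriteArrayList<>();

        // Step 3: two threads each derive from the shared rdd2 and "foreach".
        Thread a = new Thread(() ->
            rdd2.stream().map(x -> x + 1).forEach(results::add));  // rdd2 -> rdd3
        Thread b = new Thread(() ->
            rdd2.stream().map(x -> x * 10).forEach(results::add)); // rdd2 -> rdd4
        a.start();
        b.start();
        a.join();
        b.join();

        System.out.println(results.size()); // prints 10: five elements per thread
    }
}
```

In the real experiment the interesting questions are the ones the issue raises: whether the shared SparkContext accepts both submissions, and how job monitoring and error reporting behave when the jobs run concurrently.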