Alexandre Fonseca created GIRAPH-859:
----------------------------------------

             Summary: Yarn-based Giraph user woes
                 Key: GIRAPH-859
                 URL: https://issues.apache.org/jira/browse/GIRAPH-859
             Project: Giraph
          Issue Type: Bug
            Reporter: Alexandre Fonseca


After a lengthy debugging session with Stefan Beskow, prompted by the following post on the mailing list:
http://mail-archives.apache.org/mod_mbox/giraph-user/201402.mbox/%3C32c1ea4f88ec4fd2bc0815b012c0de48%40MERCMBX25R.na.SAS.com%3E

I was able to identify several problems that occur when submitting Giraph jobs through the Yarn framework with a submitting user that differs from the user running the Yarn daemons (and HDFS). This is under a scenario with no authentication; the delegation tokens used in a secure environment might avoid these problems.

h2. First problem

Since the client user and the AM user are different, the AM is unable to find the HDFS files distributed by the client: it looks in the wrong home directory, /user/<yarn user>/giraph_yarn_jar_cache instead of /user/<client user>/giraph_yarn_jar_cache:

{code}
14/02/20 18:10:25 INFO yarn.GiraphYarnClient: Made local resource for :/r/sanyo.unx.sas.com/vol/vol410/u41/stbesk/snapshot_from_git/jars/giraph-ex.jar to hdfs://el01cn01.unx.sas.com:8020/user/stbesk/giraph_yarn_jar_cache/application_1392713839733_0034/giraph-ex.jar
{code}

{code}
Exception in thread "pool-3-thread-2" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
        at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
        at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
        at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
        at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://el01cn01.unx.sas.com:8020/user/yarn/giraph_yarn_jar_cache/application_1392713839733_0034/okapi-0.3.2.jar
{code}
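
A minimal sketch of the kind of fix this seems to need: instead of letting the AM resolve the cache directory relative to its own home directory, the client could record the fully qualified path it actually used and ship it to the AM through the configuration. The config key and method names below are hypothetical, not part of the current Giraph API:

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class JarCachePathFix {
  // Hypothetical config key; not part of the current Giraph API.
  static final String BASE_DIR_KEY = "giraph.yarn.base.dir";

  // Client side: record the fully qualified cache dir that was actually used,
  // e.g. hdfs://nn:8020/user/<client user>/giraph_yarn_jar_cache/<appId>.
  static void recordBaseDir(Configuration conf, String appId) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path base = new Path(fs.getHomeDirectory(), "giraph_yarn_jar_cache/" + appId);
    conf.set(BASE_DIR_KEY, fs.makeQualified(base).toString());
  }

  // AM side: use the recorded path instead of resolving against its own home
  // directory, which is /user/<yarn user> and therefore wrong in this scenario.
  static Path resolveBaseDir(Configuration conf, String appId) {
    String recorded = conf.get(BASE_DIR_KEY);
    return recorded != null
        ? new Path(recorded)
        : new Path("giraph_yarn_jar_cache/" + appId); // reproduces the buggy behaviour
  }
}
{code}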

h2. Second problem

The AM attempts to rewrite giraph-conf.xml in the HDFS distributed cache before launching the task containers. Since the client is the one who put that file there in the first place, and the default permissions are rw-r--r--, the AM is unable to rewrite the file unless the Yarn user also happens to be the HDFS superuser. The same issue occurs at the level of the application's distributed cache directory when the AM tries to delete files or write new ones.

{code}
Exception in thread "pool-3-thread-1" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
        at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:401)
        at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
        at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:532)
        at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:489)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode="/user/stbesk/giraph_yarn_jar_cache/application_1392713839733_0044/giraph-conf.xml":stbesk:supergroup:-rw-r--r--
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:164)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5430)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5412)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:5374)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2178)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2133)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2086)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:499)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:321)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1429)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1449)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1374)
        at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
        at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:386)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:386)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:330)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:785)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1904)
        at org.apache.giraph.yarn.YarnUtils.exportGiraphConfiguration(YarnUtils.java:257)
        at org.apache.giraph.yarn.GiraphApplicationMaster.updateGiraphConfForExport(GiraphApplicationMaster.java:421)
        at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:396)
        ... 6 more
{code}
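
Until this is fixed properly, one possible workaround (a sketch, not what Giraph currently does) is for the client to loosen the permissions on the per-application cache directory after uploading, so the Yarn user can rewrite giraph-conf.xml. The paths mirror the ones in the trace above; this obviously trades safety for convenience and only makes sense in an unsecured cluster:

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class CachePermissionsWorkaround {
  // Client side, after uploading the jars and giraph-conf.xml: open up the
  // per-application cache dir so a different user (yarn) can write into it
  // and rewrite the configuration file.
  static void openUpCacheDir(Configuration conf, String appId) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path appDir = new Path(fs.getHomeDirectory(), "giraph_yarn_jar_cache/" + appId);
    fs.setPermission(appDir, new FsPermission((short) 0777));  // rwxrwxrwx
    fs.setPermission(new Path(appDir, "giraph-conf.xml"),
        new FsPermission((short) 0666));                       // rw-rw-rw-
  }
}
{code}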

h2. Third problem

A temporary giraph-conf.xml file is created at /tmp/giraph-conf.xml on the host of the Giraph client submitting a job. However, this file is never deleted, so if different users submit Giraph jobs from the same host, later submissions fail because those users cannot overwrite the temporary file.

{code}
Exception in thread "pool-2-thread-1" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
        at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
        at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
        at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
        at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.FileNotFoundException: /tmp/giraph-conf.xml (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
        at org.apache.giraph.yarn.YarnUtils.exportGiraphConfiguration(YarnUtils.java:235)
        at org.apache.giraph.yarn.GiraphApplicationMaster.updateGiraphConfForExport(GiraphApplicationMaster.java:411)
        at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:386)
        ... 6 more
{code}
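
A sketch of the obvious fix, assuming only what the trace shows (a fixed /tmp/giraph-conf.xml target): write the exported configuration to a unique per-process temp file and clean it up on exit, so concurrent users never collide:

{code}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class TempConfExport {
  // Instead of a fixed /tmp/giraph-conf.xml (as seen in the trace), each
  // process gets its own uniquely named file, and the file is removed when
  // the JVM exits instead of being left behind for the next user to hit.
  static File exportConf(Configuration conf) throws IOException {
    File tmp = File.createTempFile("giraph-conf-", ".xml");
    tmp.deleteOnExit();
    try (FileOutputStream out = new FileOutputStream(tmp)) {
      conf.writeXml(out);
    }
    return tmp;
  }
}
{code}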

I'm sure there are more problems, but this is where we stopped debugging.


