[ https://issues.apache.org/jira/browse/GIRAPH-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940605#comment-13940605 ]

Andrey Stepachev commented on GIRAPH-859:
-----------------------------------------

Insecure YARN clusters run containers as nobody (or another custom configured user). That doesn't play well with permissions on HDFS. To solve this I wrote a patch, https://issues.apache.org/jira/browse/YARN-1853, which makes it possible to runAs a given user even on an unsecured cluster.
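For background: with simple (insecure) authentication, HDFS trusts whatever user name a client claims, so an application process can already impersonate the submitting user at the filesystem layer via UserGroupInformation; YARN-1853 extends the same idea to the containers themselves. A minimal sketch of the filesystem-level form, where the user name and path are taken from the traces below purely for illustration:

{code}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class RunAsSketch {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    // With simple auth, HDFS accepts the claimed identity, so a daemon
    // running as "yarn"/"nobody" can act as the submitting user.
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("stbesk");
    boolean visible = ugi.doAs((PrivilegedExceptionAction<Boolean>) () -> {
      FileSystem fs = FileSystem.get(conf);
      return fs.exists(new Path("/user/stbesk/giraph_yarn_jar_cache"));
    });
    System.out.println("jar cache visible: " + visible);
  }
}
{code}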
Yarn-based Giraph user woes
---------------------------

Key: GIRAPH-859
URL: https://issues.apache.org/jira/browse/GIRAPH-859
Project: Giraph
Issue Type: Bug
Reporter: Alexandre Fonseca
Labels: yarn
Attachments: GIRAPH-859-incomplete.patch

After a lengthy debugging session with Stefan Beskow prompted by the following post on the mailing list:

http://mail-archives.apache.org/mod_mbox/giraph-user/201402.mbox/%3C32c1ea4f88ec4fd2bc0815b012c0de48%40MERCMBX25R.na.SAS.com%3E

I was able to identify several problems that occur when Giraph jobs are submitted through the YARN framework by a user different from the user running the YARN daemons (and HDFS). This is in a scenario with no authentication; the delegation tokens used in a secure environment might solve this problem.

h2. First problem

Since the client user and the AM user are different, the AM is unable to find the HDFS files distributed by the client, as it looks in the wrong home directory: /user/<yarn user>/giraph_yarn_jar_cache instead of /user/<client user>/giraph_yarn_jar_cache:

{code}
14/02/20 18:10:25 INFO yarn.GiraphYarnClient: Made local resource for :/r/sanyo.unx.sas.com/vol/vol410/u41/stbesk/snapshot_from_git/jars/giraph-ex.jar to hdfs://el01cn01.unx.sas.com:8020/user/stbesk/giraph_yarn_jar_cache/application_1392713839733_0034/giraph-ex.jar
{code}

{code}
Exception in thread "pool-3-thread-2" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
    at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
    at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
    at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
    at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://el01cn01.unx.sas.com:8020/user/yarn/giraph_yarn_jar_cache/application_1392713839733_0034/okapi-0.3.2.jar
{code}
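The root cause is that the AM resolves the jar-cache path relative to its own HDFS home directory. A minimal sketch of the fix direction, assuming a hypothetical configuration key (giraph.yarn.submit.user) that the client would set at submit time; Giraph's actual property names may differ:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class JarCachePath {
  // Hypothetical key; the client would set it to
  // UserGroupInformation.getCurrentUser().getShortUserName() at submit time.
  static final String SUBMIT_USER_KEY = "giraph.yarn.submit.user";

  /** Build the jar-cache path from the recorded submitter, not the AM's home dir. */
  static Path jarCacheDir(Configuration conf, String appId) {
    String user = conf.get(SUBMIT_USER_KEY,
        System.getProperty("user.name")); // fall back to the local user
    // FileSystem#getHomeDirectory() would resolve to /user/<yarn user> inside
    // the AM, which is exactly the wrong-directory failure traced above.
    return new Path("/user/" + user + "/giraph_yarn_jar_cache/" + appId);
  }
}
{code}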
h2. Second problem

The AM attempts to rewrite the giraph-conf.xml in the HDFS distributed cache before launching the task containers. Since the client is the one who put that file there in the first place, and default permissions are rw-r--r--, the AM will be unable to rewrite the file unless the YARN user also happens to be the HDFS superuser. The same issue occurs at the directory level of the application's distributed-cache folder when the AM tries to delete or write new files.

{code}
Exception in thread "pool-3-thread-1" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
    at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:401)
    at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
    at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:532)
    at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:489)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode="/user/stbesk/giraph_yarn_jar_cache/application_1392713839733_0044/giraph-conf.xml":stbesk:supergroup:-rw-r--r--
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:164)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5430)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5412)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:5374)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2178)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2133)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2086)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:499)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:321)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1429)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1449)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1374)
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:386)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:386)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:330)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:785)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1904)
    at org.apache.giraph.yarn.YarnUtils.exportGiraphConfiguration(YarnUtils.java:257)
    at org.apache.giraph.yarn.GiraphApplicationMaster.updateGiraphConfForExport(GiraphApplicationMaster.java:421)
    at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:396)
    ... 6 more
{code}
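Until the AM can act as the submitting user, one blunt client-side workaround is to loosen the permissions on the staged application directory after upload so that the AM, running as a different user, can rewrite files in it. A sketch only (not the attached patch); the path names follow the trace above:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class RelaxStagingPerms {
  /** After uploading, let the AM user (e.g. yarn) rewrite the staged conf. */
  static void openUpStagingDir(Configuration conf, Path appDir) throws java.io.IOException {
    FileSystem fs = FileSystem.get(conf);
    // The directory needs write+execute for the AM to delete/create files in it.
    fs.setPermission(appDir, new FsPermission((short) 0777));
    // giraph-conf.xml itself defaults to rw-r--r--, which triggers the
    // AccessControlException above when the AM tries to overwrite it.
    fs.setPermission(new Path(appDir, "giraph-conf.xml"), new FsPermission((short) 0666));
  }
}
{code}

On a shared cluster this trades the failure above for world-writable staging files, so it is a stopgap at best.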
h2. Third problem

A temporary giraph-conf.xml file is created at /tmp/giraph-conf.xml on the host from which the Giraph client submits a job. However, this file is not deleted after creation, so if different users submit Giraph jobs from the same host, later submissions by other users will fail: they cannot write to the fixed temporary location.

{code}
Exception in thread "pool-2-thread-1" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
    at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
    at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
    at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
    at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.FileNotFoundException: /tmp/giraph-conf.xml (Permission denied)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
    at org.apache.giraph.yarn.YarnUtils.exportGiraphConfiguration(YarnUtils.java:235)
    at org.apache.giraph.yarn.GiraphApplicationMaster.updateGiraphConfForExport(GiraphApplicationMaster.java:411)
    at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:386)
    ... 6 more
{code}

And I'm sure there are many more, but this is where we stopped debugging.
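The usual fix for this class of collision is to replace the fixed /tmp/giraph-conf.xml path with a per-invocation temporary file. A minimal sketch using only the JDK, where serializedConf is a stand-in for whatever YarnUtils.exportGiraphConfiguration actually writes:

{code}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ExportConfSketch {
  /** Write the exported config to a unique temp file instead of a fixed path. */
  static File exportGiraphConfiguration(byte[] serializedConf) throws IOException {
    // createTempFile appends a random suffix, so concurrent users on the same
    // host never collide on a shared fixed path like /tmp/giraph-conf.xml.
    File tmp = File.createTempFile("giraph-conf-", ".xml");
    tmp.deleteOnExit(); // best-effort cleanup when the client JVM exits
    try (OutputStream out = new FileOutputStream(tmp)) {
      out.write(serializedConf);
    }
    return tmp;
  }
}
{code}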