[ https://issues.apache.org/jira/browse/GIRAPH-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940605#comment-13940605 ]

Andrey Stepachev commented on GIRAPH-859:
-----------------------------------------

An insecure YARN cluster runs user containers as nobody (or another custom
user). That doesn't play well with permissions on HDFS. To solve that I've
written a patch, https://issues.apache.org/jira/browse/YARN-1853, which
enables runAs even on an unsecured cluster.
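
Not the YARN-1853 patch itself, but a minimal sketch of the general idea on an
unsecured cluster: with simple authentication, Hadoop trusts whatever user name
the client asserts, so a UserGroupInformation created for the submitting user
can perform HDFS operations as that user (the user name below is illustrative).

{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class RunAsSketch {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    // On an insecure cluster, createRemoteUser simply asserts the name;
    // the NameNode checks no credentials.
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("stbesk");
    FileSystem fs = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
      public FileSystem run() throws Exception {
        return FileSystem.get(conf);
      }
    });
    System.out.println(fs.getHomeDirectory()); // e.g. hdfs://.../user/stbesk
  }
}
{code}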

> Yarn-based Giraph user woes
> ---------------------------
>
>                 Key: GIRAPH-859
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-859
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alexandre Fonseca
>              Labels: yarn
>         Attachments: GIRAPH-859-incomplete.patch
>
>
> After a lengthy debugging session with Stefan Beskow, prompted by the 
> following mailing-list post:
> http://mail-archives.apache.org/mod_mbox/giraph-user/201402.mbox/%3C32c1ea4f88ec4fd2bc0815b012c0de48%40MERCMBX25R.na.SAS.com%3E
> I was able to identify several problems that occur when submitting Giraph 
> jobs through the Yarn framework with a user different from the one running 
> the Yarn daemons (and HDFS). This is under a scenario with no 
> authentication; the delegation tokens used in a secure environment might 
> avoid these problems.
> h2. First problem
> Since the client user and the AM user are different, the AM is unable to 
> find the HDFS files distributed by the client: it looks in the wrong home 
> directory, /user/<yarn user>/giraph_yarn_jar_cache instead of 
> /user/<client user>/giraph_yarn_jar_cache:
> {code}
> 14/02/20 18:10:25 INFO yarn.GiraphYarnClient: Made local resource for :/r/sanyo.unx.sas.com/vol/vol410/u41/stbesk/snapshot_from_git/jars/giraph-ex.jar to hdfs://el01cn01.unx.sas.com:8020/user/stbesk/giraph_yarn_jar_cache/application_1392713839733_0034/giraph-ex.jar
> {code}
> {code}
> Exception in thread "pool-3-thread-2" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://el01cn01.unx.sas.com:8020/user/yarn/giraph_yarn_jar_cache/application_1392713839733_0034/okapi-0.3.2.jar
> {code}
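> A minimal sketch of where the mismatch comes from, assuming the AM resolves 
> relative cache paths through its own filesystem view (the application id 
> below is illustrative):
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class HomeDirMismatch {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     // Relative paths are qualified against the *current* user's home
>     // directory. Inside the AM that user is "yarn", so this resolves to
>     // /user/yarn/giraph_yarn_jar_cache/... even though the client
>     // uploaded the jars under /user/<client user>/.
>     Path relative = new Path(
>         "giraph_yarn_jar_cache/application_1392713839733_0034/giraph-ex.jar");
>     System.out.println(fs.makeQualified(relative));
>     // One possible fix: pass the client's fully qualified base path (or
>     // user name) to the AM, e.g. through the container environment,
>     // instead of relying on the AM's own home directory.
>   }
> }
> {code}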
> h2. Second problem
> The AM attempts to rewrite the giraph-conf.xml in the HDFS distributed cache 
> before launching the task containers. Since the client is the one who put 
> that file there in the first place, and the default permissions are 
> rw-r--r--, the AM will be unable to rewrite the file unless the Yarn user 
> also happens to be the HDFS superuser. The same issue occurs at the 
> directory level of the application's distributed cache folder when the AM 
> tries to delete or write new files.
> {code}
> Exception in thread "pool-3-thread-1" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:401)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:532)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:489)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode="/user/stbesk/giraph_yarn_jar_cache/application_1392713839733_0044/giraph-conf.xml":stbesk:supergroup:-rw-r--r--
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:164)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5430)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5412)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:5374)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2178)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2133)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2086)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:499)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:321)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>         at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1429)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1449)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1374)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:386)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:386)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:330)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:785)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
>         at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1904)
>         at org.apache.giraph.yarn.YarnUtils.exportGiraphConfiguration(YarnUtils.java:257)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.updateGiraphConfForExport(GiraphApplicationMaster.java:421)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:396)
>         ... 6 more
> {code}
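> On an unsecured cluster, one blunt workaround sketch (not a proper fix) 
> would be for the client to loosen permissions on the per-application cache 
> directory after uploading, so the yarn user can rewrite giraph-conf.xml; the 
> paths below are illustrative:
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.permission.FsPermission;
>
> public class LoosenCachePermissions {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     Path appCacheDir = new Path(fs.getHomeDirectory(),
>         "giraph_yarn_jar_cache/application_1392713839733_0044");
>     // rwxrwxrwx on the directory lets the AM delete and create files in
>     // it; rw-rw-rw- on the conf file lets the AM rewrite it.
>     fs.setPermission(appCacheDir, new FsPermission((short) 0777));
>     fs.setPermission(new Path(appCacheDir, "giraph-conf.xml"),
>         new FsPermission((short) 0666));
>   }
> }
> {code}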
> h2. Third problem
> A temporary giraph-conf.xml file is created at /tmp/giraph-conf.xml on the 
> host of the Giraph client submitting the job. However, this file is not 
> deleted afterwards, so if different users submit Giraph jobs on the same 
> host, all but the first will fail because they cannot overwrite the 
> temporary file.
> {code}
> Exception in thread "pool-2-thread-1" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.FileNotFoundException: /tmp/giraph-conf.xml (Permission denied)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
>         at org.apache.giraph.yarn.YarnUtils.exportGiraphConfiguration(YarnUtils.java:235)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.updateGiraphConfForExport(GiraphApplicationMaster.java:411)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:386)
>         ... 6 more
> {code}
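> A sketch of one possible fix for this problem: write the exported 
> configuration to a unique per-invocation temp file instead of the fixed 
> /tmp/giraph-conf.xml (assuming the rest of the export path can accept an 
> arbitrary local file):
> {code}
> import java.io.File;
> import java.io.FileOutputStream;
> import org.apache.hadoop.conf.Configuration;
>
> public class ExportConfSketch {
>   static File exportConf(Configuration conf) throws Exception {
>     // A unique name avoids collisions between users on the same host;
>     // deleteOnExit cleans the file up when the client JVM terminates.
>     File tmp = File.createTempFile("giraph-conf-", ".xml");
>     tmp.deleteOnExit();
>     FileOutputStream out = new FileOutputStream(tmp);
>     try {
>       conf.writeXml(out);
>     } finally {
>       out.close();
>     }
>     return tmp;
>   }
> }
> {code}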
> I'm sure there are more problems, but we stopped debugging at this point.


