[ https://issues.apache.org/jira/browse/GIRAPH-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexandre Fonseca updated GIRAPH-859:
-------------------------------------

    Attachment: GIRAPH-859-incomplete.patch

The attached patch "solves" the 1st and 3rd problems and partially solves the 2nd by hacking around permissions (setting RW permissions for all users on the giraph-conf.xml in HDFS).
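For clarity, the workaround boils down to something like this (a minimal sketch of the idea, not the literal patch; the path argument is illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class OpenUpGiraphConf {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Illustrative path; the real file lives under the client's
    // giraph_yarn_jar_cache/<application id>/ directory in HDFS.
    Path confFile = new Path(args[0]);
    // rw-rw-rw-: lets the AM (running as the yarn user) overwrite the
    // file the client uploaded with the default rw-r--r--.
    fs.setPermission(confFile, new FsPermission((short) 0666));
  }
}
{code}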

However, I feel that the real solution for the 1st and 2nd problems is to run the AM and task containers with the same identity as the client user. Unfortunately, I've looked through the Hadoop YARN APIs and was unable to find a simple function like setUser (it seems one existed at some point, but the security problems with such an approach are rather obvious). Looking at the MapReduce 2 code, I see extensive use of DelegationTokens and other authentication mechanisms which do appear to do what I propose (feel free to correct me if I'm wrong).
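From my (possibly wrong) reading of the MR2 code, the client-side flow is roughly the following. This is only a sketch, and note that HDFS hands out delegation tokens only when security is enabled:

{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.util.Records;

public class TokenShippingSketch {
  // Client side: collect HDFS delegation tokens for the submitting user
  // and attach them to the container launch context, so the AM and tasks
  // do HDFS I/O with the client's identity instead of the yarn user's.
  static ContainerLaunchContext withClientTokens(Configuration conf)
      throws Exception {
    FileSystem fs = FileSystem.get(conf);
    Credentials credentials = new Credentials();
    // The renewer would normally be the RM principal; "yarn" is a placeholder.
    fs.addDelegationTokens("yarn", credentials);

    DataOutputBuffer dob = new DataOutputBuffer();
    credentials.writeTokenStorageToStream(dob);
    ByteBuffer tokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());

    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    ctx.setTokens(tokens);
    return ctx;
  }
}
{code}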

Unfortunately, I'm not familiar with HDFS credential management, and the documentation is somewhat lacking, so it would require a greater effort than the one I can currently spare. Hopefully someone can either point me in the right direction or implement it themselves.

As it is, with this patch, everything should work with different client/yarn 
users so long as HDFS permission checking is disabled.
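(Concretely, that means setting the following in hdfs-site.xml. It disables permission checking cluster-wide, so it is obviously only reasonable on a test cluster:)

{code}
<!-- hdfs-site.xml: disable HDFS permission checking (Hadoop 2.x key) -->
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
{code}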

> Yarn-based Giraph user woes
> ---------------------------
>
>                 Key: GIRAPH-859
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-859
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alexandre Fonseca
>              Labels: yarn
>         Attachments: GIRAPH-859-incomplete.patch
>
>
> After a lengthy debugging session with Stefan Beskow, prompted by the following post on the mailing list:
> http://mail-archives.apache.org/mod_mbox/giraph-user/201402.mbox/%3C32c1ea4f88ec4fd2bc0815b012c0de48%40MERCMBX25R.na.SAS.com%3E
> I was able to identify several problems that occur when Giraph jobs are submitted through the YARN framework by a user different from the one running the YARN daemons (and HDFS). This is in a scenario with no authentication; the delegation tokens used in a secure environment might solve these problems.
> h2. First problem
> Since the client user and the AM user are different, the AM is unable to find the HDFS files distributed by the client, as it looks in the wrong home directory: /user/<yarn user>/giraph_yarn_jar_cache instead of /user/<client user>/giraph_yarn_jar_cache:
> {code}
> 14/02/20 18:10:25 INFO yarn.GiraphYarnClient: Made local resource for :/r/sanyo.unx.sas.com/vol/vol410/u41/stbesk/snapshot_from_git/jars/giraph-ex.jar to hdfs://el01cn01.unx.sas.com:8020/user/stbesk/giraph_yarn_jar_cache/application_1392713839733_0034/giraph-ex.jar
> {code}
> {code}
> Exception in thread "pool-3-thread-2" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://el01cn01.unx.sas.com:8020/user/yarn/giraph_yarn_jar_cache/application_1392713839733_0034/okapi-0.3.2.jar
> {code}
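> In other words, the jar cache path is home-relative, and a home-relative HDFS path resolves against whoever is currently asking. A minimal sketch of the distinction (paths are illustrative):
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class HomeDirSketch {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     // A relative path resolves against the current user's home directory:
>     //   run as the client (stbesk) -> /user/stbesk/giraph_yarn_jar_cache/...
>     //   run as the AM (yarn)       -> /user/yarn/giraph_yarn_jar_cache/...
>     Path relative = new Path("giraph_yarn_jar_cache/appId/giraph-ex.jar");
>     // Fully qualifying the path on the client side before handing it to
>     // the AM removes the ambiguity:
>     System.out.println(fs.makeQualified(relative));
>   }
> }
> {code}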
> h2. Second problem
> The AM attempts to rewrite giraph-conf.xml in the HDFS distributed cache before launching the task containers. Since the client is the one who put the file there in the first place, and the default permissions are rw-r--r--, the AM is unable to rewrite it unless the YARN user also happens to be the HDFS superuser. The same issue occurs at the level of the application's distributed cache directory when the AM tries to delete files or write new ones.
> {code}
> Exception in thread "pool-3-thread-1" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:401)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:532)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:489)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode="/user/stbesk/giraph_yarn_jar_cache/application_1392713839733_0044/giraph-conf.xml":stbesk:supergroup:-rw-r--r--
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
>         at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:164)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5430)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5412)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:5374)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2178)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2133)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2086)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:499)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:321)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>         at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1429)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1449)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1374)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:386)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:386)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:330)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:785)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
>         at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1904)
>         at org.apache.giraph.yarn.YarnUtils.exportGiraphConfiguration(YarnUtils.java:257)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.updateGiraphConfForExport(GiraphApplicationMaster.java:421)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:396)
>         ... 6 more
> {code}
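> If I recall correctly, HDFS applies the configured umask to permissions passed to create(), so an explicit setPermission() after the upload is the reliable way to open things up, and the application's cache directory needs the same treatment since the AM also deletes and adds files there. A sketch of that idea (not what the current code does; the path is illustrative):
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.permission.FsPermission;
>
> public class OpenUpCacheDir {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     Path confFile = new Path(args[0]); // .../<app id>/giraph-conf.xml
>     // rw-rw-rw- on the file so the yarn user can rewrite it:
>     fs.setPermission(confFile, new FsPermission((short) 0666));
>     // rwxrwxrwx on the cache directory so the yarn user can delete/add files:
>     fs.setPermission(confFile.getParent(), new FsPermission((short) 0777));
>   }
> }
> {code}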
> h2. Third problem
> A temporary giraph-conf.xml is created at /tmp/giraph-conf.xml on the host from which the Giraph client submits a job. However, this file is never deleted, so when a different user later submits a Giraph job from the same host, their submission fails because they cannot write to the temporary location.
> {code}
> Exception in thread "pool-2-thread-1" java.lang.IllegalStateException: Could not configure the containerlaunch context for GiraphYarnTasks.
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:391)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.access$500(GiraphApplicationMaster.java:78)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.buildContainerLaunchContext(GiraphApplicationMaster.java:522)
>         at org.apache.giraph.yarn.GiraphApplicationMaster$LaunchContainerRunnable.run(GiraphApplicationMaster.java:479)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.FileNotFoundException: /tmp/giraph-conf.xml (Permission denied)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
>         at org.apache.giraph.yarn.YarnUtils.exportGiraphConfiguration(YarnUtils.java:235)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.updateGiraphConfForExport(GiraphApplicationMaster.java:411)
>         at org.apache.giraph.yarn.GiraphApplicationMaster.getTaskResourceMap(GiraphApplicationMaster.java:386)
>         ... 6 more
> {code}
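> A per-invocation temporary file would sidestep the collision; a sketch of the idea (the current code hardcodes /tmp/giraph-conf.xml):
> {code}
> import java.io.File;
>
> public class TempConfSketch {
>   public static void main(String[] args) throws Exception {
>     // Unique name per invocation, so concurrent users on one host
>     // never fight over the same path.
>     File tmpConf = File.createTempFile("giraph-conf-", ".xml");
>     tmpConf.deleteOnExit(); // clean up when the JVM exits
>     System.out.println("Exporting configuration to " + tmpConf);
>     // ... write the exported giraph-conf.xml here, then upload it to HDFS ...
>   }
> }
> {code}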
> And I'm sure there are many more, but this is where we stopped debugging.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
