Jan Van Besien created MAPREDUCE-7540:
-----------------------------------------

             Summary: JDK24 subject propagation case missed in CommitterEventHandler
                 Key: MAPREDUCE-7540
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7540
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Jan Van Besien


HADOOP-19670 converted {{CommitterEventHandler}} to use 
{{SubjectInheritingThread}}, but the backing {{ThreadFactory}} that creates 
these threads is only installed when the job classloader is enabled:
{code:java}
  // CommitterEventHandler.serviceStart()
  ThreadFactoryBuilder tfBuilder = new ThreadFactoryBuilder()
      .setNameFormat("CommitterEvent Processor #%d");
  if (jobClassLoader != null) {
    // if the job classloader is enabled, we need to use the job classloader
    // as the thread context classloader (TCCL) of these threads in case the
    // committer needs to load another class via TCCL
    ThreadFactory backingTf = new ThreadFactory() {
      @Override
      public Thread newThread(Runnable r) {
        Thread thread = new SubjectInheritingThread(r);
        thread.setContextClassLoader(jobClassLoader);
        return thread;
      }
    };
    tfBuilder.setThreadFactory(backingTf);
  }
  ThreadFactory tf = tfBuilder.build();
  {code}
When {{mapreduce.job.classloader}} is not enabled, 
{{ThreadFactoryBuilder.build()}} falls back to 
{{Executors.defaultThreadFactory()}}, so the "CommitterEvent Processor" 
pool threads are plain {{Thread}} instances.
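
The fallback behaviour can be checked with the JDK alone. The following is a minimal sketch, not Hadoop code: {{DefaultFactoryCheck}}, {{MarkerThread}} and {{threadClassOf}} are hypothetical names introduced here for illustration, and {{MarkerThread}} merely stands in for a {{Thread}} subclass such as {{SubjectInheritingThread}}.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

final class DefaultFactoryCheck {
  /** Hypothetical stand-in for a Thread subclass such as SubjectInheritingThread. */
  static final class MarkerThread extends Thread {
    MarkerThread(Runnable r) {
      super(r);
    }
  }

  /** Returns the runtime class of a thread produced by the given factory. */
  static Class<?> threadClassOf(ThreadFactory tf) {
    return tf.newThread(() -> { }).getClass();
  }

  public static void main(String[] args) {
    // The JDK default factory hands out plain java.lang.Thread objects.
    System.out.println(threadClassOf(Executors.defaultThreadFactory()));
    // A custom backing factory is needed to get the subclass instead.
    System.out.println(threadClassOf(MarkerThread::new));
  }
}
```

Any behaviour implemented by a {{Thread}} subclass is therefore lost whenever no custom backing factory is installed.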

On JDK 24+ (SecurityManager permanently disabled, JEP 486) plain threads no 
longer inherit the Subject bound by {{Subject.callAs}}. So inside 
{{CommitterEventHandler$EventProcessor}}, 
{{UserGroupInformation.getCurrentUser()}} finds no Subject and falls back to 
the login user. As a result, {{OutputCommitter.setupJob()}} / {{commitJob()}} 
perform their HDFS operations as the NodeManager's OS user (e.g. {{yarn}}) 
instead of the job submitter.

Observed on Hadoop 3.5.0 running on JDK 25 (see stack trace below).

Note the asymmetry in the stack trace: the NameNode side shows the patched 
{{SubjectInheritingThread}} frames from HADOOP-19668, while the AM side bottoms 
out in a bare {{Thread.run}} with no Subject wrapper frames.
{noformat}
  org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode="/tmp/lily/crunch/crunch-1246119727/p27":airflow:lily:drwxrwxr-x
      at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:663)
      at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:503)
      at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:527)
      at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:397)
      at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1974)
      at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
      at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1914)
      at 
org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:62)
      at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3568)
      at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1176)
      at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:742)
      at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:631)
      at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:599)
      at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:583)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1228)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1312)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1227)
      at 
java.base/jdk.internal.vm.ScopedValueContainer.callWithoutScope(ScopedValueContainer.java:162)
      at 
java.base/jdk.internal.vm.ScopedValueContainer.call(ScopedValueContainer.java:147)
      at java.base/java.lang.ScopedValue$Carrier.call(ScopedValue.java:419)
      at java.base/javax.security.auth.Subject.callAs(Subject.java:331)
      at 
org.apache.hadoop.security.authentication.util.SubjectUtil.callAs(SubjectUtil.java:242)
      at 
org.apache.hadoop.security.authentication.util.SubjectUtil.doAs(SubjectUtil.java:314)
      at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1958)
      at org.apache.hadoop.ipc.Server$Handler.work(Server.java:3279)
      at 
org.apache.hadoop.util.concurrent.SubjectInheritingThread$1.run(SubjectInheritingThread.java:203)
      at 
org.apache.hadoop.util.concurrent.SubjectInheritingThread$1.run(SubjectInheritingThread.java:199)
      at 
java.base/jdk.internal.vm.ScopedValueContainer.callWithoutScope(ScopedValueContainer.java:162)
      at 
java.base/jdk.internal.vm.ScopedValueContainer.call(ScopedValueContainer.java:147)
      at java.base/java.lang.ScopedValue$Carrier.call(ScopedValue.java:419)
      at java.base/javax.security.auth.Subject.callAs(Subject.java:331)
      at 
org.apache.hadoop.security.authentication.util.SubjectUtil.callAs(SubjectUtil.java:242)
      at 
org.apache.hadoop.security.authentication.util.SubjectUtil.doAs(SubjectUtil.java:275)
      at 
org.apache.hadoop.util.concurrent.SubjectInheritingThread.run(SubjectInheritingThread.java:199)

      at 
java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
      at 
java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
      at 
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:483)
      at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
      at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
      at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2557)
      at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2531)
      at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1507)
      at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1504)
      at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      at 
org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1521)
      at 
org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1496)
      at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2497)
      at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:361)
      at 
org.apache.crunch.io.CrunchOutputs$CompositeOutputCommitter.setupJob(CrunchOutputs.java:300)
      at 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:256)
      at 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:236)
      at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
      at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
      at java.base/java.lang.Thread.run(Thread.java:1474)
  {noformat}
*Fix*: always install the backing factory; only the classloader assignment 
should be conditional:
{code:java}
  ThreadFactory backingTf = new ThreadFactory() {
    @Override
    public Thread newThread(Runnable r) {
      Thread thread = new SubjectInheritingThread(r);
      if (jobClassLoader != null) {
        thread.setContextClassLoader(jobClassLoader);
      }
      return thread;
    }
  };
  tfBuilder.setThreadFactory(backingTf);
  {code}
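
To make the intended behaviour concrete, here is a self-contained sketch of the same pattern that runs without Hadoop on the classpath. It is an illustration under stated assumptions, not the actual patch: {{CommitterThreadFactorySketch}}, {{StandInThread}} and {{newBackingFactory}} are hypothetical names, and {{StandInThread}} only stands in for {{SubjectInheritingThread}} (it does not propagate a Subject).

```java
import java.util.concurrent.ThreadFactory;

final class CommitterThreadFactorySketch {
  /** Hypothetical stand-in for SubjectInheritingThread; carries no Subject logic. */
  static final class StandInThread extends Thread {
    StandInThread(Runnable r) {
      super(r);
    }
  }

  /**
   * Always creates the custom thread type. Only the thread context
   * classloader (TCCL) assignment depends on whether a job classloader is set.
   */
  static ThreadFactory newBackingFactory(ClassLoader jobClassLoader) {
    return r -> {
      Thread thread = new StandInThread(r);
      if (jobClassLoader != null) {
        thread.setContextClassLoader(jobClassLoader);
      }
      return thread;
    };
  }

  public static void main(String[] args) {
    ClassLoader custom = new ClassLoader() { };
    Thread withLoader = newBackingFactory(custom).newThread(() -> { });
    Thread withoutLoader = newBackingFactory(null).newThread(() -> { });

    // Both threads are the custom type, regardless of the classloader setting.
    System.out.println(withLoader instanceof StandInThread);
    System.out.println(withoutLoader instanceof StandInThread);
    // Only the first thread has the custom classloader as its TCCL.
    System.out.println(withLoader.getContextClassLoader() == custom);
    System.out.println(withoutLoader.getContextClassLoader() == custom);
  }
}
```

With the factory installed unconditionally, the thread type no longer depends on {{mapreduce.job.classloader}}; a thread created without a job classloader simply keeps the context classloader it inherits from its creating thread.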


