Vikram Ahuja created HADOOP-19460:
-------------------------------------
Summary: High number of Threads Launched when Calling
fs.getFileStatus() via proxyUser after Kerberos authentication.
Key: HADOOP-19460
URL: https://issues.apache.org/jira/browse/HADOOP-19460
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 3.3.6
Reporter: Vikram Ahuja
We have observed an issue where a very large number of threads is launched
when performing concurrent {{fs.getFileStatus(path)}} operations as a proxy user.
Although we first saw this in our Hive services, we were able to
reproduce it without Hive by writing a standalone sample program that
first logs in via a principal and keytab, then creates a proxy user and
fires concurrent {{fs.getFileStatus(path)}} calls for a few minutes. As the
concurrency increases, the process tries to create more threads than the
maximum available (the ulimit) and eventually slows down.
{code:java}
UserGroupInformation proxyUserUGI = UserGroupInformation.createProxyUser(
"hive", UserGroupInformation.getLoginUser());{code}
In this particular case, when launching 30 concurrent threads calling
{{fs.getFileStatus(path)}}, the maximum number of threads launched by the PID
is 6066.
{code}
Every 1.0s: ps -eo nlwp,pid,args --sort -nlwp | head
Wed Feb 19 06:12:47 2025
NLWP PID COMMAND
6066 700718 /usr/lib/jvm/java-17-openjdk/bin/java -cp
./test.jar:/usr/hadoop/*:/usr/hadoop/lib/*:/usr/hadoop-hdfs/*
org.apache.hadoop.hive.common.HDFSFileStatusExample hdfs://namenode:8020
principal keytab_location 30
{code}
The same behaviour is not observed when the same calls are made using the
current user's UGI instead of a proxy user.
{code:java}
UserGroupInformation currentUserUgi =
UserGroupInformation.getCurrentUser();{code}
In this case, when launching 30 concurrent threads calling
{{fs.getFileStatus(path)}}, the maximum number of threads launched by the PID
is 56, and with 500 concurrent threads the maximum is 524.
{code}
Every 1.0s: ps -eo nlwp,pid,args --sort -nlwp | head
Tue Feb 18 06:23:18 2025
NLWP PID COMMAND
56 748244 /usr/lib/jvm/java-17-openjdk/bin/java -cp
./test.jar:/usr/hadoop/*:/usr/hadoop/lib/*:/usr/hadoop-hdfs/*
org.apache.hadoop.hive.common.HDFSFileStatus hdfs://namenode:8020 principal
keytab_location 30
Every 1.0s: ps -eo nlwp,pid,args --sort -nlwp | head
Wed Feb 19 06:19:03 2025
NLWP PID COMMAND
524 750984 /usr/lib/jvm/java-17-openjdk/bin/java -cp
./test.jar:/usr/hadoop/*:/usr/hadoop/lib/*:/usr/hadoop-hdfs/*
org.apache.hadoop.hive.common.HDFSFileStatus hdfs://namenode:8020 principal
keytab_location 500{code}
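As a cross-check on the {{ps}} NLWP figures above, the thread count could also be sampled from inside the JVM. The following is only a minimal sketch and is not part of the attached programs: the class name {{ThreadPeakProbe}} and the method {{peakThreadGrowth}} are hypothetical, it relies only on the JDK's {{ThreadMXBean}}, and the empty task in {{main}} is a placeholder for the {{fs.getFileStatus(path)}} call. Note that {{ThreadMXBean}} counts live Java threads only, so its numbers will generally be lower than NLWP, which also includes native JVM threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not from the attached programs): reports how many
// extra live Java threads the JVM reached while a workload ran concurrently.
public class ThreadPeakProbe {

    /**
     * Runs {@code task} once on each of {@code workers} concurrent threads and
     * returns the peak live thread count minus the count before the run.
     */
    public static int peakThreadGrowth(int workers, Runnable task)
            throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int before = threads.getThreadCount();
        threads.resetPeakThreadCount(); // peak restarts from the current live count

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch allStarted = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                allStarted.countDown();
                try {
                    allStarted.await(); // hold each worker until all are running
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                task.run();
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        return threads.getPeakThreadCount() - before;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder workload; in a reproduction this would be the
        // fs.getFileStatus(path) call made as the proxy user or current user.
        int growth = peakThreadGrowth(30, () -> { });
        System.out.println("peak thread growth: " + growth);
    }
}
```

With a workload that starts no threads of its own, the growth should stay close to the number of workers; a value far above that would point at threads being created inside the workload itself, which is the pattern reported here for the proxy user case.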
I am attaching both sample programs: in one the calls are made as the proxy
user (the issue occurs here), and in the other the calls are made as the
current user (works fine).
The command-line arguments for the sample programs are:
arg[0] = namenode_host_name:port
arg[1] = principal
arg[2] = keytab_location
arg[3] = Number of threads