liang yu created HADOOP-19052:
---------------------------------
Summary: Hadoop uses a shell command to get the hard link count,
which takes a lot of time
Key: HADOOP-19052
URL: https://issues.apache.org/jira/browse/HADOOP-19052
Project: Hadoop Common
Issue Type: Improvement
Environment: Hadoop 3.3.4
Spark 2.4.0
Reporter: liang yu
Attachments: image-2024-01-26-16-18-44-969.png,
image-2024-01-26-17-15-32-312.png, image-2024-01-26-17-19-49-805.png
Using Hadoop 3.3.4 and Spark 2.4.0
We use Spark Streaming to append to multiple files in the Hadoop filesystem
each minute, which causes a lot of append executions. We found that the write
speed in Hadoop was very slow. We then traced some datanodes' logs and found
this warning:
{code:java}
Waited above threshold(300 ms) to acquire lock: lock identifier:
FsDatasetRWLock waitTimeMs=518 ms.
Suppressed 13 lock wait warnings. Longest suppressed WaitTimeMs=838.
The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
{code}
!image-2024-01-26-17-15-32-312.png! We then traced the method
_org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.
java:1239)_, printed how long each command took to finish, and found that it
takes about 700 ms to get the link count of the file.
!image-2024-01-26-17-19-49-805.png!
We found that Java has to fork a new process to execute the shell command
{code:java}
stat -c%h /path/to/file
{code}
This takes time because we have to wait for the process to fork and run.
I think we could use a Java native method to get the link count instead.
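As one possible alternative (a sketch, not the actual Hadoop change): on POSIX file systems the JDK's NIO layer already exposes the hard-link count through the {{"unix:nlink"}} file attribute, so no subprocess is forked. The class and helper names below are hypothetical, used only for illustration:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LinkCountDemo {
    // Hypothetical helper: reads the hard-link count via NIO instead of
    // forking "stat -c%h". The "unix" attribute view is only available
    // on POSIX file systems (Linux, macOS), not on Windows.
    static int linkCount(Path p) throws IOException {
        return (int) Files.getAttribute(p, "unix:nlink");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nlink-demo", ".dat");
        Path link = file.resolveSibling(file.getFileName() + ".link");
        Files.createLink(link, file); // create a second hard link
        System.out.println(linkCount(file)); // prints 2 on Linux
        Files.delete(link);
        Files.delete(file);
    }
}
{code}
This stays inside the JVM (a single {{stat(2)}} syscall underneath), so it avoids the per-call process-fork cost that the shell-based approach pays.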
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]