liang yu created HADOOP-19052:
---------------------------------
Summary: Hadoop uses a shell command to get the hard link count,
which takes a lot of time
Key: HADOOP-19052
URL: https://issues.apache.org/jira/browse/HADOOP-19052
Project: Hadoop Common
Issue Type: Improvement
Environment: Hadoop 3.3.4
Spark 2.4.0
Reporter: liang yu
Attachments: image-2024-01-26-16-18-44-969.png,
image-2024-01-26-17-15-32-312.png, image-2024-01-26-17-19-49-805.png
Using Hadoop 3.3.4 and Spark 2.4.0
We use Spark Streaming to append to multiple files in the Hadoop filesystem
each minute, which causes a lot of append executions. We found that the write
speed in Hadoop was very slow. We then traced some datanodes' logs and found
this warning:
{code:java}
Waited above threshold(300 ms) to acquire lock: lock identifier:
FsDatasetRWLock waitTimeMs=518 ms.
Suppressed 13 lock wait warnings. Longest suppressed WaitTimeMs=838.
The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
{code}
!image-2024-01-26-17-15-32-312.png! We then traced the method
_org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.
java:1239)_, printed how long each command took to finish, and found that it
takes about 700 ms to get the link count of the file.
!image-2024-01-26-17-19-49-805.png!
We found that Java has to fork a new process to execute the shell command
{code:java}
stat -c%h /path/to/file
{code}
This takes time because we have to wait for the process to fork and run.
I think we could use a Java native method to get the link count instead.
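As one possible alternative (a sketch, not the actual Hadoop change): on POSIX file systems the JDK's NIO layer already exposes the hard-link count through the {{"unix:nlink"}} file attribute, so no subprocess is forked. The class and helper names below are hypothetical, used only for illustration:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LinkCountDemo {
    // Hypothetical helper: reads the hard-link count via NIO instead of
    // forking "stat -c%h". The "unix" attribute view is only available
    // on POSIX file systems (Linux, macOS), not on Windows.
    static int linkCount(Path p) throws IOException {
        return (int) Files.getAttribute(p, "unix:nlink");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nlink-demo", ".dat");
        Path link = file.resolveSibling(file.getFileName() + ".link");
        Files.createLink(link, file); // create a second hard link
        System.out.println(linkCount(file)); // prints 2 on Linux
        Files.delete(link);
        Files.delete(file);
    }
}
{code}
This stays inside the JVM (a single {{stat(2)}} syscall underneath), so it avoids the per-call process-fork cost that the shell-based approach pays.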
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]