liang yu created HADOOP-19052:
---------------------------------

             Summary: Hadoop uses a shell command to get the hard-link count 
of a file, which takes a lot of time
                 Key: HADOOP-19052
                 URL: https://issues.apache.org/jira/browse/HADOOP-19052
             Project: Hadoop Common
          Issue Type: Improvement
         Environment: Hadoop 3.3.4

Spark 2.4.0
            Reporter: liang yu
         Attachments: image-2024-01-26-16-18-44-969.png, 
image-2024-01-26-17-15-32-312.png, image-2024-01-26-17-19-49-805.png

Using Hadoop 3.3.4 and Spark 2.4.0

We use Spark Streaming to append to multiple files in the Hadoop filesystem 
each minute, which causes a lot of append executions. We found that the write 
speed in Hadoop is very slow. We then traced some DataNodes' logs and found 
the following warning:
{code:java}
Waited above threshold(300 ms) to acquire lock: lock identifier: 
FsDatasetRWLock waitTimeMs=518 ms.
Suppressed 13 lock wait warnings. Longest suppressed WaitTimeMs=838.
The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
{code}
!image-2024-01-26-17-15-32-312.png! We then traced the method 
_org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.
 java:1239)_, printed how long each command takes to finish, and 
found that it takes about 700 ms just to get the link count of the file.

!image-2024-01-26-17-19-49-805.png!

We found that Java has to fork a new process to execute the shell command
{code:java}
stat -c%h /path/to/file
{code}
 This takes time because we have to wait for the child process to fork and 
complete before we can read its output.
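For illustration, the shell-based approach amounts to something like the sketch below (not Hadoop's exact code; the class name and structure are hypothetical). The per-call process fork and exec is what dominates the latency measured above:

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class StatLinkCount {
    // Illustrative sketch only: every call forks a child process to run
    // "stat -c%h <path>", then parses the link count from its stdout.
    // The fork/exec round trip is the expensive part.
    public static int getLinkCount(String path)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder("stat", "-c%h", path).start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            p.waitFor();
            return Integer.parseInt(line.trim());
        }
    }
}
{code}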

 

I think we can use a Java native method to get this, instead of forking a 
shell command.
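As a sketch of what that could look like: the JDK's POSIX file attribute view exposes the hard-link count directly as the "unix:nlink" attribute, so no child process is needed (the class name below is hypothetical, and "unix:nlink" is only available on POSIX platforms):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NativeLinkCount {
    // Reads the hard-link count in-process via NIO instead of forking
    // "stat". "unix:nlink" is supported by the unix/posix attribute view.
    public static int getLinkCount(Path file) throws IOException {
        return (Integer) Files.getAttribute(file, "unix:nlink");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(getLinkCount(Paths.get(args[0])));
    }
}
{code}

This stays inside the JVM (a single stat(2) syscall underneath), which should avoid the per-call process-fork cost entirely.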



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
