[
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176314#comment-14176314
]
Eric Zhiqiang Ma commented on HDFS-7180:
Brandon Li: Thanks. That sounds great to me. I will wait for the patch and
try it out.
NFSv3 gateway frequently gets stuck
---
Key: HDFS-7180
URL: https://issues.apache.org/jira/browse/HDFS-7180
Project: Hadoop HDFS
Issue Type: Bug
Components: nfs
Affects Versions: 2.5.0
Environment: Linux, Fedora 19 x86-64
Reporter: Eric Zhiqiang Ma
Assignee: Brandon Li
Priority: Critical
We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway
on one node in the cluster to let users upload data with rsync.
However, we find that the NFSv3 daemon frequently gets stuck while HDFS
itself seems to be working well (hdfs dfs -ls etc. work fine). The most
recent hang occurred after around 1 day of running and several hundred GB of
data uploaded.
The NFSv3 daemon is started on one node and on the same node the NFS is
mounted.
From the node where the NFS is mounted:
dmesg shows messages like this:
[1859245.368108] nfs: server localhost not responding, still trying
[1859245.368111] nfs: server localhost not responding, still trying
[1859245.368115] nfs: server localhost not responding, still trying
[1859245.368119] nfs: server localhost not responding, still trying
[1859245.368123] nfs: server localhost not responding, still trying
[1859245.368127] nfs: server localhost not responding, still trying
[1859245.368131] nfs: server localhost not responding, still trying
[1859245.368135] nfs: server localhost not responding, still trying
[1859245.368138] nfs: server localhost not responding, still trying
[1859245.368142] nfs: server localhost not responding, still trying
[1859245.368146] nfs: server localhost not responding, still trying
[1859245.368150] nfs: server localhost not responding, still trying
[1859245.368153] nfs: server localhost not responding, still trying
Running `ls` on the mounted directory hangs, and `df -hT` gets stuck too.
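A hang like the one described above can be detected without blocking the calling shell. The following is only a minimal sketch, not part of the original report: the function name `mount_responds`, the `probe_cmd` parameter, and the mount point `/mnt/hdfs` are hypothetical. It runs `ls` on the mount in a child process and gives up after a timeout. It assumes a Linux-like system with `ls` on the PATH.

```python
import subprocess


def mount_responds(path, timeout_s=10.0, probe_cmd=None):
    """Return True if listing `path` finishes successfully within timeout_s.

    `probe_cmd` is a hypothetical hook for substituting another probe
    command (an argv list); by default `ls <path>` is used, mirroring the
    manual check in the report.
    """
    cmd = probe_cmd if probe_cmd is not None else ["ls", path]
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        # Exit status 0 means the listing completed normally.
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked in NFS I/O on a hard mount
        # may not exit until the server responds, so we do not wait for it.
        proc.kill()
        return False


if __name__ == "__main__":
    # "/mnt/hdfs" is an assumed mount point; substitute the real one.
    print("responsive" if mount_responds("/mnt/hdfs") else "stuck or missing")
```

The probe uses `Popen` with `wait(timeout=...)` rather than `subprocess.run(..., timeout=...)` because the latter waits for the killed child to exit, which could itself block if the child is stuck in an uninterruptible NFS call.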
The latest lines from the nfs3 log in the hadoop logs directory:
2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated
user map size: 35
2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated
group map size: 54
2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx:
Have to change stable write to unstable write:FILE_SYNC
2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx:
Have to change stable write to unstable write:FILE_SYNC
2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx:
Have to change stable write to unstable write:FILE_SYNC
2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx:
Have to change stable write to unstable write:FILE_SYNC
2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx:
Have to change stable write to unstable write:FILE_SYNC
2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx:
Have to change stable write to unstable write:FILE_SYNC
2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx:
Have to change stable write to unstable write:FILE_SYNC
2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx:
Have to change stable write to unstable write:FILE_SYNC
2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx:
Have to change stable write to unstable write:FILE_SYNC
2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx:
Have to change stable write to unstable write:FILE_SYNC
2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx:
Have to change stable write to unstable write:FILE_SYNC
2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update
cache now
2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not
doing static UID/GID mapping because '/etc/nfs.map' does not exist.
2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated
user map size: 35
2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated
group map size: 54
2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow
ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2
status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets:
[10.0.3.172:50010, 10.0.3.176:50010]
2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient:
DFSOutputStream ResponseProcessor exception for block
BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643
java.io.IOException: Bad response ERROR for block
BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 from