[
https://issues.apache.org/jira/browse/HDFS-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295998#comment-14295998
]
Konstantin Shvachko commented on HDFS-7695:
---
This was partly investigated under HDFS-7611. The symptoms looked similar to
the bug described there.
Different test cases fail on different runs, always with the same
exception:
{code}
java.io.IOException: Timed out waiting for Mini HDFS Cluster to start
    at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1200)
    at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1825)
    at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1786)
    at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testParentDirWithUCFileDeleteWithSnapShot(TestOpenFilesWithSnapshot.java:89)
{code}
The test:
- creates a file and starts adding data
- then aborts the stream
- creates a snapshot while the file is not closed
- deletes the file without deleting the snapshot, and
- restarts the NameNode
From the logs (with extended logging added), I see that on restart the NN
replays the edits according to the steps above. The blocks are then reported
by DNs, but they remain at 0 replicas, and therefore the NN cannot leave
SafeMode.
The missing blocks are supposed to be present because, even though the file
was deleted, its snapshot was not. I do not understand why replicas are not
added to the locations when they are reported.
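To illustrate why blocks stuck at 0 replicas keep the NN in SafeMode, here is a minimal toy model of the exit condition. It is not the NameNode's actual code: the class and method names are hypothetical, and the check is simplified to a plain ratio. It assumes the usual defaults of {{dfs.namenode.safemode.threshold-pct}} = 0.999 and {{dfs.namenode.replication.min}} = 1.

```java
import java.util.List;

// Toy model only: SafeModeSketch and canLeaveSafeMode are hypothetical names,
// not part of Hadoop.
public class SafeModeSketch {

    // Returns true when the fraction of blocks with at least minReplicas
    // reported replicas reaches thresholdPct (simplified ratio check).
    static boolean canLeaveSafeMode(List<Integer> reportedReplicas,
                                    int minReplicas,
                                    double thresholdPct) {
        if (reportedReplicas.isEmpty()) {
            return true; // nothing to wait for
        }
        long safeBlocks = reportedReplicas.stream()
                .filter(replicas -> replicas >= minReplicas)
                .count();
        return (double) safeBlocks / reportedReplicas.size() >= thresholdPct;
    }

    public static void main(String[] args) {
        // Every block has one reported replica.
        System.out.println(canLeaveSafeMode(List.of(1, 1, 1), 1, 0.999)); // true
        // One block stays at 0 replicas, as observed in the logs.
        System.out.println(canLeaveSafeMode(List.of(1, 0, 1), 1, 0.999)); // false
    }
}
```

In a small test cluster with only a handful of blocks, a single block that never gets a location is enough to stay below the threshold, which would match the timeout in {{waitClusterUp}}.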
Intermittent failures in TestOpenFilesWithSnapshot
--
Key: HDFS-7695
URL: https://issues.apache.org/jira/browse/HDFS-7695
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.6.0
Reporter: Konstantin Shvachko
This is to investigate intermittent failures of
{{TestOpenFilesWithSnapshot}}, which is timing out on the NameNode restart as
it is unable to leave SafeMode.