[jira] [Commented] (HDFS-2726) Exception in createBlockOutputStream shouldn't delete exception stack trace
[ https://issues.apache.org/jira/browse/HDFS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177057#comment-13177057 ] Hudson commented on HDFS-2726: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1497 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1497/]) HDFS-2726. Fix a logging issue under DFSClient's createBlockOutputStream method (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225456 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java Exception in createBlockOutputStream shouldn't delete exception stack trace - Key: HDFS-2726 URL: https://issues.apache.org/jira/browse/HDFS-2726 Project: Hadoop HDFS Issue Type: Improvement Reporter: Michael Bieniosek Assignee: Harsh J Fix For: 0.24.0 Attachments: HDFS-2726.patch I'm occasionally (1/5000 times) getting this error after upgrading everything to hadoop-0.18: 08/09/09 03:28:36 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream 08/09/09 03:28:36 INFO dfs.DFSClient: Abandoning block blk_624229997631234952_8205908 DFSClient contains the logging code: LOG.info("Exception in createBlockOutputStream " + ie); This would be better written with ie as the second argument to LOG.info, so that the stack trace could be preserved. As it is, I don't know how to start debugging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
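For illustration, a minimal sketch of the logging change this issue asks for, using the commons-logging API that DFSClient's LOG is built on; the class and message text here are placeholders, not the actual DFSOutputStream code:
{code:java}
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class LoggingSketch {
  private static final Log LOG = LogFactory.getLog(LoggingSketch.class);

  public static void main(String[] args) {
    IOException ie = new IOException("Could not read from stream");
    // Before: concatenation stringifies the exception, losing the stack trace.
    LOG.info("Exception in createBlockOutputStream " + ie);
    // After: passing the throwable as the second argument preserves the trace.
    LOG.info("Exception in createBlockOutputStream", ie);
  }
}
{code}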
[jira] [Commented] (HDFS-2394) Add tests for Namenode active standby states
[ https://issues.apache.org/jira/browse/HDFS-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177063#comment-13177063 ] Suresh Srinivas commented on HDFS-2394: --- You are right. Existing tests cover this. Add tests for Namenode active standby states Key: HDFS-2394 URL: https://issues.apache.org/jira/browse/HDFS-2394 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node, test Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2394) Add tests for Namenode active standby states
[ https://issues.apache.org/jira/browse/HDFS-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas resolved HDFS-2394. --- Resolution: Invalid Add tests for Namenode active standby states Key: HDFS-2394 URL: https://issues.apache.org/jira/browse/HDFS-2394 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node, test Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4) DF should use used + available as the capacity of this volume
[ https://issues.apache.org/jira/browse/HDFS-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177079#comment-13177079 ] Harsh J commented on HDFS-4: Are there any other disadvantages you can think of in going with used+available? Any edge cases where that sum may be incorrect to use? DF should use used + available as the capacity of this volume - Key: HDFS-4 URL: https://issues.apache.org/jira/browse/HDFS-4 Project: Hadoop HDFS Issue Type: Bug Environment: UNIX Reporter: Rong-En Fan Labels: newbie Generally speaking, UNIX tends to keep a certain percentage of disk space reserved for root use only (this can be changed via tune2fs or at mkfs time). Therefore, Hadoop's DF class should not use the 1st number in df output as the capacity of this volume. Instead, it should use used+available as its capacity. Otherwise, the datanode may think this volume is not full when in fact it is. The code in question is src/core/org/apache/hadoop/fs/DF.java, method parseExecResult() -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
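As a sketch of what the proposal would look like, assuming POSIX {{df -k}} column order (filesystem, size, used, available, use%, mount); this is illustrative, not the actual DF.java parser:
{code:java}
public class DfCapacitySketch {
  /** Capacity as used + available, excluding the root-reserved blocks. */
  public static long capacityKb(String dfDataLine) {
    // Typical line: "/dev/sda1 1048576 524288 471859 53% /data"
    String[] f = dfDataLine.trim().split("\\s+");
    long usedKb = Long.parseLong(f[2]);
    long availableKb = Long.parseLong(f[3]);
    // used + available deliberately omits the space reserved for root,
    // so a volume the datanode can no longer write to reports as full.
    return usedKb + availableKb;
  }

  public static void main(String[] args) {
    System.out.println(capacityKb("/dev/sda1 1048576 524288 471859 53% /data"));
  }
}
{code}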
[jira] [Resolved] (HDFS-21) unresponsive namenode because of not finding places to replicate
[ https://issues.apache.org/jira/browse/HDFS-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-21. - Resolution: Won't Fix This is a clear effect of tweaking dfs.replication.min. You want your HDFS to guarantee X replicas before a file is closed, and that's what it will do. Resolving as Won't Fix. unresponsive namenode because of not finding places to replicate Key: HDFS-21 URL: https://issues.apache.org/jira/browse/HDFS-21 Project: Hadoop HDFS Issue Type: Bug Reporter: Christian Kunz We have an 80-node cluster where many nodes started to fail, such that it went down to 59 live nodes. Originally we had our set of applications replicated 60 times. The cluster size went below the required replication number, and the cluster became increasingly less responsive, spewing out the following messages at a high rate: WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-10) DFS logging in NameSystem.pendingTransfer consumes all disk space
[ https://issues.apache.org/jira/browse/HDFS-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-10. - Resolution: Won't Fix These things help ops determine HDFS activity. If you do not wish to see them ever, you may turn up the logging to a WARN or higher level. It's INFO by default. Resolving as Won't Fix, as these messages are useful and yet not so verbose that they should be DEBUG-only. DFS logging in NameSystem.pendingTransfer consumes all disk space - Key: HDFS-10 URL: https://issues.apache.org/jira/browse/HDFS-10 Project: Hadoop HDFS Issue Type: Bug Reporter: Michael Bieniosek Sometimes the namenode goes crazy. I see this in my logs: 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.243:50010 to replicate blk_-9064654741761822118 to datanode(s) x.y.z.247:50010 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.243:50010 to replicate blk_-8996500637974689840 to datanode(s) x.y.yz.225:50010 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.227:50010 to replicate blk_-8870980160272831217 to datanode(s) x.y.z.244:50010 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.227:50010 to replicate blk_-8721101562083234290 to datanode(s) x.y.z.250:50010 2007-04-28 02:40:46,992 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask x.y.z.250:50010 to replicate blk_-9044741671491162229 to datanode(s) x.y.z.244:50010 There are on the order of 10k/sec until the machine runs out of disk space. I notice that in FSNamesystem.java, about 10 lines above where this line is logged, there is a comment: // // Move the block-replication into a pending state. // The reason we use 'pending' is so we can retry // replications that fail after an appropriate amount of time. // (REMIND - mjc - this timer is not yet implemented.) // -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
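For reference, raising that logger's threshold would be a one-line log4j change; the logger name below is taken from the log lines quoted above, and the exact property file layout may differ per deployment:
{code}
# Raise the block state-change logger so its INFO replication messages are dropped.
log4j.logger.org.apache.hadoop.dfs.StateChange=WARN
{code}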
[jira] [Created] (HDFS-2727) hdfs.c uses deprecated property dfs.block.size
hdfs.c uses deprecated property dfs.block.size -- Key: HDFS-2727 URL: https://issues.apache.org/jira/browse/HDFS-2727 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 0.23.0 Reporter: Sho Shimauchi Priority: Minor hdfs.c uses deprecated property dfs.block.size. It should use new property dfs.blocksize instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-9) distcp job failed
[ https://issues.apache.org/jira/browse/HDFS-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-9. Resolution: Incomplete This could've very well been a transient issue. Let's open a new JIRA if this is too frequent. This one has gone stale over the versions. distcp job failed - Key: HDFS-9 URL: https://issues.apache.org/jira/browse/HDFS-9 Project: Hadoop HDFS Issue Type: Bug Reporter: Runping Qi I was running distcp to copy data from one dfs to another. The job failed with the following exception in the mappers: java.net.SocketException: Connection reset at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1633) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1720) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64) at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.copy(CopyFiles.java:305) at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:352) at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:217) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:195) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1750) I examined the data node logs of the target dfs. I saw a lot of exceptions like: 2007-10-12 15:04:09,109 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:1365) at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:897) at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:763) at java.lang.Thread.run(Thread.java:619) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2727) hdfs.c uses deprecated property dfs.block.size
[ https://issues.apache.org/jira/browse/HDFS-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177095#comment-13177095 ] Harsh J commented on HDFS-2727: --- It should not rely on properties for dfs.blocksize and dfs.replication, and instead fetch those from the jFS object itself, via the getDefaultBlockSize and getDefaultReplication API calls. This will help avoid maintenance in the future :) hdfs.c uses deprecated property dfs.block.size -- Key: HDFS-2727 URL: https://issues.apache.org/jira/browse/HDFS-2727 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 0.23.0 Reporter: Sho Shimauchi Priority: Minor hdfs.c uses deprecated property dfs.block.size. It should use new property dfs.blocksize instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
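A hedged Java sketch of what the comment suggests, querying the filesystem handle rather than reading property names; hdfs.c would do the equivalent through its own FS object, and the calls below are the standard FileSystem APIs, not the actual patch:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DefaultsSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // No coupling to dfs.block.size vs. dfs.blocksize property names.
    long blockSize = fs.getDefaultBlockSize();
    short replication = fs.getDefaultReplication();
    System.out.println(blockSize + " / " + replication);
  }
}
{code}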
[jira] [Updated] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sho Shimauchi updated HDFS-1314: Attachment: hdfs-1314.txt attached * revert hdfs.c * add more info to hdfs-default.xml and cluster_setup.xml dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
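For context, the behavior this issue asks for amounts to suffix-aware parsing with a clear failure mode; the helper below is a hypothetical, self-contained illustration, not the attached patch (which reuses Hadoop's own parsing utilities):
{code:java}
public class SizeParseSketch {
  /** Parses "8388608", "8m", or "8mb" into bytes; rejects garbage loudly. */
  public static long parseBytes(String v) {
    String s = v.trim().toLowerCase();
    long mult = 1;
    if (s.endsWith("b")) s = s.substring(0, s.length() - 1); // allow "8mb"
    if (!s.isEmpty()) {
      char c = s.charAt(s.length() - 1);
      switch (c) {
        case 'k': mult = 1L << 10; break;
        case 'm': mult = 1L << 20; break;
        case 'g': mult = 1L << 30; break;
        case 't': mult = 1L << 40; break;
        default: break; // plain number, no suffix
      }
      if (mult != 1) s = s.substring(0, s.length() - 1);
    }
    try {
      return Long.parseLong(s.trim()) * mult;
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid size value: \"" + v + "\"", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(parseBytes("8388608")); // 8388608
    System.out.println(parseBytes("8mb"));     // 8388608
  }
}
{code}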
[jira] [Resolved] (HDFS-67) /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly
[ https://issues.apache.org/jira/browse/HDFS-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-67. - Resolution: Not A Problem Not a problem after Dhruba's HDFS-1707. /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly --- Key: HDFS-67 URL: https://issues.apache.org/jira/browse/HDFS-67 Project: Hadoop HDFS Issue Type: Bug Reporter: Benjamin Francisoud Attachments: patch-DFSClient-HADOOP-2561.diff Directory /tmp/hadoop-${user}/dfs/tmp/tmp is being filled with these kinds of files: client-226966559287638337420857.tmp I tried to look at the code and found: h3. DFSClient.java src/java/org/apache/hadoop/dfs/DFSClient.java {code:java} private void closeBackupStream() throws IOException {...} /* Similar to closeBackupStream(). Theoretically deleting a file * twice could result in deleting a file that we should not. */ private void deleteBackupFile() {...} private File newBackupFile() throws IOException { String name = "tmp" + File.separator + "client-" + Math.abs(r.nextLong()); File result = dirAllocator.createTmpFileForWrite(name, 2 * blockSize, conf); return result; } {code} h3. LocalDirAllocator src/java/org/apache/hadoop/fs/LocalDirAllocator.java#AllocatorPerContext.java {code:java} /** Creates a file on the local FS. Pass size as -1 if not known apriori. We * round-robin over the set of disks (via the configured dirs) and return * a file on the first path which has enough space. The file is guaranteed * to go away when the JVM exits. */ public File createTmpFileForWrite(String pathStr, long size, Configuration conf) throws IOException { // find an appropriate directory Path path = getLocalPathForWrite(pathStr, size, conf); File dir = new File(path.getParent().toUri().getPath()); String prefix = path.getName(); // create a temp file on this directory File result = File.createTempFile(prefix, null, dir); result.deleteOnExit(); return result; } {code} First, it seems to me it's a bit of a mess here: I don't know whether it's DFSClient.java#deleteBackupFile() or LocalDirAllocator#createTmpFileForWrite() (via deleteOnExit()) that is supposed to do the deletion ... or both. Why not keep it DRY and delete it only once? But the most important part is the deleteOnExit(), since it means that if the JVM is never restarted, it will never delete the files :( -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
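To make the deleteOnExit() concern concrete: File.deleteOnExit() only deletes at JVM shutdown, so a long-lived client process accumulates client-*.tmp files indefinitely. A minimal sketch of explicit cleanup instead (names hypothetical, not the attached patch):
{code:java}
import java.io.File;
import java.io.IOException;

public class TmpCleanupSketch {
  public static void main(String[] args) throws IOException {
    File backup = File.createTempFile("client-", ".tmp");
    try {
      // ... write and consume the backup data ...
    } finally {
      // Explicit cleanup runs now, not at JVM exit.
      if (!backup.delete()) {
        System.err.println("Could not delete " + backup);
      }
    }
  }
}
{code}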
[jira] [Commented] (HDFS-67) /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly
[ https://issues.apache.org/jira/browse/HDFS-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177204#comment-13177204 ] Harsh J commented on HDFS-67: - Er, make that HADOOP-1707, sorry. /tmp/hadoop-${user}/dfs/tmp/tmp/client-${long}.tmp is not cleanup correctly --- Key: HDFS-67 URL: https://issues.apache.org/jira/browse/HDFS-67 Project: Hadoop HDFS Issue Type: Bug Reporter: Benjamin Francisoud Attachments: patch-DFSClient-HADOOP-2561.diff Directory /tmp/hadoop-${user}/dfs/tmp/tmp is being filled with these kinds of files: client-226966559287638337420857.tmp I tried to look at the code and found: h3. DFSClient.java src/java/org/apache/hadoop/dfs/DFSClient.java {code:java} private void closeBackupStream() throws IOException {...} /* Similar to closeBackupStream(). Theoretically deleting a file * twice could result in deleting a file that we should not. */ private void deleteBackupFile() {...} private File newBackupFile() throws IOException { String name = "tmp" + File.separator + "client-" + Math.abs(r.nextLong()); File result = dirAllocator.createTmpFileForWrite(name, 2 * blockSize, conf); return result; } {code} h3. LocalDirAllocator src/java/org/apache/hadoop/fs/LocalDirAllocator.java#AllocatorPerContext.java {code:java} /** Creates a file on the local FS. Pass size as -1 if not known apriori. We * round-robin over the set of disks (via the configured dirs) and return * a file on the first path which has enough space. The file is guaranteed * to go away when the JVM exits. */ public File createTmpFileForWrite(String pathStr, long size, Configuration conf) throws IOException { // find an appropriate directory Path path = getLocalPathForWrite(pathStr, size, conf); File dir = new File(path.getParent().toUri().getPath()); String prefix = path.getName(); // create a temp file on this directory File result = File.createTempFile(prefix, null, dir); result.deleteOnExit(); return result; } {code} First, it seems to me it's a bit of a mess here: I don't know whether it's DFSClient.java#deleteBackupFile() or LocalDirAllocator#createTmpFileForWrite() (via deleteOnExit()) that is supposed to do the deletion ... or both. Why not keep it DRY and delete it only once? But the most important part is the deleteOnExit(), since it means that if the JVM is never restarted, it will never delete the files :( -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177315#comment-13177315 ] Hudson commented on HDFS-2729: -- Integrated in Hadoop-Hdfs-trunk-Commit #1552 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1552/]) HDFS-2729. Update BlockManager's comments regarding the invalid block set (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225591 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still carry presence of two sets when there really is just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-5) Check that network topology is updated when new data-nodes are joining the cluster
[ https://issues.apache.org/jira/browse/HDFS-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-5. Resolution: Cannot Reproduce The mapping is done pretty much properly as far as I've noticed. With caching enabled though, one needs to restart the NN to get it to take proper effect. Check that network topology is updated when new data-nodes are joining the cluster -- Key: HDFS-5 URL: https://issues.apache.org/jira/browse/HDFS-5 Project: Hadoop HDFS Issue Type: Bug Reporter: Konstantin Shvachko There is a suspicion that network topology is not updated if new racks are added to the cluster. We should investigate and either confirm or rule this out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-58) DistributedFileSystem.listPaths with some paths causes directory to be cleared
[ https://issues.apache.org/jira/browse/HDFS-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-58. - Resolution: Cannot Reproduce This has gone stale, and looking at the listStatus impls, it looks like it could not happen. Can't reproduce, closing out. DistributedFileSystem.listPaths with some paths causes directory to be cleared -- Key: HDFS-58 URL: https://issues.apache.org/jira/browse/HDFS-58 Project: Hadoop HDFS Issue Type: Bug Environment: Linux Reporter: Bryan Duxbury I am currently writing a Ruby wrapper to the Java DFS client libraries via JNI. While attempting to test the listPaths method of the FileSystem class, I discovered that passing a Path URI like hdfs://tf11:7276/user/rapleaf results in the /user/rapleaf directory being cleared of all contents. A path URI like hdfs://tf11:7276/user/rapleaf/* will list the contents of the directory without damage. I have verified this by creating directories and listing via the bin/hadoop dfs -ls command. Obviously, passing an incorrectly formatted string to a method that should be read-only should not have destructive effects. Also, the actual required path syntax for listings should be recorded in the documentation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-97) DFS should detect slow links(nodes) and avoid them
[ https://issues.apache.org/jira/browse/HDFS-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-97. - Resolution: Not A Problem We do tend to avoid highly loaded DataNodes (via xceiver counts) which may almost do the same operation. Resolving as not a problem. DFS should detect slow links(nodes) and avoid them -- Key: HDFS-97 URL: https://issues.apache.org/jira/browse/HDFS-97 Project: Hadoop HDFS Issue Type: Bug Reporter: Runping Qi The current DFS does not detect slow links (nodes). Thus, when a node or its network link is slow, it may affect the overall system performance significantly. Specifically, when a map job needs to read data from such a node, it may progress 10X slower. And when a DFS data node pipeline consists of such a node, the write performance degrades significantly. This may lead to some long tails for map/reduce jobs. We have experienced such behaviors quite often. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177112#comment-13177112 ] Harsh J commented on HDFS-1314: --- Not sure why the patch failed. Perhaps it's because of the docs change in hadoop-common instead? Could you submit the same patch without just that change? I'll add it back in later when committing (and will upload a cumulative patch when doing that). dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177288#comment-13177288 ] Hudson commented on HDFS-2729: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1501 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1501/]) HDFS-2729. Update BlockManager's comments regarding the invalid block set (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225591 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still carry presence of two sets when there really is just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2726) Exception in createBlockOutputStream shouldn't delete exception stack trace
[ https://issues.apache.org/jira/browse/HDFS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177174#comment-13177174 ] Hudson commented on HDFS-2726: -- Integrated in Hadoop-Mapreduce-trunk #942 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/942/]) HDFS-2726. Fix a logging issue under DFSClient's createBlockOutputStream method (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225456 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java Exception in createBlockOutputStream shouldn't delete exception stack trace - Key: HDFS-2726 URL: https://issues.apache.org/jira/browse/HDFS-2726 Project: Hadoop HDFS Issue Type: Improvement Reporter: Michael Bieniosek Assignee: Harsh J Fix For: 0.24.0 Attachments: HDFS-2726.patch I'm occasionally (1/5000 times) getting this error after upgrading everything to hadoop-0.18: 08/09/09 03:28:36 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream 08/09/09 03:28:36 INFO dfs.DFSClient: Abandoning block blk_624229997631234952_8205908 DFSClient contains the logging code: LOG.info("Exception in createBlockOutputStream " + ie); This would be better written with ie as the second argument to LOG.info, so that the stack trace could be preserved. As it is, I don't know how to start debugging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177317#comment-13177317 ] Hudson commented on HDFS-2729: -- Integrated in Hadoop-Common-trunk-Commit #1480 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1480/]) HDFS-2729. Update BlockManager's comments regarding the invalid block set (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225591 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still carry presence of two sets when there really is just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (HDFS-97) DFS should detect slow links(nodes) and avoid them
[ https://issues.apache.org/jira/browse/HDFS-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HDFS-97: - Oh well, didn't notice the 'read' issue too. We cover writes with that, not reads. Reopening. DFS should detect slow links(nodes) and avoid them -- Key: HDFS-97 URL: https://issues.apache.org/jira/browse/HDFS-97 Project: Hadoop HDFS Issue Type: Bug Reporter: Runping Qi The current DFS does not detect slow links (nodes). Thus, when a node or its network link is slow, it may affect the overall system performance significantly. Specifically, when a map job needs to read data from such a node, it may progress 10X slower. And when a DFS data node pipeline consists of such a node, the write performance degrades significantly. This may lead to some long tails for map/reduce jobs. We have experienced such behaviors quite often. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-36) Handling of deprecated dfs.info.bindAddress and dfs.info.port
[ https://issues.apache.org/jira/browse/HDFS-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-36. - Resolution: Cannot Reproduce Can't reproduce on 1.0+. Setting dfs.http(s).address suffices. Handling of deprecated dfs.info.bindAddress and dfs.info.port - Key: HDFS-36 URL: https://issues.apache.org/jira/browse/HDFS-36 Project: Hadoop HDFS Issue Type: Bug Environment: Windows XP Reporter: Cagdas Gerede Priority: Minor When checkpointing is triggered in Secondary name node, Secondary name node throws exception while it tries to connect to Namenode's http server in the following two cases: 1) In hadoop-site.xml, if you put only dfs.http.address but not dfs.info.bindAddress and dfs.info.port (Connection Refused Exception) 2) In hadoop-site.xml, if you put only dfs.info.bindAddress and dfs.info.port but not dfs.http.address (SecondaryNameNode.getServerAddress line 148 throws exception since newAddrPort is null) Temporary Solution: If you put dfs.http.address, dfs.info.bindAddress, and dfs.info.port, then SecondaryNameNode successfully fetches the image and log from Namenode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
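For a 1.0+ setup, the single property Harsh refers to would look like the following hadoop-site.xml excerpt; the host and port here are placeholders:
{code}
<property>
  <name>dfs.http.address</name>
  <value>namenode.example.com:50070</value>
</property>
{code}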
[jira] [Updated] (HDFS-2580) NameNode#main(...) can make use of GenericOptionsParser.
[ https://issues.apache.org/jira/browse/HDFS-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2580: -- Status: Open (was: Patch Available) NameNode#main(...) can make use of GenericOptionsParser. Key: HDFS-2580 URL: https://issues.apache.org/jira/browse/HDFS-2580 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2580.patch DataNode supports passing generic opts when calling via {{hdfs datanode}}. NameNode can support the same thing as well, but doesn't right now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-61) Datanode shutdown is called multiple times
[ https://issues.apache.org/jira/browse/HDFS-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-61. - Resolution: Cannot Reproduce On trunk, looks like we only call it once now. This has gone stale, closing out. Datanode shutdown is called multiple times --- Key: HDFS-61 URL: https://issues.apache.org/jira/browse/HDFS-61 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Suresh Srinivas - When DataNode gets {{IncorrectVersionException}} in {{DataNode.offerService()}} {{DataNode.shutdown()}} is called - In {{DataNode.processCommand()}} when DataNode gets DNA_SHUTDOWN, {{DataNode.shutdown()}} is called {{DataNode.shutdown()}} is again called in {{DataNode.run()}} method -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1910) when dfs.name.dir and dfs.name.edits.dir are same fsimage will be saved twice every time
[ https://issues.apache.org/jira/browse/HDFS-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177126#comment-13177126 ] Hudson commented on HDFS-1910: -- Integrated in Hadoop-Hdfs-22-branch #124 (See [https://builds.apache.org/job/Hadoop-Hdfs-22-branch/124/]) Remove erroneously added file while committing HDFS-1910. HDFS-1910. NameNode should not save fsimage twice. Contributed by Konstantin Shvachko. Revert. Refers to wrong jira HDFS-1910. HDFS-1910. NameNode should not save fsimage twice. Contributed by Konstantin Shvachko. shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225342 Files : * /hadoop/common/branches/branch-0.22/hdfs/bin/hadoop shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225337 Files : * /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.22/hdfs/bin/hadoop * /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225336 Files : * /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225333 Files : * /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java when dfs.name.dir and dfs.name.edits.dir are same fsimage will be saved twice every time Key: HDFS-1910 URL: https://issues.apache.org/jira/browse/HDFS-1910 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.21.0 Reporter: Gokul Priority: Minor Labels: critical-0.22.0 Fix For: 0.22.1 Attachments: saveImageOnce-v0.22.patch When the image and edits dirs are configured to be the same, the fsimage is flushed from memory to disk twice whenever saveNamespace is done. This may impact the performance of the backupnode/snn, which does a saveNamespace at every checkpoint. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2728: -- Status: Patch Available (was: Open) Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology, but we do not really have it in this branch. Possible docs mixup from somewhere in the security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2729: -- Resolution: Fixed Fix Version/s: 0.24.0 Status: Resolved (was: Patch Available) Committed revision 1225591. Thanks Eli! Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still carry presence of two sets when there really is just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-57) A Datanode's datadir could have lots of blocks in the top-level directory
[ https://issues.apache.org/jira/browse/HDFS-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-57. - Resolution: Not A Problem Not a problem in the current {{FSDataset}} operations (neither on branch-1). A Datanode's datadir could have lots of blocks in the top-level directory - Key: HDFS-57 URL: https://issues.apache.org/jira/browse/HDFS-57 Project: Hadoop HDFS Issue Type: Bug Reporter: dhruba borthakur When a datanode restarts, it moves all the blocks from the datadir's tmp directory into the top-level of the datadir. It does not move these blocks into subdirectories of the datadir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-82) recentInvalidateSets in FSNamesystem is not required
[ https://issues.apache.org/jira/browse/HDFS-82?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-82. - Resolution: Not A Problem This has been resolved on trunk. We only have one set. recentInvalidateSets in FSNamesystem is not required - Key: HDFS-82 URL: https://issues.apache.org/jira/browse/HDFS-82 Project: Hadoop HDFS Issue Type: Bug Reporter: Raghu Angadi See HADOOP-2576 for more background. When a file is deleted, blocks are first placed in recentInvalidateSets and then later computeDatanodeWork moves it to 'invalidateSet' for each datanode. I could not see why a block is placed in this intermediate set. I think it is confusing as well.. for example, -metasave prints blocks from only one list. Unless we read very carefully its not easy to figure out that there are two lists. My proposal is to keep only one of them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2728: -- Status: Open (was: Patch Available) Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology, but we do not really have it in this branch. Possible docs mixup from somewhere in the security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2654) Make BlockReaderLocal not extend RemoteBlockReader2
[ https://issues.apache.org/jira/browse/HDFS-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HDFS-2654: - Description: The BlockReaderLocal code paths are easier to understand (especially true on branch-1 where BlockReaderLocal inherits code from BlockerReader and FSInputChecker) if the local and remote block reader implementations are independent, and they're not really sharing much code anyway. If for some reason they start to share significant code we can make the BlockReader interface an abstract class. (was: The BlockReaderLocal code paths are easier to understand (especially true on branch-1 where BlockReaderLocal inherits code from BlockerReader and FSInputChecker) if the local and remote block reader implementations are independent, and they're not really sharing much code anyway. If for some reason they start to share sifnificant code we can make the BlockReader interface an abstract class.) Target Version/s: 0.23.1, 1.1.0 (was: 1.1.0, 0.23.1) Make BlockReaderLocal not extend RemoteBlockReader2 --- Key: HDFS-2654 URL: https://issues.apache.org/jira/browse/HDFS-2654 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.23.1, 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-2654-1.patch, hdfs-2654-2.patch, hdfs-2654-2.patch, hdfs-2654-2.patch, hdfs-2654-3.patch, hdfs-2654-b1-1.patch, hdfs-2654-b1-2.patch, hdfs-2654-b1-3.patch, hdfs-2654-b1-4-fix.patch, hdfs-2654-b1-4.patch The BlockReaderLocal code paths are easier to understand (especially true on branch-1 where BlockReaderLocal inherits code from BlockerReader and FSInputChecker) if the local and remote block reader implementations are independent, and they're not really sharing much code anyway. If for some reason they start to share significant code we can make the BlockReader interface an abstract class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2698) BackupNode is downloading image from NameNode for every checkpoint
[ https://issues.apache.org/jira/browse/HDFS-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177125#comment-13177125 ] Hudson commented on HDFS-2698: -- Integrated in Hadoop-Hdfs-22-branch #124 (See [https://builds.apache.org/job/Hadoop-Hdfs-22-branch/124/]) HDFS-2698. BackupNode is downloading image from NameNode for every checkpoint. Contributed by Konstantin Shvachko. shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225340 Files : * /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java BackupNode is downloading image from NameNode for every checkpoint -- Key: HDFS-2698 URL: https://issues.apache.org/jira/browse/HDFS-2698 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 0.22.1 Attachments: rollFSImage.patch, rollFSImage.patch BackupNode can make periodic checkpoints without downloading image and edits files from the NameNode, by just saving the namespace to its local disks. This is not happening because the NN renews its checkpoint time after every checkpoint, thus making its image ahead of the BN's even though they are in sync. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Looks like after HDFS-82 was covered at some point, the comments and logs still carry presence of two sets when there really is just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-19) Unhandled exceptions in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-19. - Resolution: Invalid This has gone stale. I do not find these methods in the current DFSOutputStream. Do open a new one if there is still trouble with the newer impl. Unhandled exceptions in DFSClient - Key: HDFS-19 URL: https://issues.apache.org/jira/browse/HDFS-19 Project: Hadoop HDFS Issue Type: Bug Reporter: Konstantin Shvachko DFSOutputStream.handleSocketException() does not handle exceptions thrown inside it by abandonBlock(). I'd propose to retry abandonBlock() in case of timeout. In case of DFSOutputStream.close() the exception in handleSocketException() will result in calling abandonFileInProgress(). In a similar case of DFSOutputStream.flush() the file will not be abandoned. Exceptions thrown by abandonFileInProgress() are not handled either. Feels like we need a general mechanism for handling all these things. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2726) Exception in createBlockOutputStream shouldn't delete exception stack trace
[ https://issues.apache.org/jira/browse/HDFS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177141#comment-13177141 ] Hudson commented on HDFS-2726: -- Integrated in Hadoop-Hdfs-trunk #909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/909/]) HDFS-2726. Fix a logging issue under DFSClient's createBlockOutputStream method (harsh) harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1225456 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java Exception in createBlockOutputStream shouldn't delete exception stack trace - Key: HDFS-2726 URL: https://issues.apache.org/jira/browse/HDFS-2726 Project: Hadoop HDFS Issue Type: Improvement Reporter: Michael Bieniosek Assignee: Harsh J Fix For: 0.24.0 Attachments: HDFS-2726.patch I'm occasionally (1/5000 times) getting this error after upgrading everything to hadoop-0.18: 08/09/09 03:28:36 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream 08/09/09 03:28:36 INFO dfs.DFSClient: Abandoning block blk_624229997631234952_8205908 DFSClient contains the logging code: LOG.info("Exception in createBlockOutputStream " + ie); This would be better written with ie as the second argument to LOG.info, so that the stack trace could be preserved. As it is, I don't know how to start debugging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-22) Help information of refreshNodes does not show how to decomission nodes
[ https://issues.apache.org/jira/browse/HDFS-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-22. - Resolution: Not A Problem The current docs: {code} Updates the set of hosts allowed to connect to namenode. Re-reads the config file to update values defined by dfs.hosts and dfs.hosts.exclude and reads the entries (hostnames) in those files. Each entry not defined in dfs.hosts but in dfs.hosts.exclude is decommissioned. Each entry defined in dfs.hosts and also in dfs.hosts.exclude is stopped from decommissioning if it has already been marked for decommission. Entries not present in both the lists are decommissioned. {code} Covers it pretty much I think? Please reopen if not. Help information of refreshNodes does not show how to decomission nodes --- Key: HDFS-22 URL: https://issues.apache.org/jira/browse/HDFS-22 Project: Hadoop HDFS Issue Type: Bug Environment: hadoop 0.19.1, jdk 1.6, CentOS 5.2 Reporter: Wang Xu Assignee: Wang Xu Attachments: refreshNodes.patch The help information does not indicate how to decommission nodes. It only describes two scenarios: * stop nodes if not in dfs.hosts * stop decommissioning if a node is decommissioning and in both dfs.hosts and dfs.hosts.exclude but omits this one: * start decommissioning if a node is in service and in both dfs.hosts and dfs.hosts.exclude It would be better described as: Each entry defined in dfs.hosts and also in dfs.hosts.exclude starts decommissioning (with block replication) if it is in service, or is stopped from decommissioning if it has already been marked for decommission. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
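As a concrete illustration of the documented flow: point the two properties at host-list files, add the node to the exclude file, and run {{hadoop dfsadmin -refreshNodes}} to start decommissioning. The file paths below are placeholders:
{code}
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/dfs.include</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/dfs.exclude</value>
</property>
{code}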
[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2728: -- Attachment: HDFS-2728.patch Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology, but we do not really have it in this branch. Possible docs mixup from somewhere in the security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-102) high cpu usage in ReplicationMonitor thread
[ https://issues.apache.org/jira/browse/HDFS-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-102. -- Resolution: Cannot Reproduce This has gone stale. The current structure within BlockManager isn't a list anymore, and we haven't seen this kinda behavior in quite a while. high cpu usage in ReplicationMonitor thread Key: HDFS-102 URL: https://issues.apache.org/jira/browse/HDFS-102 Project: Hadoop HDFS Issue Type: Bug Reporter: Koji Noguchi We had a namenode stuck in CPU 99% and it was showing a slow response time. (dfs.namenode.handler.count was still set to 10.) ReplicationMonitor thread was using the most CPU time. Jstack showed, org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor@1c7b0f4d daemon prio=10 tid=0x002d90690800 nid=0x4855 runnable [0x41941000..0x41941b30] java.lang.Thread.State: RUNNABLE at java.util.AbstractList$Itr.remove(AbstractList.java:360) at org.apache.hadoop.dfs.FSNamesystem.blocksToInvalidate(FSNamesystem.java:2475) - locked 0x002a9f522038 (a org.apache.hadoop.dfs.FSNamesystem) at org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1775) at org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1713) at java.lang.Thread.run(Thread.java:619) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-20) fsck path -delete doesn't report failures
[ https://issues.apache.org/jira/browse/HDFS-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-20. - Resolution: Not A Problem Currently in NamenodeFsck, if any operation under check() throws an exception, I can verify it is definitely logged. Not a problem anymore. fsck path -delete doesn't report failures --- Key: HDFS-20 URL: https://issues.apache.org/jira/browse/HDFS-20 Project: Hadoop HDFS Issue Type: Bug Reporter: Owen O'Malley Assignee: Sameer Paranjpye When I have safemode on and do fsck / -delete, it legitimately fails on the first delete. However, the fsck stops and does not report the failure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-146) Regression: TestInjectionForSimulatedStorage fails with IllegalMonitorStateException
[ https://issues.apache.org/jira/browse/HDFS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-146. -- Resolution: Cannot Reproduce Hasn't had a failure report in two years now. Gone stale, closing out this and related issues. Regression: TestInjectionForSimulatedStorage fails with IllegalMonitorStateException Key: HDFS-146 URL: https://issues.apache.org/jira/browse/HDFS-146 Project: Hadoop HDFS Issue Type: Bug Reporter: gary murry org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage.testInjection fails with IllegalMonitorStateException Stacktrace java.lang.IllegalMonitorStateException at java.lang.Object.notifyAll(Native Method) at org.apache.hadoop.ipc.Server.stop(Server.java:1110) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:574) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:569) at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:553) at org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage.testInjection(TestInjectionForSimulatedStorage.java:195) No errors show up in the standard output, but there are a few warnings. http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/749/testReport/org.apache.hadoop.hdfs/TestInjectionForSimulatedStorage/testInjection/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-92) if hadoop.tmp.dir is under your dfs.data.dir, HDFS will silently wipe out your name directory
[ https://issues.apache.org/jira/browse/HDFS-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-92. - Resolution: Not A Problem This goes against all recommendations for configuring the directories. I don't see why one would configure it this way when it leads to such an obvious issue. The same goes for merging mapred.local.dir and dfs.datanode.data.dir. Resolving as not a problem. if hadoop.tmp.dir is under your dfs.data.dir, HDFS will silently wipe out your name directory --- Key: HDFS-92 URL: https://issues.apache.org/jira/browse/HDFS-92 Project: Hadoop HDFS Issue Type: Bug Environment: gentoo linux on Intel/Dell w/ Sun JDK Reporter: Brian Karlak I used a hadoop-site.xml conf file like: <property> <name>dfs.data.dir</name> <value>/data01/hadoop</value> <description>Dirs to store data on.</description> </property> <property> <name>hadoop.tmp.dir</name> <value>/data01/hadoop/tmp</value> <description>A base for other temporary directories.</description> </property> This file will format the namenode properly. Upon startup with the bin/start-dfs.sh script, however, the /data01/hadoop/tmp/dfs/name directory is silently wiped out. This foobars the namenode, but only after the next DFS stop/start cycle. (see output below) This is obviously a configuration error first and foremost, but the fact that hadoop silently corrupts itself makes it tricky to track down. [hid191]$ bin/hadoop namenode -format 08/04/04 18:41:43 INFO dfs.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = hid191.dev01.corp.metaweb.com/127.0.0.1 STARTUP_MSG: args = [-format] STARTUP_MSG: version = 0.16.2 STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 642481; compiled by 'hadoopqa' on Sat Mar 29 01:59:04 UTC 2008 / 08/04/04 18:41:43 INFO fs.FSNamesystem: fsOwner=zenkat,users 08/04/04 18:41:43 INFO fs.FSNamesystem: supergroup=supergroup 08/04/04 18:41:43 INFO fs.FSNamesystem: isPermissionEnabled=true 08/04/04 18:41:43 INFO dfs.Storage: Storage directory /data01/hadoop/tmp/dfs/name has been successfully formatted. 08/04/04 18:41:43 INFO dfs.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at hid191.dev01.corp.metaweb.com/127.0.0.1 / [hid191]$ ls /data01/hadoop/tmp/dfs/name current image [hid191]$ bin/start-dfs.sh starting namenode, logging to /data01/hadoop/logs/hadoop-zenkat-namenode-hid191.out localhost: starting datanode, logging to /data01/hadoop/logs/hadoop-zenkat-datanode-hid191.out localhost: starting secondarynamenode, logging to /data01/hadoop/logs/hadoop-zenkat-secondarynamenode-hid191.out [hid191]$ ls /data01/hadoop/tmp/dfs/name ls: cannot access /data01/hadoop/tmp/dfs/name: No such file or directory -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
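The underlying trap is that the name directory defaults to ${hadoop.tmp.dir}/dfs/name (which is exactly where the format output above lands), so nesting hadoop.tmp.dir inside a data directory puts the image under a tree other daemons treat as theirs to reorganize. A layout sketch that keeps the three trees disjoint, with dfs.name.dir set explicitly rather than left to the default (paths are illustrative):
{code}
<property>
  <name>dfs.name.dir</name>
  <value>/data01/hadoop/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/data01/hadoop/data</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data01/hadoop/tmp</value>
</property>
{code}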
[jira] [Commented] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177164#comment-13177164 ] Hadoop QA commented on HDFS-2728: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508838/HDFS-2728.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1745//console This message is automatically generated. Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology but we do not really have it in this branch. Possible docs mixup from somewhere in security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-64) delete on dfs hung
[ https://issues.apache.org/jira/browse/HDFS-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-64. - Resolution: Not A Problem This has gone stale, and given that we haven't seen this recently at all, looks like it may have been fixed inadvertently. delete on dfs hung -- Key: HDFS-64 URL: https://issues.apache.org/jira/browse/HDFS-64 Project: Hadoop HDFS Issue Type: Bug Reporter: Devaraj Das I had a case where the JobTracker was trying to delete some files, as part of Garbage Collect for a job, in a dfs directory. The thread hung and this is the trace: Thread 19 (IPC Server handler 5 on 57344): State: WAITING Blocked count: 137022 Waited count: 336004 Waiting on org.apache.hadoop.ipc.Client$Call@eb6238 Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.ipc.Client.call(Client.java:683) org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source) sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke(Method.java:597) org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source) org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:515) org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:170) org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:118) org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:114) org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1635) org.apache.hadoop.mapred.JobInProgress.isJobComplete(JobInProgress.java:1387) org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:1348) org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:565) org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:2032) and it hung for an enormously long amount of time ~1 hour. Not sure whether these will help: I saw this message in the NameNode log around the time the delete was issued by the JobTracker 2008-05-07 09:55:57,375 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /mapredsystem/ddas/mapredsystem/10091.{running.machine.com}/job_200805070458_0004 because it does not exist I also checked that the directory in question was actually there (and the job couldn't have run without this directory being there). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-12) hadoop dfs -put does not return nonzero status on failure
[ https://issues.apache.org/jira/browse/HDFS-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-12. - Resolution: Not A Problem Fix Version/s: 0.23.0 This has been fixed by the FsCommand revamp on 0.23+. hadoop dfs -put does not return nonzero status on failure --- Key: HDFS-12 URL: https://issues.apache.org/jira/browse/HDFS-12 Project: Hadoop HDFS Issue Type: Bug Reporter: Karl Anderson Fix For: 0.23.0 I'm attempting to put a file on DFS with the hadoop dfs -put command. The put is failing, probably because my cluster is still being initialized, but the command is still returning a status of 0. If there was a meaningful error status, I'd be able to handle the situation (in my case, waiting and putting again works). The output is telling me there is a NotReplicatedYetException; it's a new cluster and the nodes are still being initialized. Here's the beginning of the output; it tries a few times, but eventually gives up. executing: source ~/.bash_profile; hadoop dfs -put ./vectorfile input/vectorfile 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 is a deprecated filesystem name. Use hdfs://ip-10-251-195-162.ec2.internal:50001/ instead. 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 is a deprecated filesystem name. Use hdfs://ip-10-251-195-162.ec2.internal:50001/ instead. 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 is a deprecated filesystem name. Use hdfs://ip-10-251-195-162.ec2.internal:50001/ instead. 08/08/21 13:06:00 WARN fs.FileSystem: ip-10-251-195-162.ec2.internal:50001 is a deprecated filesystem name. Use hdfs://ip-10-251-195-162.ec2.internal:50001/ instead. 08/08/21 13:06:01 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/input/vectorfile could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1117) at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888) at org.apache.hadoop.ipc.Client.call(Client.java:715) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2440) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2323) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912) 08/08/21 13:06:01 WARN dfs.DFSClient: 
NotReplicatedYetException sleeping /user/root/input/vectorfile retries left 4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
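With the FsCommand revamp the shell now reflects failure in its exit code, so the wait-and-retry handling the reporter wanted can live in the calling script. A minimal sketch (file names and the retry policy are made up):
{code}
#!/bin/sh
# Retry the upload a few times, relying on the nonzero exit status.
for attempt in 1 2 3 4 5; do
  if hadoop fs -put ./vectorfile input/vectorfile; then
    echo "put succeeded on attempt $attempt"
    exit 0
  fi
  echo "put failed (attempt $attempt), retrying in 30s" >&2
  sleep 30
done
exit 1
{code}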
[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177250#comment-13177250 ] Eli Collins commented on HDFS-2729: --- +1. The findbugs and test failures are unrelated. Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still refer to two sets when there is really just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-2728. --- Resolution: Fixed Fix Version/s: 1.1.0 Committed revision 1225589. Thanks Eli! Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 1.1.0 Attachments: HDFS-2728.patch It is documented that we have -printTopology but we do not really have it in this branch. Possible docs mixup from somewhere in security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-59) No recovery when trying to replicate on marginal datanode
[ https://issues.apache.org/jira/browse/HDFS-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-59. - Resolution: Not A Problem This has gone stale. We haven't seen this lately. Let's file a new one if we see this again (these days it errors out with 'Could only replicate to X nodes' kind of errors). Also, could've been your dfs.replication.min > 1. No recovery when trying to replicate on marginal datanode - Key: HDFS-59 URL: https://issues.apache.org/jira/browse/HDFS-59 Project: Hadoop HDFS Issue Type: Bug Environment: Sep 14 nightly build with a couple of mapred-related patches Reporter: Christian Kunz We have been uploading a lot of data to hdfs, running about 400 scripts in parallel calling hadoop's command line utility in distributed fashion. Many of them started to hang when copying large files (120GB), repeating the following messages without end: 07/10/05 15:44:25 INFO fs.DFSClient: Could not complete file, retrying... 07/10/05 15:44:26 INFO fs.DFSClient: Could not complete file, retrying... 07/10/05 15:44:26 INFO fs.DFSClient: Could not complete file, retrying... 07/10/05 15:44:27 INFO fs.DFSClient: Could not complete file, retrying... 07/10/05 15:44:27 INFO fs.DFSClient: Could not complete file, retrying... 07/10/05 15:44:28 INFO fs.DFSClient: Could not complete file, retrying... In the namenode log I eventually found repeated messages like: 2007-10-05 14:40:08,063 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_3124504920241431462 2007-10-05 14:40:11,876 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask IP4:50010 to replicate blk_3124504920241431462 to datanode(s) IP4_1:50010 2007-10-05 14:45:08,069 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_8533614499490422104 2007-10-05 14:45:08,070 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_7741954594593177224 2007-10-05 14:45:13,973 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask IP4:50010 to replicate blk_7741954594593177224 to datanode(s) IP4_2:50010 2007-10-05 14:45:13,973 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask IP4:50010 to replicate blk_8533614499490422104 to datanode(s) IP4_3:50010 I could not ssh to the node with IP address IP4, but seemingly the datanode server still sent heartbeats. After rebooting the node it was okay again and a few files and a few clients recovered, but not all. I restarted these clients and they completed this time (before noticing the marginal node we restarted the clients twice without success). I would conclude that the existence of the marginal node must have caused loss of blocks, at least in the tracking mechanism, in addition to eternal retries. In summary, dfs should be able to handle datanodes with good heartbeat but otherwise failing to do their job. This should include datanodes that have a high rate of socket connection timeouts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-104) TestInjectionForSimulatedStorage fails once in a while
[ https://issues.apache.org/jira/browse/HDFS-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-104. -- Resolution: Cannot Reproduce Hasn't had a failure report in two years now. Gone stale, closing out this and related issues. TestInjectionForSimulatedStorage fails once in a while -- Key: HDFS-104 URL: https://issues.apache.org/jira/browse/HDFS-104 Project: Hadoop HDFS Issue Type: Bug Reporter: Lohit Vijayarenu TestInjectionForSimulatedStorage fails once in a while. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-1314: -- Target Version/s: 0.24.0 Status: Patch Available (was: Open) +1. Will commit once Hudson reports its build. dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2728: -- Target Version/s: 1.1.0 (was: 0.24.0) Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology but we do not really have it in this branch. Possible docs mixup from somewhere in security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-105) Streaming task stuck in MapTask$DirectMapOutputCollector.close
[ https://issues.apache.org/jira/browse/HDFS-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-105. -- Resolution: Cannot Reproduce Hasn't had a similar failure report in two years now. Gone stale, so closing out as can't reproduce. Let's open a new one should we face this again (looks transient?) Streaming task stuck in MapTask$DirectMapOutputCollector.close -- Key: HDFS-105 URL: https://issues.apache.org/jira/browse/HDFS-105 Project: Hadoop HDFS Issue Type: Bug Reporter: Amareshwari Sriramadasu Attachments: thread_dump.txt Observed a streaming task stuck in MapTask$DirectMapOutputCollector.close -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-55) Change all references of dfs to hdfs in configs
[ https://issues.apache.org/jira/browse/HDFS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-55. - Resolution: Won't Fix It's all dfs.* and it's in Hadoop, and the settings go to hdfs-site.xml. I think that is sufficient? Don't think it's worth the change. Feel free to reopen if you feel strongly otherwise. Change all references of dfs to hdfs in configs --- Key: HDFS-55 URL: https://issues.apache.org/jira/browse/HDFS-55 Project: Hadoop HDFS Issue Type: Bug Reporter: Lohit Vijayarenu After code restructuring dfs has been changed to hdfs, but I see config variables with dfs.something, e.g. dfs.http.address. Should we change everything to hdfs? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2729: -- Attachment: HDFS-2729.patch Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still refer to two sets when there is really just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-103) handle return value of globStatus() to be uniform.
[ https://issues.apache.org/jira/browse/HDFS-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-103. -- Resolution: Not A Problem Looking at the current impl. of globStatus, we always return an empty FileStatus[], never null. Not a problem anymore. handle return value of globStatus() to be uniform. -- Key: HDFS-103 URL: https://issues.apache.org/jira/browse/HDFS-103 Project: Hadoop HDFS Issue Type: Bug Reporter: Lohit Vijayarenu Some places in the code do not expect a null value from globStatus(Path path); they expect a path. These have to be fixed to handle null uniformly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
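For callers, the practical upshot is that a pattern with no matches yields an empty array, so a plain loop is safe; a defensive null check only matters when the same code must also run against older releases. A caller sketch using the public FileSystem API (the glob path is illustrative):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus[] matches = fs.globStatus(new Path("/user/data/part-*"));
    // Per the comment above, no matches means an empty array; the null
    // guard is kept only for portability to older releases.
    if (matches == null || matches.length == 0) {
      System.err.println("no matches");
      return;
    }
    for (FileStatus st : matches) {
      System.out.println(st.getPath());
    }
  }
}
{code}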
[jira] [Resolved] (HDFS-44) Unit test failed: TestInjectionForSimulatedStorage
[ https://issues.apache.org/jira/browse/HDFS-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-44. - Resolution: Cannot Reproduce Hasn't had a failure report in two years now. Gone stale, closing out this and related issues. Unit test failed: TestInjectionForSimulatedStorage -- Key: HDFS-44 URL: https://issues.apache.org/jira/browse/HDFS-44 Project: Hadoop HDFS Issue Type: Bug Reporter: Mukund Madhugiri Unit test failed: TestInjectionForSimulatedStorage failed in the nightly build with a timeout: tail from the console: [junit] 2007-12-12 12:02:18,674 INFO dfs.TestInjectionForSimulatedStorage (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5. [junit] 2007-12-12 12:02:19,184 INFO dfs.TestInjectionForSimulatedStorage (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5. [junit] 2007-12-12 12:02:19,694 INFO dfs.TestInjectionForSimulatedStorage (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5. [junit] 2007-12-12 12:02:20,204 INFO dfs.TestInjectionForSimulatedStorage (TestInjectionForSimulatedStorage.java:waitForBlockReplication(89)) - Not enough replicas for 4th block blk_4235117719756274078 yet. Expecting 4, got 5. [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec [junit] Test org.apache.hadoop.dfs.TestInjectionForSimulatedStorage FAILED (timeout) Complete console log: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/330/console -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor It is documented that we have -printTopology but we do not really have it in this branch. Possible docs mixup from somewhere in security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-35) Confusing set replication message
[ https://issues.apache.org/jira/browse/HDFS-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-35. - Resolution: Incomplete Unsure what seems to be the problem here. Those logs are in if-else clauses and represent the up vs. down cases just fine, reading the FSNamesystem code presently. Confusing set replication message - Key: HDFS-35 URL: https://issues.apache.org/jira/browse/HDFS-35 Project: Hadoop HDFS Issue Type: Bug Reporter: Raghu Angadi Priority: Minor If a file has a replication of 3 and setReplication() is used to set the replication to 1, we will see the following log in the NameNode log: {noformat} 2007-08-07 12:18:27,370 INFO fs.FSNamesystem (FSNamesystem.java:setReplicationInternal(661)) - Increasing replication for file /srcdat/2725423627829963655. New replication is 1 2007-08-07 12:18:27,370 INFO fs.FSNamesystem (FSNamesystem.java:setReplicationInternal(668)) - Reducing replication for file /srcdat/2725423627829963655. New replication is 1 {noformat} Fixing this could be trivial. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
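The shape Harsh describes — the two log lines sitting in if/else clauses so only one of them can fire — looks like the following; an illustrative fragment under assumed names, not the actual FSNamesystem source:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class ReplicationLogging {
  private static final Log LOG = LogFactory.getLog(ReplicationLogging.class);

  // Guarded if/else: only one of the two messages can ever fire for a
  // given change, unlike the back-to-back pair in the original report.
  static void logReplicationChange(String src, short oldRepl, short newRepl) {
    if (newRepl > oldRepl) {
      LOG.info("Increasing replication for file " + src
          + ". New replication is " + newRepl);
    } else if (newRepl < oldRepl) {
      LOG.info("Reducing replication for file " + src
          + ". New replication is " + newRepl);
    }
  }
}
{code}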
[jira] [Commented] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177211#comment-13177211 ] Hadoop QA commented on HDFS-2729: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508841/HDFS-2729.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated 20 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1746//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1746//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1746//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1746//console This message is automatically generated. Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still refer to two sets when there is really just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177110#comment-13177110 ] Hadoop QA commented on HDFS-1314: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508830/hdfs-1314.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1744//console This message is automatically generated. dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2728) Remove dfsadmin -printTopology from branch-1 docs since it does not exist
[ https://issues.apache.org/jira/browse/HDFS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177251#comment-13177251 ] Eli Collins commented on HDFS-2728: --- +1. I don't think test-patch results on branch-1 are needed as the change is trivial. Remove dfsadmin -printTopology from branch-1 docs since it does not exist - Key: HDFS-2728 URL: https://issues.apache.org/jira/browse/HDFS-2728 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2728.patch It is documented that we have -printTopology but we do not really have it in this branch. Possible docs mixup from somewhere in security branch pre-merge? {code} ➜ branch-1 grep printTopology -R . ./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code> ./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code> {code} Let's remove the reference. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2580) NameNode#main(...) can make use of GenericOptionsParser.
[ https://issues.apache.org/jira/browse/HDFS-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2580: -- Status: Patch Available (was: Open) Resubmitting for tests. I don't see an elegant way to use Tool interface, given the createNamenode(…) static call required to initialize 'this'. This should suffice. NameNode#main(...) can make use of GenericOptionsParser. Key: HDFS-2580 URL: https://issues.apache.org/jira/browse/HDFS-2580 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.24.0 Attachments: HDFS-2580.patch DataNode supports passing generic opts when calling via {{hdfs datanode}}. NameNode can support the same thing as well, but doesn't right now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
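For reference, the pattern in question is the usual one: feed the raw args through GenericOptionsParser so -D, -conf, and -fs options land in the Configuration, then hand the leftovers to the daemon's own parsing. A minimal sketch of that shape (not the actual NameNode#main; the createNamenode(…) factory it would feed is the static call mentioned above):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public class DaemonMain {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Consumes generic options (-D key=value, -conf <file>, -fs <uri>, ...)
    // into conf and keeps whatever is left for the daemon itself.
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    String[] remaining = parser.getRemainingArgs();
    // ... pass 'remaining' and the updated conf on to the daemon-specific
    // startup (the static factory call in the real code).
    System.out.println("daemon args: " + remaining.length);
  }
}
{code}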
[jira] [Updated] (HDFS-2729) Update BlockManager's comments regarding the invalid block set
[ https://issues.apache.org/jira/browse/HDFS-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-2729: -- Status: Patch Available (was: Open) Trivial patch that changes comments and log statements. No tests required. Update BlockManager's comments regarding the invalid block set -- Key: HDFS-2729 URL: https://issues.apache.org/jira/browse/HDFS-2729 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-2729.patch Looks like after HDFS-82 was covered at some point, the comments and logs still refer to two sets when there is really just one set. This patch changes the logs and comments to be more accurate about that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-47) dead datanodes because of OutOfMemoryError
[ https://issues.apache.org/jira/browse/HDFS-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-47. - Resolution: Not A Problem This has gone stale. FWIW, haven't seen DNs go OOM on their own in recent years. Probably a leak that was fixed? Resolving as Not a Problem (anymore). dead datanodes because of OutOfMemoryError -- Key: HDFS-47 URL: https://issues.apache.org/jira/browse/HDFS-47 Project: Hadoop HDFS Issue Type: Bug Reporter: Christian Kunz We see more dead datanodes than in previous releases. The common exception is found in the out file: Exception in thread org.apache.hadoop.dfs.DataBlockScanner@18166e5 java.lang.OutOfMemoryError: Java heap space Exception in thread DataNode: [dfs.data.dir-value] java.lang.OutOfMemoryError: Java heap space -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2692) HA: Bugs related to failover from/into safe-mode
[ https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177367#comment-13177367 ] Todd Lipcon commented on HDFS-2692: --- bq. In FSEditLogLoader#loadFSEdits, should we really be unconditionally calling FSNamesystem#notifyGenStampUpdate in the finally block? What if an error occurs and maxGenStamp is never updated in FSEditLogLoader#loadEditRecords This should be OK -- we'll just call it with the argument 0, which won't cause any problem (0 is lower than any possible queued gen stamp) bq. sp. Initiatling in TestHASafeMode#testComplexFailoverIntoSafemode fixed bq. In FSNamesystem#notifyGenStampUpdate, could be a better log message, and the log level should probably not be info: LOG.info("= notified of genstamp update for: " + gs); Fixed and changed to DEBUG level bq. Why is SafeModeInfo#doConsistencyCheck costly? It doesn't seem like it should be. If it's not in fact expensive, we might as well make it run regardless of whether or not asserts are enabled You're right that it's not super expensive, but this code gets called on every block being reported during startup, which is a fair amount, so I chose to maintain the current behavior of only running the checks when asserts are enabled. bq. Is there really no better way to check if assertions are enabled? Not that I've ever found! :( bq. seems like they should all be made member methods and moved to MiniDFSCluster... Also seems like TestEditLogTailer#waitForStandbyToCatchUp should be moved to MiniDFSCluster. I'd like to move a bunch of these methods into a new {{HATestUtil}} class... can I do that in a follow-up JIRA? Eli said: bq. Nice change and tests. Nit, I'd add a comment in TestHASafeMode#restartStandby where the safemode extension is set indicating the rationale, it looked like the asserts at the end were racy because I missed this Fixed HA: Bugs related to failover from/into safe-mode Key: HDFS-2692 URL: https://issues.apache.org/jira/browse/HDFS-2692 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Attachments: hdfs-2692.txt, hdfs-2692.txt In testing I saw an AssertionError come up several times when I was trying to do failover between two NNs where one or the other was in safe-mode. Need to write some unit tests to try to trigger this -- hunch is it has something to do with the treatment of safe block count while tailing edits in safemode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
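On the aside about detecting assertions: the standard idiom really is the intentional-side-effect assert, since the assignment only executes when -ea is in force. A tiny sketch:
{code}
public class AssertProbe {
  public static void main(String[] args) {
    boolean assertsEnabled = false;
    // Intentional side effect: the assignment runs only when assertions
    // are enabled (java -ea AssertProbe); otherwise the flag stays false.
    assert assertsEnabled = true;
    System.out.println("assertions enabled: " + assertsEnabled);
  }
}
{code}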
[jira] [Updated] (HDFS-2692) HA: Bugs related to failover from/into safe-mode
[ https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2692: -- Attachment: hdfs-2692.txt HA: Bugs related to failover from/into safe-mode Key: HDFS-2692 URL: https://issues.apache.org/jira/browse/HDFS-2692 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Attachments: hdfs-2692.txt, hdfs-2692.txt, hdfs-2692.txt In testing I saw an AssertionError come up several times when I was trying to do failover between two NNs where one or the other was in safe-mode. Need to write some unit tests to try to trigger this -- hunch is it has something to do with the treatment of safe block count while tailing edits in safemode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2692) HA: Bugs related to failover from/into safe-mode
[ https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177368#comment-13177368 ] Aaron T. Myers commented on HDFS-2692: -- bq. I'd like to move a bunch of these methods into a new HATestUtil class... can I do that in a follow-up JIRA? Definitely. This also came up in Eli's review of HDFS-2709. Please file? +1, the latest patch looks good to me. HA: Bugs related to failover from/into safe-mode Key: HDFS-2692 URL: https://issues.apache.org/jira/browse/HDFS-2692 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Attachments: hdfs-2692.txt, hdfs-2692.txt, hdfs-2692.txt In testing I saw an AssertionError come up several times when I was trying to do failover between two NNs where one or the other was in safe-mode. Need to write some unit tests to try to trigger this -- hunch is it has something to do with the treatment of safe block count while tailing edits in safemode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs
[ https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177371#comment-13177371 ] Todd Lipcon commented on HDFS-2720: --- Small nits: {code} + // Now format 1st NN and copy the storage dirs to remaining all. {code} "to remaining all" seems like a typo; "copy the storage directory from that node to the others" would be better. Also I think it's easier to read "first" than "1st". {code} + //Start all Namenodes {code} add a space after {{//}} - The change to remove setRpcEngine looks unrelated - that should get cleaned up in trunk so it doesn't present a merge issue in the branch. HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs Key: HDFS-2720 URL: https://issues.apache.org/jira/browse/HDFS-2720 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-2720.patch To keep the clusterID the same, we are copying the namespaceDirs from the 1st NN to the other NNs. While copying these files, the in_use.lock file may not allow the copy on all OSs since it has acquired the lock on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2714) HA: Fix test cases which use standalone FSNamesystems
[ https://issues.apache.org/jira/browse/HDFS-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177374#comment-13177374 ] Aaron T. Myers commented on HDFS-2714: -- +1, the patch looks good to me. HA: Fix test cases which use standalone FSNamesystems - Key: HDFS-2714 URL: https://issues.apache.org/jira/browse/HDFS-2714 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Trivial Attachments: hdfs-2714.txt Several tests (e.g. TestEditLog, TestSaveNamespace) failed in the most recent build with an NPE inside of FSNamesystem.checkOperation. These tests set up a standalone FSN that isn't fully initialized. We just need to add a null check to deal with this case in checkOperation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs
[ https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177389#comment-13177389 ] Eli Collins commented on HDFS-2720: --- ATM and I were discussing how to initialize the SBN state yesterday. What we currently do is format the primary then copy the name dirs to the SBN. How about making the SBN do this automatically on startup? Specifically, on NN startup, if HA and a shared edits dir are configured and there is no local image, then the SBN downloads the image from the primary (if the other NN is still in standby then it fails to start, as it does currently). HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs Key: HDFS-2720 URL: https://issues.apache.org/jira/browse/HDFS-2720 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-2720.patch To keep the clusterID the same, we are copying the namespaceDirs from the 1st NN to the other NNs. While copying these files, the in_use.lock file may not allow the copy on all OSs since it has acquired the lock on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs
[ https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177434#comment-13177434 ] Todd Lipcon commented on HDFS-2720: --- That would be a nice improvement... but I think it makes sense to do this small fix that Uma proposed so the tests run on Windows, and then do the 'standby initializes from remote active' feature separately? HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs Key: HDFS-2720 URL: https://issues.apache.org/jira/browse/HDFS-2720 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-2720.patch To keep the clusterID the same, we are copying the namespaceDirs from the 1st NN to the other NNs. While copying these files, the in_use.lock file may not allow the copy on all OSs since it has acquired the lock on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177460#comment-13177460 ] Sho Shimauchi commented on HDFS-1314: - I guess HADOOP-7910 was not yet merged into the trunk at that time. Now it has been merged. Could you try the same patch again? dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
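For context, what HADOOP-7910 brings in is suffix-aware parsing on Configuration, so a value like 8m can resolve to bytes instead of tripping a NumberFormatException. A usage sketch (the key and default below are illustrative; getLongBytes is the accessor HADOOP-7910 added):
{code}
import org.apache.hadoop.conf.Configuration;

public class BlockSizeExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.block.size", "8m"); // human-readable form
    // getLongBytes understands k/m/g/... suffixes; a bare number
    // is still interpreted as plain bytes.
    long blockSize = conf.getLongBytes("dfs.block.size", 64L * 1024 * 1024);
    System.out.println(blockSize); // 8388608
  }
}
{code}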
[jira] [Resolved] (HDFS-2714) HA: Fix test cases which use standalone FSNamesystems
[ https://issues.apache.org/jira/browse/HDFS-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2714. --- Resolution: Fixed Fix Version/s: HA branch (HDFS-1623) Hadoop Flags: Reviewed HA: Fix test cases which use standalone FSNamesystems - Key: HDFS-2714 URL: https://issues.apache.org/jira/browse/HDFS-2714 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Trivial Fix For: HA branch (HDFS-1623) Attachments: hdfs-2714.txt Several tests (e.g. TestEditLog, TestSaveNamespace) failed in the most recent build with an NPE inside of FSNamesystem.checkOperation. These tests set up a standalone FSN that isn't fully initialized. We just need to add a null check to deal with this case in checkOperation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2730) HA: Refactor shared HA-related test code into HATestUtils class
HA: Refactor shared HA-related test code into HATestUtils class --- Key: HDFS-2730 URL: https://issues.apache.org/jira/browse/HDFS-2730 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: HA branch (HDFS-1623) A fair number of the HA tests are sharing code like {{waitForStandbyToCatchUp}}, etc. We should refactor this code into an HATestUtils class with static methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2720) HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs
[ https://issues.apache.org/jira/browse/HDFS-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177468#comment-13177468 ] Eli Collins commented on HDFS-2720: --- Yup, I'll file a separate jira. Agree wrt the fix for Windows. HA : TestStandbyIsHot is failing while copying in_use.lock file from NN1 nameSpaceDirs to NN2 nameSpaceDirs Key: HDFS-2720 URL: https://issues.apache.org/jira/browse/HDFS-2720 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HDFS-2720.patch To keep the clusterID the same, we are copying the namespaceDirs from the 1st NN to the other NNs. While copying these files, the in_use.lock file may not allow the copy on all OSs since it has acquired the lock on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2692) HA: Bugs related to failover from/into safe-mode
[ https://issues.apache.org/jira/browse/HDFS-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2692. --- Resolution: Fixed Fix Version/s: HA branch (HDFS-1623) Hadoop Flags: Reviewed Committed to branch, thanks for the reviews, Aaron and Eli. I filed HDFS-2730 for the test util refactor HA: Bugs related to failover from/into safe-mode Key: HDFS-2692 URL: https://issues.apache.org/jira/browse/HDFS-2692 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: HA branch (HDFS-1623) Attachments: hdfs-2692.txt, hdfs-2692.txt, hdfs-2692.txt In testing I saw an AssertionError come up several times when I was trying to do failover between two NNs where one or the other was in safe-mode. Need to write some unit tests to try to trigger this -- hunch is it has something to do with the treatment of safe block count while tailing edits in safemode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2731) Autopopulate standby name dirs if they're empty
Autopopulate standby name dirs if they're empty --- Key: HDFS-2731 URL: https://issues.apache.org/jira/browse/HDFS-2731 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2731) Autopopulate standby name dirs if they're empty
[ https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2731: -- Description: To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. (was: To setup a SBN we currently format the primary then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, if the SBN has empty name dirs it should downloads the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fails to start as it does currently.) Autopopulate standby name dirs if they're empty --- Key: HDFS-2731 URL: https://issues.apache.org/jira/browse/HDFS-2731 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2732) Add support for the standby in the bin scripts
Add support for the standby in the bin scripts -- Key: HDFS-2732 URL: https://issues.apache.org/jira/browse/HDFS-2732 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins We need to update the bin scripts to support SBNs. Two ideas: Modify start-dfs.sh to start another copy of the NN if HA is configured. We could introduce a file similar to masters (2NN hosts) called standbys which lists the SBN hosts, and start-dfs.sh would automatically make the NN it starts active (and leave the NNs listed in standby as is). Or simpler, we could just provide a start-namenode.sh script that a user can run to start the SBN on another host themselves. The user would manually tell the other NN to be active via HAAdmin (or start-dfs.sh could do that automatically, ie assume the NN it starts should be the primary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2733) Document HA configuration and CLI
Document HA configuration and CLI - Key: HDFS-2733 URL: https://issues.apache.org/jira/browse/HDFS-2733 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation, ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins We need to document the configuration changes in HDFS-2231 and the new CLI introduced by HADOOP-7774. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2732) Add support for the standby in the bin scripts
[ https://issues.apache.org/jira/browse/HDFS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177479#comment-13177479 ] Todd Lipcon commented on HDFS-2732: --- For me, start-dfs.sh actually already works, since it uses the GetConf tool which prints out all of the NN addresses in the cluster based on the configuration. Does it not work for you? Add support for the standby in the bin scripts -- Key: HDFS-2732 URL: https://issues.apache.org/jira/browse/HDFS-2732 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins We need to update the bin scripts to support SBNs. Two ideas: Modify start-dfs.sh to start another copy of the NN if HA is configured. We could introduce a file similar to masters (2NN hosts) called standbys which lists the SBN hosts, and start-dfs.sh would automatically make the NN it starts active (and leave the NNs listed in standby as is). Or simpler, we could just provide a start-namenode.sh script that a user can run to start the SBN on another host themselves. The user would manually tell the other NN to be active via HAAdmin (or start-dfs.sh could do that automatically, ie assume the NN it starts should be the primary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
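For reference, the GetConf-based approach works because every NN, standbys included, is enumerable from the suffixed configuration keys alone. A sketch of that expansion follows; the key names (dfs.nameservices, dfs.ha.namenodes.&lt;ns&gt;, dfs.namenode.rpc-address.&lt;ns&gt;.&lt;nn&gt;) follow the HA config pattern but should be treated as approximations rather than verified constants, and the class is not the real GetConf tool.

{code:java}
import java.util.*;
import org.apache.hadoop.conf.Configuration;

// Sketch of the expansion a GetConf-style tool performs; key names are
// the assumed HA pattern, not verified constants.
public class ListNameNodes {
  public static List<String> namenodeAddresses(Configuration conf) {
    List<String> addrs = new ArrayList<String>();
    for (String ns : conf.getTrimmedStrings("dfs.nameservices")) {
      String[] nnIds = conf.getTrimmedStrings("dfs.ha.namenodes." + ns);
      if (nnIds.length == 0) {
        nnIds = new String[] { null }; // non-HA nameservice: no NN id suffix
      }
      for (String nnId : nnIds) {
        String key = "dfs.namenode.rpc-address." + ns
            + (nnId == null ? "" : "." + nnId);
        String addr = conf.get(key);
        if (addr != null) {
          addrs.add(addr);
        }
      }
    }
    return addrs;
  }
}
{code}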
[jira] [Commented] (HDFS-2731) Autopopulate standby name dirs if they're empty
[ https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177478#comment-13177478 ] Todd Lipcon commented on HDFS-2731: --- bq. as an optimization it could copy the logs from the shared dir I don't think it's necessarily an optimization - might actually be _easier_ to implement this way :) bq. If the other NN is still in standby then it should fail to start as it does currently Can you explain what you mean by this? Why not allow it to download the image from the other NN anyway? Autopopulate standby name dirs if they're empty --- Key: HDFS-2731 URL: https://issues.apache.org/jira/browse/HDFS-2731 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2709) HA: Appropriately handle error conditions in EditLogTailer
[ https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2709: - Attachment: HDFS-2709-HDFS-1623.patch HA: Appropriately handle error conditions in EditLogTailer -- Key: HDFS-2709 URL: https://issues.apache.org/jira/browse/HDFS-2709 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch Currently if the edit log tailer experiences an error replaying edits in the middle of a file, it will go back to retrying from the beginning of the file on the next tailing iteration. This is incorrect since many of the edits will have already been replayed, and not all edits are idempotent. Instead, we either need to (a) support reading from the middle of a finalized file (ie skip those edits already applied), or (b) abort the standby if it hits an error while tailing. If (a) isn't simple, let's do (b) for now and come back to (a) later, since this is a rare circumstance and it's better to abort than be incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2709) HA: Appropriately handle error conditions in EditLogTailer
[ https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177483#comment-13177483 ] Aaron T. Myers commented on HDFS-2709: -- Thanks a lot for the thorough review, Eli. Comments inline. I also found and fixed another little bug involving a potential race between the edit log tailer thread and edit log rolling. I'll post an updated patch in a moment. bq. This change handles errors reading an edit from the log (the common case) but not when there's a failure to apply an edit (eg if there was a bug, or a silent corruption somehow went unnoticed). While loadEdits won't ignore (it will throw) this exception, it does get propagated up to the catch of Throwable in EditLogTailer#run, so we effectively retry endlessly in this case. Need to replace the TODO(HA) comment there with code to shut down the SBN. Feel free to punt to another jira. Indeed, I had originally intended to do this as part of a separate JIRA, but I'm rethinking that decision. I've added some code to shut down the SBN, and amended the tests to verify this behavior. bq. How about adding a test that uses multiple shared edits dirs, and shows that a failure to read from one of them will cause the tailer to not catch up? We can file a jira for a future change that is OK with faulty shared dirs as long as one is working. Multiple shared edits dirs aren't currently supported or tested. It's certainly an obvious improvement worth doing, but there are currently no tests for it. We should probably file a JIRA to test that. bq. In FileJournalManager#getNumberOfTransactions, now that we loosen the check to elf.containsTxId(fromTxid), isn't the last else case dead code? Yes indeed, not sure how I missed that. Removed. bq. I think we can remove the TODO(HA): Should this happen when called by the tailer? comment in loadEdits, right, since we always create new streams when we select them? Yes indeed. Removed. bq. Would it be simpler in LimitedEditLogAnswer#answer to spy on each stream and stub readOp rather than introduce LimitedEditLogInputStream? Different? Yes. Simpler? Maybe. I did it this way because I thought creating spies within spies was kind of gross. I switched it to use a spy in this latest patch, which is at least less code. :) bq. How about introducing DFSHATestUtil and putting waitForStandbyToCatchUp and CouldNotCatchUpException there? (Seems like the methods you pointed out in the HDFS-2692 review could go there as well.) Good idea. Let's do it in a separate JIRA though, along the lines of consolidating the generic HA test helper methods. bq. Nit: IOException e, s/e/ioe/ Done. bq. testFailuretoReadEdits needs a javadoc Done. bq. waitForStandbyToCatchUp needs a javadoc indicating it waits for NN_LAG_TIMEOUT then throws CouldNotCatchUp Done. HA: Appropriately handle error conditions in EditLogTailer -- Key: HDFS-2709 URL: https://issues.apache.org/jira/browse/HDFS-2709 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch Currently if the edit log tailer experiences an error replaying edits in the middle of a file, it will go back to retrying from the beginning of the file on the next tailing iteration. This is incorrect since many of the edits will have already been replayed, and not all edits are idempotent. 
Instead, we either need to (a) support reading from the middle of a finalized file (ie skip those edits already applied), or (b) abort the standby if it hits an error while tailing. If (a) isn't simple, let's do (b) for now and come back to (a) later, since this is a rare circumstance and it's better to abort than be incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
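To make option (a) concrete: resuming mid-file just means remembering the last applied txid and discarding any op at or below it on the next pass. The following is a schematic sketch only; EditOp and EditLogStream are stand-ins, not the HDFS classes under discussion.

{code:java}
import java.io.IOException;

// Schematic of option (a): skip ops already applied on a previous
// tailing pass. All types here are stand-ins.
class ResumingLoader {
  interface EditOp { long txid(); }
  interface EditLogStream { EditOp readOp() throws IOException; } // null at EOF

  private long lastAppliedTxId;

  ResumingLoader(long lastAppliedTxId) {
    this.lastAppliedTxId = lastAppliedTxId;
  }

  long loadFrom(EditLogStream in) throws IOException {
    EditOp op;
    while ((op = in.readOp()) != null) {
      if (op.txid() <= lastAppliedTxId) {
        continue; // replayed on an earlier pass; not all ops are idempotent
      }
      apply(op);
      lastAppliedTxId = op.txid(); // advance only after a successful apply
    }
    return lastAppliedTxId;
  }

  private void apply(EditOp op) { /* apply to the namespace */ }
}
{code}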
[jira] [Commented] (HDFS-2731) Autopopulate standby name dirs if they're empty
[ https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177486#comment-13177486 ] Eli Collins commented on HDFS-2731: --- Wrt #1, if we get the image from the NN and the edits from the shared dir, are we sure they'll always match, eg what if we're rolling at the same time (the other NN could be primary and active)? I was thinking asking for both from the primary would mean we always get matched sets, and therefore don't need to worry about races. Wrt #2, yeah, I was thinking we should be explicit (we don't have to worry about eg the shared dir being populated but neither NN having populated name dirs, which we know won't be the case if the other is active), but on 2nd thought I think your suggestion is better. Autopopulate standby name dirs if they're empty --- Key: HDFS-2731 URL: https://issues.apache.org/jira/browse/HDFS-2731 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2731) Autopopulate standby name dirs if they're empty
[ https://issues.apache.org/jira/browse/HDFS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177488#comment-13177488 ] Todd Lipcon commented on HDFS-2731: --- The primary shouldn't be removing any old images unless it's taking checkpoints. But there won't be checkpoints if the standby isn't running yet (assuming the standby is the one doing checkpointing). So if we get the most recent image from the NN, then we should always have enough edits in the shared dir to roll forward from there. Autopopulate standby name dirs if they're empty --- Key: HDFS-2731 URL: https://issues.apache.org/jira/browse/HDFS-2731 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2386) with security enabled fsck calls lead to handshake_failure and hftp fails throwing the same exception in the logs
[ https://issues.apache.org/jira/browse/HDFS-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177534#comment-13177534 ] Rajesh Balamohan commented on HDFS-2386: we are actively hitting this issue with the secondary namenode and fsck on the 0.20.204 release. JDK 1.6.0_29, RHEL 6.1, MIT Kerberos 1.8.x; the AES-256, AES-128, and RC4 enc types are enabled, and JCE is installed. +1, we are facing this issue as well and get the following exception in the NameNode:

11/12/29 18:47:02 WARN mortbay.log: EXCEPTION
javax.net.ssl.SSLHandshakeException: Invalid padding
at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:174)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1699)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:852)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1165)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1149)
at org.mortbay.jetty.security.SslSocketConnector$SslConnection.run(SslSocketConnector.java:708)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.crypto.BadPaddingException: Padding length invalid: 238
at com.sun.net.ssl.internal.ssl.CipherBox.removePadding(CipherBox.java:399)
at com.sun.net.ssl.internal.ssl.CipherBox.decrypt(CipherBox.java:247)
at com.sun.net.ssl.internal.ssl.InputRecord.decrypt(InputRecord.java:153)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:840)
... 5 more

Pasting the javax.net.debug output from the secondary namenode, in case it helps. I enabled javax.net.debug=all in the secondary namenode and got the following output:

Cipher Suite: TLS_KRB5_WITH_3DES_EDE_CBC_SHA
Compression Method: 0
Extension renegotiation_info, renegotiated_connection: empty
***
%% Created: [Session-1, TLS_KRB5_WITH_3DES_EDE_CBC_SHA]
** TLS_KRB5_WITH_3DES_EDE_CBC_SHA
*** ServerHelloDone
*** ClientKeyExchange, Kerberos
... ... ..
*** Finished verify_data: { 190, 127, 20, 131, 10, 136, 84, 207, 172, 130, 31, 53 }
***
main, WRITE: TLSv1 Handshake, length = 40
main, READ: TLSv1 Alert, length = 2
main, RECV TLSv1 ALERT: fatal, handshake_failure
main, called closeSocket()
main, handling exception: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
11/12/29 18:47:02 ERROR namenode.SecondaryNameNode: checkpoint: Content-Length header is not provided by the namenode when trying to fetch https://NN:50475/getimage?getimage=1

with security enabled fsck calls lead to handshake_failure and hftp fails throwing the same exception in the logs - Key: HDFS-2386 URL: https://issues.apache.org/jira/browse/HDFS-2386 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.205.0 Reporter: Arpit Gupta -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
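One quick check when chasing a handshake_failure like the one above is whether the Kerberos cipher suite the server negotiated is enabled in the JVM at all. A small diagnostic using the standard JSSE API; this only helps narrow the problem down and is not a fix for this bug.

{code:java}
import javax.net.ssl.SSLServerSocketFactory;

// Diagnostic only: print the cipher suites this JVM enables by default,
// to confirm whether TLS_KRB5_WITH_3DES_EDE_CBC_SHA (seen in the debug
// output above) is available on both sides of the connection.
public class ListCipherSuites {
  public static void main(String[] args) {
    SSLServerSocketFactory factory =
        (SSLServerSocketFactory) SSLServerSocketFactory.getDefault();
    for (String suite : factory.getDefaultCipherSuites()) {
      System.out.println(suite);
    }
  }
}
{code}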
[jira] [Updated] (HDFS-1314) dfs.block.size accepts only absolute value
[ https://issues.apache.org/jira/browse/HDFS-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-1314: -- Status: Open (was: Patch Available) dfs.block.size accepts only absolute value -- Key: HDFS-1314 URL: https://issues.apache.org/jira/browse/HDFS-1314 Project: Hadoop HDFS Issue Type: Bug Reporter: Karim Saadah Assignee: Sho Shimauchi Priority: Minor Labels: newbie Attachments: hdfs-1314.txt, hdfs-1314.txt, hdfs-1314.txt Using dfs.block.size=8388608 works but dfs.block.size=8mb does not. Using dfs.block.size=8mb should throw some WARNING on NumberFormatException. (http://pastebin.corp.yahoo.com/56129) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2716) HA: Configuration needs to allow different dfs.http.addresses for each HA NN
[ https://issues.apache.org/jira/browse/HDFS-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2716: -- Attachment: hdfs-2716.txt Attached patch fixes the generic conf code to handle NN IDs as well as Nameservice IDs. HA: Configuration needs to allow different dfs.http.addresses for each HA NN Key: HDFS-2716 URL: https://issues.apache.org/jira/browse/HDFS-2716 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-2716.txt Earlier on the HA branch we expanded the configuration so that different IPC addresses can be specified for each of the HA NNs in a cluster. But we didn't do this for the HTTP address. This has proved problematic while working on HDFS-2291 (checkpointing in HA). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
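The shape of the "generic conf code" change is: look up the most specific suffixed key first, then fall back to the plain key. A hedged sketch follows; the class and method here are illustrative, and the real DFSUtil helpers may differ in names and details.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch of suffixed-key resolution: the most specific key wins. This is
// an approximation of the generic conf code the patch extends, not a copy.
public class SuffixedConf {
  public static String get(Configuration conf, String key,
                           String nsId, String nnId) {
    if (nsId != null && nnId != null) {
      String v = conf.get(key + "." + nsId + "." + nnId);
      if (v != null) return v;
    }
    if (nsId != null) {
      String v = conf.get(key + "." + nsId);
      if (v != null) return v;
    }
    return conf.get(key); // unsuffixed fallback
  }
}
{code}

So, for example, get(conf, "dfs.namenode.http-address", "ns1", "nn1") would consult dfs.namenode.http-address.ns1.nn1 before falling back to the broader keys, which is exactly the per-NN granularity the HTTP address was missing.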
[jira] [Updated] (HDFS-2291) HA: Checkpointing in an HA setup
[ https://issues.apache.org/jira/browse/HDFS-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2291: -- Attachment: hdfs-2291.txt Attached patch adds a thread to the SBN which takes checkpoints. It doesn't currently deal with the case where a checkpoint is happening while the SBN needs to become active. I'm working on that now, but figured I'd put this patch up for early review. This depends on HDFS-2716. HA: Checkpointing in an HA setup Key: HDFS-2291 URL: https://issues.apache.org/jira/browse/HDFS-2291 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Aaron T. Myers Assignee: Todd Lipcon Fix For: HA branch (HDFS-1623) Attachments: hdfs-2291.txt We obviously need to create checkpoints when HA is enabled. One thought is to use a third, dedicated checkpointing node in addition to the active and standby nodes. Another option would be to make the standby capable of also performing the function of checkpointing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
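At a high level, the added thread is a loop that checkpoints once enough transactions have accumulated and stands down when the node is no longer in standby. A rough shape of such a loop; every name here is a placeholder, not an HDFS API, and the real patch surely differs.

{code:java}
// Rough shape of a standby-side checkpointing loop; placeholder names.
class StandbyCheckpointer implements Runnable {
  interface Namesystem {
    long lastAppliedTxId();
    long lastCheckpointTxId();
    boolean isInStandbyState();
    void saveNamespace() throws Exception;
  }

  private final Namesystem fsn;
  private final long txnThreshold;
  private volatile boolean running = true;

  StandbyCheckpointer(Namesystem fsn, long txnThreshold) {
    this.fsn = fsn;
    this.txnThreshold = txnThreshold;
  }

  public void run() {
    while (running) {
      try {
        long pending = fsn.lastAppliedTxId() - fsn.lastCheckpointTxId();
        // Skip if we are (or are becoming) active -- the open case the
        // comment above calls out.
        if (fsn.isInStandbyState() && pending >= txnThreshold) {
          fsn.saveNamespace();
        }
        Thread.sleep(60000); // check once a minute
      } catch (InterruptedException ie) {
        running = false;
      } catch (Exception e) {
        // log and retry on the next cycle
      }
    }
  }

  void stop() { running = false; }
}
{code}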
[jira] [Resolved] (HDFS-2732) Add support for the standby in the bin scripts
[ https://issues.apache.org/jira/browse/HDFS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins resolved HDFS-2732. --- Resolution: Won't Fix Add support for the standby in the bin scripts -- Key: HDFS-2732 URL: https://issues.apache.org/jira/browse/HDFS-2732 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins We need to update the bin scripts to support SBNs. Two ideas: Modify start-dfs.sh to start another copy of the NN if HA is configured. We could introduce a file similar to masters (2NN hosts) called standbys which lists the SBN hosts, and start-dfs.sh would automatically make the NN it starts active (and leave the NNs listed in standby as is). Or simpler, we could just provide a start-namenode.sh script that a user can run to start the SBN on another host themselves. The user would manually tell the other NN to be active via HAAdmin (or start-dfs.sh could do that automatically, ie assume the NN it starts should be the primary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2732) Add support for the standby in the bin scripts
[ https://issues.apache.org/jira/browse/HDFS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177538#comment-13177538 ] Eli Collins commented on HDFS-2732: --- Good point, I missed that. It doesn't work for me since I'm running both the NN and SBN on the same host, so the 2nd fails to start because the pid file already exists (the other NN already claimed the file). The log dirs would collide as well. In any case, I don't think we need to support the NN and SBN on the same host in the start scripts; developers can work around this by changing the HADOOP_CONF_DIR and running start-dfs.sh again, or by starting just the NN manually as I've been doing with a separate conf dir. Add support for the standby in the bin scripts -- Key: HDFS-2732 URL: https://issues.apache.org/jira/browse/HDFS-2732 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins We need to update the bin scripts to support SBNs. Two ideas: Modify start-dfs.sh to start another copy of the NN if HA is configured. We could introduce a file similar to masters (2NN hosts) called standbys which lists the SBN hosts, and start-dfs.sh would automatically make the NN it starts active (and leave the NNs listed in standby as is). Or simpler, we could just provide a start-namenode.sh script that a user can run to start the SBN on another host themselves. The user would manually tell the other NN to be active via HAAdmin (or start-dfs.sh could do that automatically, ie assume the NN it starts should be the primary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2709) HA: Appropriately handle error conditions in EditLogTailer
[ https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177540#comment-13177540 ] Todd Lipcon commented on HDFS-2709: --- A few thoughts on the overall approach: - Rather than modify EditLogFileInputStream to take a startTxId, why not do the skipping (what you call {{setInitialPosition}}) from the caller? ie modify {{FSEditLogLoader}} to skip the transactions that have already been replayed? The skipping code doesn't seem specific to the input stream itself. - I'm not convinced we need to have the {{partialLoadOk}} flag in {{FSEditLogLoader}}. IMO if the log is truncated, it's still an error as far as the loader is concerned - we just want to let the caller continue from where the error occurred. The only trick is how to go about getting the last successfully loaded txid out of the FSEditLogLoader in the error case -- I guess a member variable and a getter would work there? Do you think this ends up messier than the way you've done it? - Can we add some non-HA tests that exercise FileJournalManager/FSEditLogLoader's ability to start mid-stream? Not sure if that's feasible. HA: Appropriately handle error conditions in EditLogTailer -- Key: HDFS-2709 URL: https://issues.apache.org/jira/browse/HDFS-2709 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Todd Lipcon Assignee: Aaron T. Myers Priority: Critical Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch Currently if the edit log tailer experiences an error replaying edits in the middle of a file, it will go back to retrying from the beginning of the file on the next tailing iteration. This is incorrect since many of the edits will have already been replayed, and not all edits are idempotent. Instead, we either need to (a) support reading from the middle of a finalized file (ie skip those edits already applied), or (b) abort the standby if it hits an error while tailing. If (a) isn't simple, let's do (b) for now and come back to (a) later, since this is a rare circumstance and it's better to abort than be incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
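The member-variable-and-getter suggestion composes naturally with a caller-side resume loop: on failure, ask the loader how far it got and retry from the next txid, surfacing the error if no progress was made. A sketch with stand-in types; the Loader interface here is hypothetical, not FSEditLogLoader's real API.

{code:java}
import java.io.IOException;

// Sketch of the "member variable and a getter" idea: the loader reports
// the last txid it applied even after failing, and the caller resumes.
class ResumableLoad {
  interface Loader {
    void loadFrom(long firstTxId) throws IOException;
    long getLastAppliedTxId(); // meaningful even after loadFrom() threw
  }

  static void loadWithResume(Loader loader, long startTxId)
      throws IOException {
    long from = startTxId;
    while (true) {
      try {
        loader.loadFrom(from);
        return; // clean end of log
      } catch (IOException e) {
        long reached = loader.getLastAppliedTxId();
        if (reached < from) {
          throw e; // zero forward progress: surface it, don't spin
        }
        from = reached + 1; // resume just past where the error occurred
      }
    }
  }
}
{code}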
[jira] [Created] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not being considered
Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not being considered Key: HDFS-2734 URL: https://issues.apache.org/jira/browse/HDFS-2734 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0, 0.20.1 Reporter: J.Andreina Priority: Minor Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml, the values are not being considered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2734) Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not being considered
[ https://issues.apache.org/jira/browse/HDFS-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177580#comment-13177580 ] Harsh J commented on HDFS-2734: --- Hi J.Andreina, That property applies up to the 0.20/1.0 SecondaryNameNode. It is OK for it to be in core-site.xml. What exact version are you reporting this for? What do you see in SNN_HOST:50090/conf? Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml the values are not being considered Key: HDFS-2734 URL: https://issues.apache.org/jira/browse/HDFS-2734 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.1, 0.23.0 Reporter: J.Andreina Priority: Minor Even if we configure the property fs.checkpoint.size in both core-site.xml and hdfs-site.xml, the values are not being considered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
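Since Configuration simply overlays resources in load order, whether the property sits in core-site.xml or hdfs-site.xml only matters when both define it. A small check one can run to see which value actually wins; the 67108864 default below is an assumption matching the usual 0.20-era 64MB value.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Prints the fs.checkpoint.size value that wins once both files are
// loaded; later resources override earlier ones.
public class CheckpointSizeCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration(); // picks up core-site.xml
    conf.addResource("hdfs-site.xml");        // overlays hdfs-site.xml
    System.out.println(conf.getLong("fs.checkpoint.size", 67108864L));
  }
}
{code}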