[jira] [Commented] (HDFS-2065) Fix NPE in DFSClient.getFileChecksum
[ https://issues.apache.org/jira/browse/HDFS-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088848#comment-13088848 ]

Bharath Mundlapudi commented on HDFS-2065:

Ok, I will recheck this.

Fix NPE in DFSClient.getFileChecksum
Key: HDFS-2065
URL: https://issues.apache.org/jira/browse/HDFS-2065
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0
Attachments: HDFS-2065-1.patch

The following code can throw an NPE if callGetBlockLocations returns null:

{code}
List<LocatedBlock> locatedblocks = callGetBlockLocations(namenode, src, 0,
    Long.MAX_VALUE).getLocatedBlocks();
{code}

The right fix is for the server to throw the appropriate exception instead of returning null.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
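The defensive check discussed in this issue can be sketched outside HDFS with a stand-in for the RPC call; the class and method names below are simplified stand-ins for illustration, not the actual DFSClient code.

```java
import java.io.FileNotFoundException;
import java.util.List;

public class NullGuardSketch {
    // Stand-in for the namenode RPC result; in DFSClient this would be
    // a LocatedBlocks object returned by callGetBlockLocations(..).
    static List<String> callGetBlockLocations(String src) {
        return null; // simulate the server returning null for a missing file
    }

    static List<String> getBlockList(String src) throws FileNotFoundException {
        List<String> blocks = callGetBlockLocations(src);
        if (blocks == null) {
            // Guard that avoids the NPE described in HDFS-2065: fail with a
            // meaningful exception instead of dereferencing null.
            throw new FileNotFoundException("File does not exist: " + src);
        }
        return blocks;
    }

    public static void main(String[] args) {
        try {
            getBlockList("/missing/file");
            System.out.println("no exception");
        } catch (FileNotFoundException e) {
            System.out.println("FileNotFoundException: " + e.getMessage());
        }
    }
}
```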
[jira] [Updated] (HDFS-1976) Logging in DataXceiver will sometimes repeat stack traces
[ https://issues.apache.org/jira/browse/HDFS-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Mundlapudi updated HDFS-1976:

Assignee: Bharath Mundlapudi

Logging in DataXceiver will sometimes repeat stack traces
Key: HDFS-1976
URL: https://issues.apache.org/jira/browse/HDFS-1976
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor

The run() method in DataXceiver logs the stack trace of all throwables thrown while performing an operation. In some cases, the operations also log stack traces despite throwing the exception up the stack. The logging code should try to avoid double-logging stack traces where possible.
[jira] [Commented] (HDFS-1872) BPOfferService.cleanUp(..) throws NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070972#comment-13070972 ]

Bharath Mundlapudi commented on HDFS-1872:

https://issues.apache.org/jira/browse/HDFS-1592

BPOfferService.cleanUp(..) throws NullPointerException
Key: HDFS-1872
URL: https://issues.apache.org/jira/browse/HDFS-1872
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Reporter: Tsz Wo (Nicholas), SZE

{noformat}
NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.cleanUp(DataNode.java:1005)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.run(DataNode.java:1220)
        at java.lang.Thread.run(Thread.java:662)
{noformat}
[jira] [Commented] (HDFS-1796) Switch NameNode to use non-fair locks
[ https://issues.apache.org/jira/browse/HDFS-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070976#comment-13070976 ]

Bharath Mundlapudi commented on HDFS-1796:

A fair lock guarantees the order of execution at the cost of performance. If a delete and a read arrive at the same time and the read takes the lock first, then without a fair lock the system might schedule the delete before the read. Shouldn't we care about correctness and order of execution in file systems?

Switch NameNode to use non-fair locks
Key: HDFS-1796
URL: https://issues.apache.org/jira/browse/HDFS-1796
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Hairong Kuang
Attachments: non-fair-lock.patch

According to the JavaDoc, a non-fair lock will normally have higher throughput than a fair lock. Our experiment also shows improved performance when using a non-fair lock. We should switch the namenode to use non-fair locks.
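The setting at stake here is the fairness flag on the JDK's reentrant locks: a fair lock hands the lock to the longest-waiting thread (preserving arrival order at a throughput cost), while a non-fair lock permits barging. A minimal sketch of the knob itself, independent of the HDFS patch:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockFairnessSketch {
    public static void main(String[] args) {
        // Fair mode: threads acquire the lock roughly in arrival order.
        ReentrantReadWriteLock fair = new ReentrantReadWriteLock(true);
        // Non-fair mode (the default): an arriving thread may "barge" ahead
        // of queued waiters, which usually yields higher throughput.
        ReentrantReadWriteLock nonFair = new ReentrantReadWriteLock(false);
        System.out.println("fair=" + fair.isFair() + " nonFair=" + nonFair.isFair());
    }
}
```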
[jira] [Updated] (HDFS-1776) Bug in Concat code
[ https://issues.apache.org/jira/browse/HDFS-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Mundlapudi updated HDFS-1776:

Assignee: Bharath Mundlapudi

Bug in Concat code
Key: HDFS-1776
URL: https://issues.apache.org/jira/browse/HDFS-1776
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Dmytro Molkov
Assignee: Bharath Mundlapudi

There is a bug in the concat code. Specifically: in INodeFile.appendBlocks() we need to first reassign the blocks list and then go through it and update the INode pointer. Otherwise we are not updating the inode pointer on all of the new blocks in the file.
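The ordering the fix requires can be shown with a self-contained analogue; Block and INode below are simplified stand-ins for HDFS's block and INodeFile classes, not the real implementation.

```java
import java.util.Arrays;

public class AppendBlocksSketch {
    static class Block { INode owner; }

    static class INode {
        Block[] blocks = new Block[0];

        // Order matters (the point of HDFS-1776): first build the combined
        // array and assign it to this.blocks, then walk the FULL array fixing
        // every block's back-pointer. Fixing pointers before the reassignment
        // would miss the newly appended blocks.
        void appendBlocks(Block[] more) {
            Block[] combined = Arrays.copyOf(blocks, blocks.length + more.length);
            System.arraycopy(more, 0, combined, blocks.length, more.length);
            blocks = combined;          // step 1: reassign the blocks list
            for (Block b : blocks) {
                b.owner = this;         // step 2: update every INode pointer
            }
        }
    }

    public static void main(String[] args) {
        INode target = new INode();
        Block b1 = new Block();
        Block b2 = new Block();
        target.appendBlocks(new Block[]{b1, b2});
        System.out.println(b1.owner == target && b2.owner == target);
    }
}
```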
[jira] [Updated] (HDFS-1776) Bug in Concat code
[ https://issues.apache.org/jira/browse/HDFS-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Mundlapudi updated HDFS-1776:

Affects Version/s: 0.23.0
Fix Version/s: 0.23.0

Bug in Concat code
Key: HDFS-1776
URL: https://issues.apache.org/jira/browse/HDFS-1776
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Dmytro Molkov
Assignee: Bharath Mundlapudi
Fix For: 0.23.0

There is a bug in the concat code. Specifically: in INodeFile.appendBlocks() we need to first reassign the blocks list and then go through it and update the INode pointer. Otherwise we are not updating the inode pointer on all of the new blocks in the file.
[jira] [Updated] (HDFS-1776) Bug in Concat code
[ https://issues.apache.org/jira/browse/HDFS-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Mundlapudi updated HDFS-1776:

Attachment: HDFS-1776-1.patch

Attaching a patch for this.

Bug in Concat code
Key: HDFS-1776
URL: https://issues.apache.org/jira/browse/HDFS-1776
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Dmytro Molkov
Assignee: Bharath Mundlapudi
Fix For: 0.23.0
Attachments: HDFS-1776-1.patch

There is a bug in the concat code. Specifically: in INodeFile.appendBlocks() we need to first reassign the blocks list and then go through it and update the INode pointer. Otherwise we are not updating the inode pointer on all of the new blocks in the file.
[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064067#comment-13064067 ]

Bharath Mundlapudi commented on HDFS-1977:

Thank you all, I am attaching a patch which addresses Jitendra's comment.

Stop using StringUtils.stringifyException()
Key: HDFS-1977
URL: https://issues.apache.org/jira/browse/HDFS-1977
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch, HDFS-1977-3.patch

The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control, since you can configure how the stack traces are written to the logs.
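The replacement pattern is to hand the Throwable to the logger rather than stringifying it by hand. This sketch uses java.util.logging to stay self-contained; Hadoop itself would use the equivalent commons-logging overload, LOG.error(msg, t).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogThrowableSketch {
    private static final Logger LOG = Logger.getLogger("demo");

    public static void main(String[] args) {
        Exception e = new IllegalStateException("boom");
        // Old pattern being removed: stringify the stack trace by hand, e.g.
        //   LOG.error("Op failed: " + StringUtils.stringifyException(e));
        // New pattern: pass the Throwable so the logging backend formats the
        // stack trace, and log configuration controls how it is written.
        LOG.log(Level.SEVERE, "Op failed", e);
    }
}
```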
[jira] [Updated] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Mundlapudi updated HDFS-1977:

Attachment: HDFS-1977-4.patch

Stop using StringUtils.stringifyException()
Key: HDFS-1977
URL: https://issues.apache.org/jira/browse/HDFS-1977
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch, HDFS-1977-3.patch, HDFS-1977-4.patch

The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control, since you can configure how the stack traces are written to the logs.
[jira] [Commented] (HDFS-1872) BPOfferService.cleanUp(..) throws NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064161#comment-13064161 ]

Bharath Mundlapudi commented on HDFS-1872:

Yes, I was seeing an NPE in the cleanup code earlier. I made some changes in this area related to datanode exit. It should be fine now.

BPOfferService.cleanUp(..) throws NullPointerException
Key: HDFS-1872
URL: https://issues.apache.org/jira/browse/HDFS-1872
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Reporter: Tsz Wo (Nicholas), SZE

{noformat}
NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.cleanUp(DataNode.java:1005)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.run(DataNode.java:1220)
        at java.lang.Thread.run(Thread.java:662)
{noformat}
[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062068#comment-13062068 ]

Bharath Mundlapudi commented on HDFS-1977:

Todd, if you don't have any further comments on this patch, can you please commit this?

Stop using StringUtils.stringifyException()
Key: HDFS-1977
URL: https://issues.apache.org/jira/browse/HDFS-1977
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch, HDFS-1977-3.patch

The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control, since you can configure how the stack traces are written to the logs.
[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061719#comment-13061719 ]

Bharath Mundlapudi commented on HDFS-1977:

Sure. I agree with you. I am posting an updated patch with your suggestions.

Stop using StringUtils.stringifyException()
Key: HDFS-1977
URL: https://issues.apache.org/jira/browse/HDFS-1977
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch, HDFS-1977-3.patch

The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control, since you can configure how the stack traces are written to the logs.
[jira] [Updated] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Mundlapudi updated HDFS-1977:

Attachment: HDFS-1977-3.patch

Stop using StringUtils.stringifyException()
Key: HDFS-1977
URL: https://issues.apache.org/jira/browse/HDFS-1977
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch, HDFS-1977-3.patch

The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control, since you can configure how the stack traces are written to the logs.
[jira] [Updated] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Mundlapudi updated HDFS-1977:

Attachment: HDFS-1977-2.patch

Things changed since the last post; reattaching with new changes.

Stop using StringUtils.stringifyException()
Key: HDFS-1977
URL: https://issues.apache.org/jira/browse/HDFS-1977
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch

The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control, since you can configure how the stack traces are written to the logs.
[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060910#comment-13060910 ]

Bharath Mundlapudi commented on HDFS-1977:

This patch doesn't include unit tests, since it is just adapting to the new logging API. No new tests are required.

Stop using StringUtils.stringifyException()
Key: HDFS-1977
URL: https://issues.apache.org/jira/browse/HDFS-1977
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch

The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control, since you can configure how the stack traces are written to the logs.
[jira] [Commented] (HDFS-2123) 1073: Checkpoint interval should be based on txn count, not size
[ https://issues.apache.org/jira/browse/HDFS-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060046#comment-13060046 ]

Bharath Mundlapudi commented on HDFS-2123:

I have reviewed the patch. This change is more meaningful than checkpointing based on size. +1 to the approach. There are some minor comments on the logging messages: since we are moving to txns, we should reflect this in the log messages too.

1. Replace in Checkpointer

{code}
LOG.info("Log Size Trigger : " + checkpointTxnCount + " txns");
{code}

with

{code}
LOG.info("Transaction Count Trigger : " + checkpointTxnCount + " txns");
{code}

2. Replace in SecondaryNameNode

{code}
+ "\nCheckpoint Size : " + StringUtils.byteDesc(checkpointTxnCount)
+ " (= " + checkpointTxnCount + " bytes)"
{code}

with

{code}
+ "\nTransaction Count : " + StringUtils.byteDesc(checkpointTxnCount)
+ " (= " + checkpointTxnCount + " txns)"
{code}

3. Replace in SecondaryNameNode

{code}
LOG.info("Log Size Trigger: " + checkpointTxnCount + " txns");
{code}

with

{code}
LOG.info("Transaction Count Trigger : " + checkpointTxnCount + " txns");
{code}

4. Replace in SecondaryNameNode

{code}
System.err.println("EditLog size " + count + " transactions is "
    + "smaller than configured checkpoint "
    + "interval " + checkpointTxnCount + " transactions.");
{code}

with

{code}
System.err.println("EditLog transactions " + count + " is "
    + "smaller than configured checkpoint "
    + "transactions " + checkpointTxnCount);
{code}

1073: Checkpoint interval should be based on txn count, not size
Key: HDFS-2123
URL: https://issues.apache.org/jira/browse/HDFS-2123
Project: Hadoop HDFS
Issue Type: Sub-task
Components: name-node
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Fix For: Edit log branch (HDFS-1073)
Attachments: hdfs-2123.txt, hdfs-2123.txt

Currently, the administrator can configure the secondary namenode to checkpoint either every N seconds, or every N bytes worth of edit log. It would make more sense to get rid of the size-based interval and instead allow the administrator to specify checkpoints every N transactions. This also simplifies the code a little bit.
[jira] [Updated] (HDFS-2109) Store uMask as member variable to DFSClient.Conf
[ https://issues.apache.org/jira/browse/HDFS-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Mundlapudi updated HDFS-2109:

Attachment: HDFS-2109-2.patch

Fixed the javac warning.

Store uMask as member variable to DFSClient.Conf
Key: HDFS-2109
URL: https://issues.apache.org/jira/browse/HDFS-2109
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0
Attachments: HDFS-2109-1.patch, HDFS-2109-2.patch

As a part of removing the reference to conf in DFSClient, I am proposing replacing FsPermission.getUMask(conf) everywhere in the DFSClient class with dfsClientConf.uMask, by storing uMask as a member variable of DFSClient.Conf.
[jira] [Commented] (HDFS-2109) Store uMask as member variable to DFSClient.Conf
[ https://issues.apache.org/jira/browse/HDFS-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058115#comment-13058115 ]

Bharath Mundlapudi commented on HDFS-2109:

The failed tests are not related to this patch.

Store uMask as member variable to DFSClient.Conf
Key: HDFS-2109
URL: https://issues.apache.org/jira/browse/HDFS-2109
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0
Attachments: HDFS-2109-1.patch, HDFS-2109-2.patch

As a part of removing the reference to conf in DFSClient, I am proposing replacing FsPermission.getUMask(conf) everywhere in the DFSClient class with dfsClientConf.uMask, by storing uMask as a member variable of DFSClient.Conf.
[jira] [Created] (HDFS-2109) Store uMask as member variable to DFSClient.Conf
Store uMask as member variable to DFSClient.Conf
Key: HDFS-2109
URL: https://issues.apache.org/jira/browse/HDFS-2109
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0

As a part of removing the reference to conf in DFSClient, I am proposing replacing FsPermission.getUMask(conf) everywhere in the DFSClient class with dfsClientConf.uMask.
[jira] [Updated] (HDFS-2109) Store uMask as member variable to DFSClient.Conf
[ https://issues.apache.org/jira/browse/HDFS-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Mundlapudi updated HDFS-2109:

Description:
As a part of removing the reference to conf in DFSClient, I am proposing replacing FsPermission.getUMask(conf) everywhere in the DFSClient class with dfsClientConf.uMask, by storing uMask as a member variable of DFSClient.Conf.

was:
As a part of removing the reference to conf in DFSClient, I am proposing replacing FsPermission.getUMask(conf) everywhere in the DFSClient class with dfsClientConf.uMask.

Store uMask as member variable to DFSClient.Conf
Key: HDFS-2109
URL: https://issues.apache.org/jira/browse/HDFS-2109
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0

As a part of removing the reference to conf in DFSClient, I am proposing replacing FsPermission.getUMask(conf) everywhere in the DFSClient class with dfsClientConf.uMask, by storing uMask as a member variable of DFSClient.Conf.
[jira] [Updated] (HDFS-2109) Store uMask as member variable to DFSClient.Conf
[ https://issues.apache.org/jira/browse/HDFS-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Mundlapudi updated HDFS-2109:

Attachment: HDFS-2109-1.patch

Attaching the patch.

Store uMask as member variable to DFSClient.Conf
Key: HDFS-2109
URL: https://issues.apache.org/jira/browse/HDFS-2109
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0
Attachments: HDFS-2109-1.patch

As a part of removing the reference to conf in DFSClient, I am proposing replacing FsPermission.getUMask(conf) everywhere in the DFSClient class with dfsClientConf.uMask, by storing uMask as a member variable of DFSClient.Conf.
[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054264#comment-13054264 ]

Bharath Mundlapudi commented on HDFS-2092:

We are not concerned about the task attempt. The problem here is the TaskTracker's availability. The way conf was designed has its own benefits; at the same time it comes with some disadvantages. What if a task attempt runs for a day or more? This is not uncommon in our clusters. Again, I am listing a couple of issues:

1. With UGI, a conf will be created per user in the TT. (Security folks?)
2. PIG or any other job can store arbitrary data. The Hadoop framework should be able to deal with it as far as it can.
3. Last but not least, an API should not hold on to the client's data.

As every job is different, workloads can differ too, so one can't see all the problems up front.

Create a light inner conf class in DFSClient
Key: HDFS-2092
URL: https://issues.apache.org/jira/browse/HDFS-2092
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0
Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch

At present, DFSClient stores a reference to the configuration object. Since these configuration objects are pretty big, at times they can bloat processes which hold multiple DFSClient objects, like the TaskTracker. This is an attempt to remove the reference to the conf object in DFSClient. This patch creates a light inner conf class and copies the required keys from the Configuration object.
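The "light inner conf class" pattern can be sketched as follows; Configuration here is a tiny stand-in for org.apache.hadoop.conf.Configuration, and the key name is illustrative, not the real HDFS key.

```java
import java.util.HashMap;

public class DfsClientConfSketch {
    // Minimal stand-in for Hadoop's Configuration (a large key/value object).
    static class Configuration extends HashMap<String, String> {
        int getInt(String key, int dflt) {
            String v = get(key);
            return v == null ? dflt : Integer.parseInt(v);
        }
    }

    // In the spirit of DFSClient.Conf: copy only the keys the client needs
    // into a small immutable object, then drop the reference to the big
    // Configuration so it becomes eligible for garbage collection.
    static class Conf {
        final int uMask;

        Conf(Configuration conf) {
            // Hypothetical key name; read once at construction, no conf ref kept.
            uMask = conf.getInt("fs.permissions.umask", 0022);
        }
    }

    public static void main(String[] args) {
        Configuration big = new Configuration();
        big.put("fs.permissions.umask", "18"); // 18 decimal == 022 octal
        Conf light = new Conf(big);            // copies the value only
        System.out.println(light.uMask);
    }
}
```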
[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054632#comment-13054632 ]

Bharath Mundlapudi commented on HDFS-2092:

Todd, thanks for the reasons. When we say a client, it can be anything, like a TT/JT which has TIP/JIP. You are right, a client TIP/JIP can have references to the JobConf, but then the reference scope is decided by the client. And yes, eventually we need to fix the FS cache you are referring to as well, if there are any leaks.

Create a light inner conf class in DFSClient
Key: HDFS-2092
URL: https://issues.apache.org/jira/browse/HDFS-2092
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0
Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch

At present, DFSClient stores a reference to the configuration object. Since these configuration objects are pretty big, at times they can bloat processes which hold multiple DFSClient objects, like the TaskTracker. This is an attempt to remove the reference to the conf object in DFSClient. This patch creates a light inner conf class and copies the required keys from the Configuration object.
[jira] [Resolved] (HDFS-2103) Read lock must be released before acquiring a write lock
[ https://issues.apache.org/jira/browse/HDFS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Mundlapudi resolved HDFS-2103.

Resolution: Not A Problem

Didn't notice the finally block, where the read lock is released. I am closing this Jira.

Read lock must be released before acquiring a write lock
Key: HDFS-2103
URL: https://issues.apache.org/jira/browse/HDFS-2103
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0

In the FSNamesystem.getBlockLocationsUpdateTimes function, we have the following code:

{code}
for (int attempt = 0; attempt < 2; attempt++) {
  if (attempt == 0) { // first attempt is with readlock
    readLock();
  } else { // second attempt is with write lock
    writeLock(); // writelock is needed to set accesstime
  }
  ...
  if (attempt == 0) {
    continue;
  }
{code}

In the above code, the readLock is acquired in attempt 0, and if execution enters the continue block, then it appears to acquire the writeLock before releasing the readLock.
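The reason this is Not A Problem is that a continue statement still runs the enclosing try's finally block, so the read lock is released before the second iteration acquires the write lock. A runnable sketch of that control flow (simplified from the FSNamesystem code, with a plain ReentrantReadWriteLock standing in for the namesystem lock):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeSketch {
    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Each iteration takes either the read or the write lock; the finally
    // block releases whichever is held, even when the body hits `continue`.
    static int attemptsRun() {
        int attempts = 0;
        for (int attempt = 0; attempt < 2; attempt++) {
            if (attempt == 0) {
                lock.readLock().lock();   // first attempt: read lock
            } else {
                lock.writeLock().lock();  // second attempt: write lock
            }
            try {
                attempts++;
                if (attempt == 0) {
                    continue; // finally still runs, releasing the read lock
                }
            } finally {
                if (attempt == 0) {
                    lock.readLock().unlock();
                } else {
                    lock.writeLock().unlock();
                }
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Both iterations complete; without the finally-based release, the
        // write-lock acquisition would deadlock against the held read lock.
        System.out.println(attemptsRun());
    }
}
```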
[jira] [Commented] (HDFS-2092) Remove configuration object reference in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054134#comment-13054134 ]

Bharath Mundlapudi commented on HDFS-2092:

Also, existing unit tests should cover this path, so I haven't added new unit tests.

Remove configuration object reference in DFSClient
Key: HDFS-2092
URL: https://issues.apache.org/jira/browse/HDFS-2092
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0
Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch

At present, DFSClient stores a reference to the configuration object. Since these configuration objects are pretty big, at times they can bloat processes which hold multiple DFSClient objects, like the TaskTracker. This is an attempt to remove the reference to the conf object in DFSClient.
[jira] [Commented] (HDFS-2092) Remove configuration object reference in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054176#comment-13054176 ]

Bharath Mundlapudi commented on HDFS-2092:

Hi Eli, "Does this change mean that a Configuration object can now be freed because there's one fewer ref to it?" Yes, that is the direction of this patch. Eventually, we will be passing around only the DFSClient#conf, or only the required parameters, to the downstream code. This will be a big change and needs broader discussion. But you are right, the idea is to stop holding references to the conf object coming from the users. We want to let client code decide the scope of the conf object. Regarding memory, a few [key, value] pairs will be copied into DFSClient, and the bloated conf object then becomes free for GC. That will be a big win on memory.

Remove configuration object reference in DFSClient
Key: HDFS-2092
URL: https://issues.apache.org/jira/browse/HDFS-2092
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0
Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch

At present, DFSClient stores a reference to the configuration object. Since these configuration objects are pretty big, at times they can bloat processes which hold multiple DFSClient objects, like the TaskTracker. This is an attempt to remove the reference to the conf object in DFSClient.
[jira] [Created] (HDFS-2105) Remove the references to configuration object from the DFSClient library.
Remove the references to configuration object from the DFSClient library.
Key: HDFS-2105
URL: https://issues.apache.org/jira/browse/HDFS-2105
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0

This is an umbrella jira to track removing the references to the conf object in the DFSClient library.
[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054245#comment-13054245 ]

Bharath Mundlapudi commented on HDFS-2092:

Hi Aaron, that was just a sample measurement for one day. We should care about the MAX here in this case. Also, going forward, PIG 0.9 will store lots of metadata in the conf, and one can even embed the PIG script itself in the conf. This can potentially blow up the TT. We can measure an approximate size of the conf from the job.xml file in the job history location. Since one can store anything in the job conf, we should be careful with references to this object; we should not hold them for a long duration.

Create a light inner conf class in DFSClient
Key: HDFS-2092
URL: https://issues.apache.org/jira/browse/HDFS-2092
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0
Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch

At present, DFSClient stores a reference to the configuration object. Since these configuration objects are pretty big, at times they can bloat processes which hold multiple DFSClient objects, like the TaskTracker. This is an attempt to remove the reference to the conf object in DFSClient.
[jira] [Updated] (HDFS-2092) Remove configuration object reference in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-2092: - Attachment: HDFS-2092-1.patch Attaching a patch for this. Remove configuration object reference in DFSClient -- Key: HDFS-2092 URL: https://issues.apache.org/jira/browse/HDFS-2092 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 Attachments: HDFS-2092-1.patch At present, DFSClient stores a reference to the Configuration object. Since these configuration objects can be quite big, they can bloat processes that hold multiple DFSClient objects, such as the TaskTracker. This is an attempt to remove the reference to the conf object in DFSClient.
[jira] [Created] (HDFS-2092) Remove configuration object reference in DFSClient
Remove configuration object reference in DFSClient -- Key: HDFS-2092 URL: https://issues.apache.org/jira/browse/HDFS-2092 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 At present, DFSClient stores a reference to the Configuration object. Since these configuration objects can be quite big, they can bloat processes that hold multiple DFSClient objects, such as the TaskTracker. This is an attempt to remove the reference to the conf object in DFSClient.
[jira] [Created] (HDFS-2094) Add metrics for write pipeline failures
Add metrics for write pipeline failures --- Key: HDFS-2094 URL: https://issues.apache.org/jira/browse/HDFS-2094 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 The write pipeline can fail for various reasons, such as RPC connection issues or disk problems. I am proposing to add metrics to detect write pipeline issues.
[jira] [Commented] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.
[ https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050792#comment-13050792 ] Bharath Mundlapudi commented on HDFS-1692: -- Existing tests like TestDataNodeExit should check for this condition, so I have not added a new test for this. In secure mode, Datanode process doesn't exit when disks fail. -- Key: HDFS-1692 URL: https://issues.apache.org/jira/browse/HDFS-1692 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.204.0, 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1692-1.patch, HDFS-1692-v0.23-1.patch, HDFS-1692-v0.23-2.patch In secure mode, when more disks fail than the number of volumes tolerated, the datanode process doesn't exit properly; it just hangs even though the shutdown method is called.
[jira] [Updated] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.
[ https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1692: - Attachment: HDFS-1692-v0.23-2.patch I have cleaned up a little, e.g. the logging-related code and a few comments. Uploading the patch again. In secure mode, Datanode process doesn't exit when disks fail. -- Key: HDFS-1692 URL: https://issues.apache.org/jira/browse/HDFS-1692 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.204.0, 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1692-1.patch, HDFS-1692-v0.23-1.patch, HDFS-1692-v0.23-2.patch In secure mode, when more disks fail than the number of volumes tolerated, the datanode process doesn't exit properly; it just hangs even though the shutdown method is called.
[jira] [Updated] (HDFS-1942) If all Block Pool service threads exit then datanode should exit.
[ https://issues.apache.org/jira/browse/HDFS-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1942: - Attachment: HDFS-1942-2.patch Attaching a patch with some test cleanup. Reduced the test time. If all Block Pool service threads exit then datanode should exit. - Key: HDFS-1942 URL: https://issues.apache.org/jira/browse/HDFS-1942 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Attachments: HDFS-1942-1.patch, HDFS-1942-2.patch Currently, if all block pool service threads exit, the Datanode continues to run. This should be fixed.
[jira] [Updated] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.
[ https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1692: - Affects Version/s: 0.23.0 Fix Version/s: 0.23.0 In secure mode, Datanode process doesn't exit when disks fail. -- Key: HDFS-1692 URL: https://issues.apache.org/jira/browse/HDFS-1692 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.204.0, 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1692-1.patch In secure mode, when more disks fail than the number of volumes tolerated, the datanode process doesn't exit properly; it just hangs even though the shutdown method is called.
[jira] [Updated] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.
[ https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1692: - Attachment: HDFS-1692-v0.23-1.patch Attaching a patch for version 0.23. In secure mode, Datanode process doesn't exit when disks fail. -- Key: HDFS-1692 URL: https://issues.apache.org/jira/browse/HDFS-1692 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.204.0, 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1692-1.patch, HDFS-1692-v0.23-1.patch In secure mode, when more disks fail than the number of volumes tolerated, the datanode process doesn't exit properly; it just hangs even though the shutdown method is called.
[jira] [Assigned] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi reassigned HDFS-1977: Assignee: Bharath Mundlapudi Stop using StringUtils.stringifyException() --- Key: HDFS-1977 URL: https://issues.apache.org/jira/browse/HDFS-1977 Project: Hadoop HDFS Issue Type: Improvement Reporter: Joey Echeverria Assignee: Bharath Mundlapudi Priority: Minor The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control, since you can configure how the stack traces are written to the logs.
[jira] [Resolved] (HDFS-2072) Remove StringUtils.stringifyException(ie) in logger functions
[ https://issues.apache.org/jira/browse/HDFS-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi resolved HDFS-2072. -- Resolution: Duplicate Remove StringUtils.stringifyException(ie) in logger functions - Key: HDFS-2072 URL: https://issues.apache.org/jira/browse/HDFS-2072 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 The Apache logging API has an overloaded method that takes both a message and an exception. I am proposing to clean up the logging code with this API, i.e., change the code from LOG.warn(msg, StringUtils.stringifyException(exception)); to LOG.warn(msg, exception);
[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049438#comment-13049438 ] Bharath Mundlapudi commented on HDFS-1977: -- The newer logging API supports exceptions, i.e., change the code from LOG.warn(msg, StringUtils.stringifyException(exception)); to LOG.warn(msg, exception); Stop using StringUtils.stringifyException() --- Key: HDFS-1977 URL: https://issues.apache.org/jira/browse/HDFS-1977 Project: Hadoop HDFS Issue Type: Improvement Reporter: Joey Echeverria Assignee: Bharath Mundlapudi Priority: Minor The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control, since you can configure how the stack traces are written to the logs.
[jira] [Updated] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1977: - Attachment: HDFS-1977-1.patch Attaching a patch. Stop using StringUtils.stringifyException() --- Key: HDFS-1977 URL: https://issues.apache.org/jira/browse/HDFS-1977 Project: Hadoop HDFS Issue Type: Improvement Reporter: Joey Echeverria Assignee: Bharath Mundlapudi Priority: Minor Attachments: HDFS-1977-1.patch The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control, since you can configure how the stack traces are written to the logs.
[jira] [Updated] (HDFS-1942) If all Block Pool service threads exit then datanode should exit.
[ https://issues.apache.org/jira/browse/HDFS-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1942: - Attachment: HDFS-1942-3.patch Attaching a patch. Cleaned up some more code. If all Block Pool service threads exit then datanode should exit. - Key: HDFS-1942 URL: https://issues.apache.org/jira/browse/HDFS-1942 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Attachments: HDFS-1942-1.patch, HDFS-1942-2.patch, HDFS-1942-3.patch Currently, if all block pool service threads exit, the Datanode continues to run. This should be fixed.
[jira] [Created] (HDFS-2072) Remove StringUtils.stringifyException(ie) in logger functions
Remove StringUtils.stringifyException(ie) in logger functions - Key: HDFS-2072 URL: https://issues.apache.org/jira/browse/HDFS-2072 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 The Apache logging API has an overloaded method that takes both a message and an exception. I am proposing to clean up the logging code with this API, i.e., change the code from LOG.warn(msg, StringUtils.stringifyException(exception)); to LOG.warn(msg, exception);
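The proposed cleanup is mechanical: hand the throwable to the logger instead of flattening its stack trace into the message string. A minimal sketch, using java.util.logging as a stand-in for the commons-logging Log used in Hadoop, with a rough equivalent of StringUtils.stringifyException:

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingCleanupSketch {
    private static final Logger LOG = Logger.getLogger("demo");

    // Rough equivalent of StringUtils.stringifyException(t).
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    public static void main(String[] args) {
        Exception e = new IllegalStateException("disk failed");
        // Old style: stringify the stack trace into the message by hand.
        LOG.warning("op failed: " + stackTraceOf(e));
        // New style: pass the throwable and let the configured formatter
        // decide how the stack trace is rendered.
        LOG.log(Level.WARNING, "op failed", e);
    }
}
```

The second form is what the JIRA advocates: it defers stack-trace formatting to the logging configuration rather than baking it into every call site.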
[jira] [Updated] (HDFS-1942) If all Block Pool service threads exit then datanode should exit.
[ https://issues.apache.org/jira/browse/HDFS-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1942: - Attachment: HDFS-1942-1.patch Attaching the patch. If all Block Pool service threads exit then datanode should exit. - Key: HDFS-1942 URL: https://issues.apache.org/jira/browse/HDFS-1942 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Attachments: HDFS-1942-1.patch Currently, if all block pool service threads exit, the Datanode continues to run. This should be fixed.
[jira] [Created] (HDFS-2065) Fix NPE in DFSClient.getFileChecksum
Fix NPE in DFSClient.getFileChecksum Key: HDFS-2065 URL: https://issues.apache.org/jira/browse/HDFS-2065 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 The following code can throw an NPE if callGetBlockLocations returns null: {code} List<LocatedBlock> locatedblocks = callGetBlockLocations(namenode, src, 0, Long.MAX_VALUE).getLocatedBlocks(); {code} The right fix for this is that the server should throw the right exception.
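Until the server-side fix lands, a defensive check on the client avoids the NPE. The sketch below uses stubbed stand-ins (LocatedBlocks and callGetBlockLocations here are simplified, not the real HDFS classes) and is not the committed patch:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;

public class ChecksumNpeSketch {
    // Simplified stand-in for the server response; null models the bug.
    static class LocatedBlocks {
        List<String> getLocatedBlocks() { return List.of(); }
    }

    static LocatedBlocks callGetBlockLocations(String src) {
        return null; // simulate the server returning null for 'src'
    }

    static List<String> getBlocksOrThrow(String src) throws IOException {
        LocatedBlocks blocks = callGetBlockLocations(src);
        if (blocks == null) {
            // Fail with a meaningful exception instead of letting the
            // chained .getLocatedBlocks() call throw an NPE.
            throw new FileNotFoundException("File does not exist: " + src);
        }
        return blocks.getLocatedBlocks();
    }

    public static void main(String[] args) {
        try {
            getBlocksOrThrow("/user/foo/bar");
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

As the JIRA notes, this guard is a stopgap; the real fix is for the server to raise the appropriate exception itself.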
[jira] [Commented] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046800#comment-13046800 ] Bharath Mundlapudi commented on HDFS-2030: -- Thanks for the review, Suresh. My comments inline. 1.1 Missing banner - done. 1.2 This method is package protected; the unit test just tests this function instead of using the time-consuming MiniDFSCluster. 1.3 Removed the null and empty checks. 1.4 BlockPoolID is autogenerated. I have now modified the tests to not mock this. 1.5 Added assertEquals where necessary. 1.6 Split into multiple tests. 2.1 Since setBlockPoolID() and setClusterID() are in NNStorage, moving this function to that class solves the problem. 2.2 Renamed the function. 2.3 Comments moved outside the function, and the if condition moved inside the method. Attaching the patch with these changes. Fix the usability of namenode upgrade command - Key: HDFS-2030 URL: https://issues.apache.org/jira/browse/HDFS-2030 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 Attachments: HDFS-2030-1.patch Fixing the Namenode upgrade option along the same lines as the Namenode format option. If a clusterid is not given, then a clusterid will be automatically generated for the upgrade, but if a clusterid is given, it will be honored.
[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-2030: - Attachment: HDFS-2030-2.patch Attached the patch. Fix the usability of namenode upgrade command - Key: HDFS-2030 URL: https://issues.apache.org/jira/browse/HDFS-2030 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch Fixing the Namenode upgrade option along the same lines as the Namenode format option. If a clusterid is not given, then a clusterid will be automatically generated for the upgrade, but if a clusterid is given, it will be honored.
[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-2030: - Attachment: HDFS-2030-3.patch Did some more minor cleanup related to comments and added more description to the test class. Please find the attached patch. Fix the usability of namenode upgrade command - Key: HDFS-2030 URL: https://issues.apache.org/jira/browse/HDFS-2030 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch Fixing the Namenode upgrade option along the same lines as the Namenode format option. If a clusterid is not given, then a clusterid will be automatically generated for the upgrade, but if a clusterid is given, it will be honored.
[jira] [Created] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time
Wait time to terminate the threads causing unit tests to take longer time - Key: HDFS-2057 URL: https://issues.apache.org/jira/browse/HDFS-2057 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.204.0, 0.20.205.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.205.0 As part of the datanode process hang fix, this code was introduced in 0.20.204 to clean up all the waiting threads: - try { - readPool.awaitTermination(10, TimeUnit.SECONDS); - } catch (InterruptedException e) { - LOG.info("Exception occured in doStop: " + e.getMessage()); - } - readPool.shutdownNow(); This was clearly meant for production, but all the unit tests use MiniDFSCluster and MiniMRCluster, whose shutdown waits on this part of the code. Due to this, we saw an increase in unit test run times, so this code is being removed.
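For context, the code being removed is the standard bounded graceful shutdown of a java.util.concurrent pool. A generic sketch of the wait-then-force idiom (not the DataNode code) whose fixed 10-second wait slowed MiniDFSCluster shutdown:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolShutdownSketch {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService readPool = Executors.newFixedThreadPool(4);
        readPool.submit(() -> { /* simulated reader task */ });

        readPool.shutdown(); // stop accepting new tasks
        // Graceful path: wait up to 10s for in-flight tasks to finish.
        // This is the wait every MiniDFSCluster shutdown paid in tests.
        if (!readPool.awaitTermination(10, TimeUnit.SECONDS)) {
            readPool.shutdownNow(); // forced path: interrupt stragglers
        }
        System.out.println("terminated: " + readPool.isTerminated());
    }
}
```

Note that awaitTermination returns as soon as the pool drains, so the 10 seconds is a worst-case bound; the test slowdown the JIRA describes comes from tasks that linger until the timeout.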
[jira] [Updated] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time
[ https://issues.apache.org/jira/browse/HDFS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-2057: - Attachment: HDFS-2057-1.patch Attaching the patch. Wait time to terminate the threads causing unit tests to take longer time - Key: HDFS-2057 URL: https://issues.apache.org/jira/browse/HDFS-2057 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.204.0, 0.20.205.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.205.0 Attachments: HDFS-2057-1.patch As part of the datanode process hang fix, this code was introduced in 0.20.204 to clean up all the waiting threads: - try { - readPool.awaitTermination(10, TimeUnit.SECONDS); - } catch (InterruptedException e) { - LOG.info("Exception occured in doStop: " + e.getMessage()); - } - readPool.shutdownNow(); This was clearly meant for production, but all the unit tests use MiniDFSCluster and MiniMRCluster, whose shutdown waits on this part of the code. Due to this, we saw an increase in unit test run times, so this code is being removed.
[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-2030: - Attachment: HDFS-2030-1.patch Attaching the patch. Fix the usability of namenode upgrade command - Key: HDFS-2030 URL: https://issues.apache.org/jira/browse/HDFS-2030 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 Attachments: HDFS-2030-1.patch Fixing the Namenode upgrade option along the same lines as the Namenode format option. If a clusterid is not given, then a clusterid will be automatically generated for the upgrade, but if a clusterid is given, it will be honored.
[jira] [Commented] (HDFS-2023) Backport of NPE for File.list and File.listFiles
[ https://issues.apache.org/jira/browse/HDFS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044243#comment-13044243 ] Bharath Mundlapudi commented on HDFS-2023: -- I have run local test-patch on this patch and here are the results for this branch. -1s are not related to this patch. [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 9 new or modified tests. [exec] [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories. Javadoc 6 warnings and Eclipse classpath are not related to this patch. [javadoc] /export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/SecurityUtil.java:36: warning: sun.security.jgss.krb5.Krb5Util is Sun proprietary API and may be removed in a future release [javadoc] import sun.security.jgss.krb5.Krb5Util; [javadoc] ^ [javadoc] /export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/SecurityUtil.java:37: warning: sun.security.krb5.Credentials is Sun proprietary API and may be removed in a future release [javadoc] import sun.security.krb5.Credentials; [javadoc] ^ [javadoc] /export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/SecurityUtil.java:38: warning: sun.security.krb5.PrincipalName is Sun proprietary API and may be removed in a future release [javadoc] import sun.security.krb5.PrincipalName; [javadoc] ^ [javadoc] /export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/KerberosName.java:29: warning: sun.security.krb5.Config is Sun 
proprietary API and may be removed in a future release [javadoc] import sun.security.krb5.Config; [javadoc] ^ [javadoc] /export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/KerberosName.java:30: warning: sun.security.krb5.KrbException is Sun proprietary API and may be removed in a future release [javadoc] import sun.security.krb5.KrbException; [javadoc] ^ [javadoc] /export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/KerberosName.java:76: warning: sun.security.krb5.Config is Sun proprietary API and may be removed in a future release [javadoc] private static Config kerbConf; [javadoc] ^ [javadoc] Standard Doclet version 1.6.0_17 [javadoc] Building tree for all the packages and classes... [javadoc] Building index for all the packages and classes... [javadoc] Building index for all classes... [javadoc] Generating /export/space/branch-0.20-security.qa/hadoop-common/build/docs/api/stylesheet.css... [javadoc] 6 warnings --- Backport of NPE for File.list and File.listFiles Key: HDFS-2023 URL: https://issues.apache.org/jira/browse/HDFS-2023 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.205.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.205.0 Attachments: HDFS-2023-1.patch Since we have separate JIRAs in trunk for common and hdfs, I am creating another JIRA for this issue. This patch addresses the following: 1. Provides FileUtil APIs for list and listFiles which throw IOException for null cases. 2. Replaces most of the code using the JDK File API with the FileUtil API.
[jira] [Created] (HDFS-2030) Fix the usability of namenode upgrade command
Fix the usability of namenode upgrade command - Key: HDFS-2030 URL: https://issues.apache.org/jira/browse/HDFS-2030 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 Fixing the Namenode upgrade option along the same lines as the Namenode format option. If a clusterid is not given, then a clusterid will be automatically generated for the upgrade, but if a clusterid is given, it will be honored.
[jira] [Commented] (HDFS-2014) RPM packages broke bin/hdfs script
[ https://issues.apache.org/jira/browse/HDFS-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043079#comment-13043079 ] Bharath Mundlapudi commented on HDFS-2014: -- I have tested this patch on a few cases like hdfs format, upgrade, etc. This patch works. Without this patch, users will run into issues on trunk. Can someone commit this patch if you don't have any comments? RPM packages broke bin/hdfs script -- Key: HDFS-2014 URL: https://issues.apache.org/jira/browse/HDFS-2014 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Eric Yang Priority: Critical Fix For: 0.23.0 Attachments: HDFS-2014-1.patch, HDFS-2014.patch bin/hdfs now appears to depend on ../libexec, which doesn't exist inside of a source checkout: todd@todd-w510:~/git/hadoop-hdfs$ ./bin/hdfs namenode ./bin/hdfs: line 22: /home/todd/git/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory ./bin/hdfs: line 138: cygpath: command not found ./bin/hdfs: line 161: exec: : not found
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042432#comment-13042432 ] Bharath Mundlapudi commented on HDFS-988: - I am just wondering if we are calling OS sync at all on this code path. All I see is a flush call, which flushes from EditLogOutputStream (Java buffers) to kernel buffers. Shouldn't we be doing the following? eStream.flush(); eStream.getFileOutputStream().getFD().sync(); This will make sure the edits are actually written to disk. Is there any reason for not doing this? saveNamespace can corrupt edits log, apparently due to race conditions -- Key: HDFS-988 URL: https://issues.apache.org/jira/browse/HDFS-988 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20-append, 0.21.0, 0.22.0 Reporter: dhruba borthakur Assignee: Eli Collins Priority: Blocker Fix For: 0.20-append, 0.22.0 Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch The administrator puts the namenode in safemode and then issues the savenamespace command. This can corrupt the edits log. The problem is that when the NN enters safemode, there could still be pending logSyncs occurring from other threads. Now the saveNamespace command, when executed, would save an edits log with partial writes. I have seen this happen on 0.20. https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853
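The distinction the comment draws - flush() only moves bytes from user-space buffers to the kernel, while FileDescriptor.sync() (an fsync) forces them to the device - looks like this in plain Java. This is a generic sketch, not the FSEditLog code:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class FlushVsSyncSketch {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("edits", ".log");
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write("OP_ADD ...".getBytes());
            out.flush();        // user-space buffers -> kernel page cache
            out.getFD().sync(); // kernel page cache -> disk (fsync)
        }
        System.out.println("durable bytes: " + f.length());
        f.delete();
    }
}
```

Without the sync() call, a machine crash after flush() can still lose the write, which is exactly the durability gap the comment is asking about.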
[jira] [Created] (HDFS-2023) Backport of NPE for File.list and File.listFiles
Backport of NPE for File.list and File.listFiles Key: HDFS-2023 URL: https://issues.apache.org/jira/browse/HDFS-2023 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.205.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.205.0 Since we have separate JIRAs in trunk for common and hdfs, I am creating another JIRA for this issue. This patch addresses the following: 1. Provides FileUtil APIs for list and listFiles which throw IOException for null cases. 2. Replaces most of the code using the JDK File API with the FileUtil API.
[jira] [Updated] (HDFS-2023) Backport of NPE for File.list and File.listFiles
[ https://issues.apache.org/jira/browse/HDFS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-2023: - Attachment: HDFS-2023-1.patch Attaching a patch for this issue. Backport of NPE for File.list and File.listFiles Key: HDFS-2023 URL: https://issues.apache.org/jira/browse/HDFS-2023 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.205.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.205.0 Attachments: HDFS-2023-1.patch Since we have separate JIRAs in trunk for common and hdfs, I am creating another JIRA for this issue. This patch addresses the following: 1. Provides FileUtil APIs for list and listFiles which throw IOException for null cases. 2. Replaces most of the code using the JDK File API with the FileUtil API.
[jira] [Commented] (HDFS-2023) Backport of NPE for File.list and File.listFiles
[ https://issues.apache.org/jira/browse/HDFS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042519#comment-13042519 ] Bharath Mundlapudi commented on HDFS-2023: -- Hi Eli, I wanted to have this change in the same JIRA as 0.23, but those were reviewed and committed, so I created this one. Also, I could have added multiple patches to those same JIRAs, but that would not be good for reviewers. On the positive side, we can have this single JIRA for all 0.20.*. But I agree with you on having the same JIRA for backporting. Backport of NPE for File.list and File.listFiles Key: HDFS-2023 URL: https://issues.apache.org/jira/browse/HDFS-2023 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.205.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.205.0 Attachments: HDFS-2023-1.patch Since we have separate JIRAs in trunk for common and hdfs, I am creating another JIRA for this issue. This patch addresses the following: 1. Provides FileUtil APIs for list and listFiles which throw IOException for null cases. 2. Replaces most of the code using the JDK File API with the FileUtil API.
[jira] [Created] (HDFS-2019) Fix all the places where Java method File.list is used with FileUtil.list API
Fix all the places where Java method File.list is used with FileUtil.list API - Key: HDFS-2019 URL: https://issues.apache.org/jira/browse/HDFS-2019 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 This new method, FileUtil.list, throws an exception when the disk is bad rather than returning null. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
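[Editor's note] The guarded-listing idea behind FileUtil.list/listFiles can be sketched as below. The method names mirror the API these issues describe, but the bodies and exact signatures are an assumption for illustration, not a copy of the committed patches:

```java
import java.io.File;
import java.io.IOException;

public final class FileUtilSketch {
    private FileUtilSketch() {}

    // File.listFiles() returns null when the path is not a directory or an
    // I/O error occurs; convert that into an IOException the caller must handle.
    public static File[] listFiles(File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
        }
        return files;
    }

    // Same guard for File.list(), which has the identical null contract.
    public static String[] list(File dir) throws IOException {
        String[] names = dir.list();
        if (names == null) {
            throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
        }
        return names;
    }
}
```

Callers can then iterate the returned array without a null check, because a bad disk surfaces as an IOException at the call site instead of an NPE later.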
[jira] [Updated] (HDFS-2019) Fix all the places where Java method File.list is used with FileUtil.list API
[ https://issues.apache.org/jira/browse/HDFS-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-2019: - Attachment: HDFS-2019-1.patch Attaching a patch. Fix all the places where Java method File.list is used with FileUtil.list API - Key: HDFS-2019 URL: https://issues.apache.org/jira/browse/HDFS-2019 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 Attachments: HDFS-2019-1.patch This new method, FileUtil.list, throws an exception when the disk is bad rather than returning null. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null
[ https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1934: - Attachment: HDFS-1934-5.patch Reattaching the patch with a minor logging correction. Fix NullPointerException when certain File APIs return null --- Key: HDFS-1934 URL: https://issues.apache.org/jira/browse/HDFS-1934 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch, HDFS-1934-3.patch, HDFS-1934-4.patch, HDFS-1934-5.patch While testing Disk Fail Inplace, we encountered an NPE from this part of the code: File[] files = dir.listFiles(); for (File f : files) { ... } This is essentially an API issue. When a disk is bad (or the name is not a directory), these APIs (listFiles, list) return null rather than throwing an exception, so the 'for loop' throws an NPE. The same applies to the dir.list() API. Fix all the places where the null condition was not checked. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
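[Editor's note] The failure mode in the description is easy to reproduce: File.listFiles() returns null instead of throwing when the path is not a readable directory, so the enhanced for loop dereferences null. A minimal illustration follows; the null-check shown is the generic defensive pattern, not the exact code from the attached patches:

```java
import java.io.File;

public class ListFilesNpeDemo {
    public static void main(String[] args) {
        File dir = new File("/path/that/does/not/exist");

        // On a bad disk or a non-directory path, listFiles() yields null...
        File[] files = dir.listFiles();

        // ...so iterating it directly would throw NullPointerException:
        // for (File f : files) { ... }

        // The defensive form checks for null before iterating:
        if (files == null) {
            System.out.println("listFiles returned null for " + dir);
        } else {
            for (File f : files) {
                System.out.println(f);
            }
        }
    }
}
```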
[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null
[ https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1934: - Status: Patch Available (was: Open) Fix NullPointerException when certain File APIs return null --- Key: HDFS-1934 URL: https://issues.apache.org/jira/browse/HDFS-1934 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch, HDFS-1934-3.patch While testing Disk Fail Inplace, we encountered an NPE from this part of the code: File[] files = dir.listFiles(); for (File f : files) { ... } This is essentially an API issue. When a disk is bad (or the name is not a directory), these APIs (listFiles, list) return null rather than throwing an exception, so the 'for loop' throws an NPE. The same applies to the dir.list() API. Fix all the places where the null condition was not checked. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1934) Fix NullPointerException when certain File APIs return null
[ https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040121#comment-13040121 ] Bharath Mundlapudi commented on HDFS-1934: -- Right, this patch is trying to address exactly what you have mentioned. Fix NullPointerException when certain File APIs return null --- Key: HDFS-1934 URL: https://issues.apache.org/jira/browse/HDFS-1934 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch, HDFS-1934-3.patch While testing Disk Fail Inplace, we encountered an NPE from this part of the code: File[] files = dir.listFiles(); for (File f : files) { ... } This is essentially an API issue. When a disk is bad (or the name is not a directory), these APIs (listFiles, list) return null rather than throwing an exception, so the 'for loop' throws an NPE. The same applies to the dir.list() API. Fix all the places where the null condition was not checked. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1963) HDFS rpm integration project
[ https://issues.apache.org/jira/browse/HDFS-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040371#comment-13040371 ] Bharath Mundlapudi commented on HDFS-1963: -- This change seems to break Mac builds. If I do ant binary with this patch, I run into this issue: BUILD FAILED hdfs/build.xml:1114: /Users/bharathm/work/projects/hadoop-trunk/hdfs.patch/hdfs/build/c++/Mac_OS_X-x86_64-64/lib does not exist. HDFS rpm integration project Key: HDFS-1963 URL: https://issues.apache.org/jira/browse/HDFS-1963 Project: Hadoop HDFS Issue Type: New Feature Components: build Environment: Java 6, RHEL 5.5 Reporter: Eric Yang Assignee: Eric Yang Fix For: 0.23.0 Attachments: HDFS-1963-1.patch, HDFS-1963-2.patch, HDFS-1963-3.patch, HDFS-1963-4.patch, HDFS-1963-5.patch, HDFS-1963-6.patch, HDFS-1963.patch This jira is corresponding to HADOOP-6255 and associated directory layout change. The patch for creating HDFS rpm packaging should be posted here for patch test build to verify against hdfs svn trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null
[ https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1934: - Attachment: HDFS-1934-4.patch Thanks for reviewing, Matt. Attaching the patch with this change. Fix NullPointerException when certain File APIs return null --- Key: HDFS-1934 URL: https://issues.apache.org/jira/browse/HDFS-1934 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch, HDFS-1934-3.patch, HDFS-1934-4.patch While testing Disk Fail Inplace, we encountered an NPE from this part of the code: File[] files = dir.listFiles(); for (File f : files) { ... } This is essentially an API issue. When a disk is bad (or the name is not a directory), these APIs (listFiles, list) return null rather than throwing an exception, so the 'for loop' throws an NPE. The same applies to the dir.list() API. Fix all the places where the null condition was not checked. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1836) Thousand of CLOSE_WAIT socket
[ https://issues.apache.org/jira/browse/HDFS-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1836: - Fix Version/s: 0.20.205.0 Thousand of CLOSE_WAIT socket -- Key: HDFS-1836 URL: https://issues.apache.org/jira/browse/HDFS-1836 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.20.2 Environment: Linux 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_23 Java(TM) SE Runtime Environment (build 1.6.0_23-b05) Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode) Reporter: Dennis Cheung Assignee: Todd Lipcon Fix For: 0.20.3, 0.20.205.0 Attachments: hdfs-1836-0.20.txt, hdfs-1836-0.20.txt, patch-draft-1836.patch $ /usr/sbin/lsof -i TCP:50010 | grep -c CLOSE_WAIT 4471 Everything is fine while the cluster runs normally. However, from time to time DataStreamer Exception: java.net.SocketTimeoutException and DFSClient.processDatanodeError(2507) | Error Recovery for entries can be found in the log file, and the number of CLOSE_WAIT sockets just keeps increasing. The CLOSE_WAIT handles may remain for hours or days, eventually leading to Too many open files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1836) Thousand of CLOSE_WAIT socket
[ https://issues.apache.org/jira/browse/HDFS-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1836: - Attachment: hdfs-1836-0.20.205.txt Attaching a patch for the 0.20.205 version. I just eliminated some hunks. Thousand of CLOSE_WAIT socket -- Key: HDFS-1836 URL: https://issues.apache.org/jira/browse/HDFS-1836 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.20.2 Environment: Linux 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_23 Java(TM) SE Runtime Environment (build 1.6.0_23-b05) Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode) Reporter: Dennis Cheung Assignee: Todd Lipcon Fix For: 0.20.3, 0.20.205.0 Attachments: hdfs-1836-0.20.205.txt, hdfs-1836-0.20.txt, hdfs-1836-0.20.txt, patch-draft-1836.patch $ /usr/sbin/lsof -i TCP:50010 | grep -c CLOSE_WAIT 4471 Everything is fine while the cluster runs normally. However, from time to time DataStreamer Exception: java.net.SocketTimeoutException and DFSClient.processDatanodeError(2507) | Error Recovery for entries can be found in the log file, and the number of CLOSE_WAIT sockets just keeps increasing. The CLOSE_WAIT handles may remain for hours or days, eventually leading to Too many open files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null
[ https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1934: - Attachment: HDFS-1934-1.patch Attaching a patch which addresses this problem. Fix NullPointerException when certain File APIs return null --- Key: HDFS-1934 URL: https://issues.apache.org/jira/browse/HDFS-1934 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 Attachments: HDFS-1934-1.patch While testing Disk Fail Inplace, we encountered an NPE from this part of the code: File[] files = dir.listFiles(); for (File f : files) { ... } This is essentially an API issue. When a disk is bad (or the name is not a directory), these APIs (listFiles, list) return null rather than throwing an exception, so the 'for loop' throws an NPE. The same applies to the dir.list() API. Fix all the places where the null condition was not checked. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null
[ https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1934: - Attachment: HDFS-1934-2.patch Adding this check at another location; attaching the updated patch. Fix NullPointerException when certain File APIs return null --- Key: HDFS-1934 URL: https://issues.apache.org/jira/browse/HDFS-1934 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch While testing Disk Fail Inplace, we encountered an NPE from this part of the code: File[] files = dir.listFiles(); for (File f : files) { ... } This is essentially an API issue. When a disk is bad (or the name is not a directory), these APIs (listFiles, list) return null rather than throwing an exception, so the 'for loop' throws an NPE. The same applies to the dir.list() API. Fix all the places where the null condition was not checked. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1943) fail to start datanode while start-dfs.sh is executed by root user
[ https://issues.apache.org/jira/browse/HDFS-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13039320#comment-13039320 ] Bharath Mundlapudi commented on HDFS-1943: -- +1 to this patch. fail to start datanode while start-dfs.sh is executed by root user -- Key: HDFS-1943 URL: https://issues.apache.org/jira/browse/HDFS-1943 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 0.23.0 Reporter: Wei Yongjun Priority: Blocker Fix For: 0.23.0 Attachments: HDFS-1943.patch When start-dfs.sh is run by root user, we got the following error message: # start-dfs.sh Starting namenodes on [localhost ] localhost: namenode running as process 2556. Stop it first. localhost: starting datanode, logging to /usr/hadoop/hadoop-common-0.23.0-SNAPSHOT/bin/../logs/hadoop-root-datanode-cspf01.out localhost: Unrecognized option: -jvm localhost: Could not create the Java virtual machine. The -jvm options should be passed to jsvc when we starting a secure datanode, but it still passed to java when start-dfs.sh is run by root while secure datanode is disabled. This is a bug of bin/hdfs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null
[ https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1934: - Affects Version/s: (was: 0.20.205.0) Fix Version/s: (was: 0.20.205.0) Fix NullPointerException when certain File APIs return null --- Key: HDFS-1934 URL: https://issues.apache.org/jira/browse/HDFS-1934 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 While testing Disk Fail Inplace, we encountered an NPE from this part of the code: File[] files = dir.listFiles(); for (File f : files) { ... } This is essentially an API issue. When a disk is bad (or the name is not a directory), these APIs (listFiles, list) return null rather than throwing an exception, so the 'for loop' throws an NPE. The same applies to the dir.list() API. Fix all the places where the null condition was not checked. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1592: - Attachment: HDFS-1592-5.patch Thanks for the review, Eli and Jitendra. I am attaching a patch which incorporates your comments. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-3.patch, HDFS-1592-4.patch, HDFS-1592-5.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1592: - Attachment: HDFS-1592-4.patch Attaching a patch with more unit tests. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-3.patch, HDFS-1592-4.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036716#comment-13036716 ] Bharath Mundlapudi commented on HDFS-1592: -- Eli, I have added more unit tests as mentioned above. Also, note that the case you pointed out is a rare condition. In our tests, making the file system read-only through mount, unmounting disks, or even setting permissions one level above did not hit this issue; only setting the permissions on this particular directory reproduces it. Anyway, I have fixed the case you pointed out as well. Thanks for spotting it. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-3.patch, HDFS-1592-4.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036929#comment-13036929 ] Bharath Mundlapudi commented on HDFS-1592: -- These failing tests are not related to this patch. Eli, if you don't have any comments, we will commit this patch today. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-3.patch, HDFS-1592-4.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035533#comment-13035533 ] Bharath Mundlapudi commented on HDFS-1592: -- Eli, thanks for your review and comments. Yes, I have tested against trunk. How did you test this? Did you configure volumes tolerated correctly? The expected behavior is - if volumes failed are more than volumes tolerated, BPOfferService daemon will fail to start. Also, note that, i have filed another Jira for - if all BPService exit due to some reason, Datanode should exit. This is a bug in the current code. Please see the following four tests i have performed and their outcome on trunk. Case 1: One disk failure (/grid/2) and Vol Tolerated = 0. Outcome: BP Service should exit. 11/05/18 07:48:56 WARN common.Util: Path /grid/0/testing/hadoop-logs/dfs/data should be specified as a URI in configuration files. Please update hdfs configuration. 11/05/18 07:48:56 WARN common.Util: Path /grid/1/testing/hadoop-logs/dfs/data should be specified as a URI in configuration files. Please update hdfs configuration. 11/05/18 07:48:56 WARN common.Util: Path /grid/2/testing/hadoop-logs/dfs/data should be specified as a URI in configuration files. Please update hdfs configuration. 11/05/18 07:48:56 WARN common.Util: Path /grid/3/testing/hadoop-logs/dfs/data should be specified as a URI in configuration files. Please update hdfs configuration. 11/05/18 07:48:56 WARN datanode.DataNode: Invalid directory in: dfs.datanode.data.dir: java.io.FileNotFoundException: File file:/grid/2/testing/hadoop-logs/dfs/data does not exist. 
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:424) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:315) at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:131) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:148) at org.apache.hadoop.hdfs.server.datanode.DataNode.getDataDirsFromURIs(DataNode.java:2154) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2133) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2074) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2097) at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2240) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2250) 11/05/18 07:48:56 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 11/05/18 07:48:56 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 11/05/18 07:48:56 INFO impl.MetricsSystemImpl: DataNode metrics system started 11/05/18 07:48:56 INFO impl.MetricsSystemImpl: Registered source UgiMetrics 11/05/18 07:48:56 INFO datanode.DataNode: Opened info server at 50010 11/05/18 07:48:56 INFO datanode.DataNode: Balancing bandwith is 1048576 bytes/s 11/05/18 07:48:56 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 11/05/18 07:48:56 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) 11/05/18 07:48:56 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. 
Opening the listener on 50075 11/05/18 07:48:56 INFO http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075 11/05/18 07:48:56 INFO http.HttpServer: Jetty bound to port 50075 11/05/18 07:48:56 INFO mortbay.log: jetty-6.1.14 11/05/18 07:48:56 WARN mortbay.log: Can't reuse /tmp/Jetty_0_0_0_0_50075_datanodehwtdwq, using /tmp/Jetty_0_0_0_0_50075_datanodehwtdwq_6441176730816569391 11/05/18 07:49:01 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50075 11/05/18 07:49:01 INFO ipc.Server: Starting Socket Reader #1 for port 50020 11/05/18 07:49:01 INFO ipc.Server: Starting Socket Reader #2 for port 50020 11/05/18 07:49:01 INFO ipc.Server: Starting Socket Reader #3 for port 50020 11/05/18 07:49:01 INFO ipc.Server: Starting Socket Reader #4 for port 50020 11/05/18 07:49:01 INFO ipc.Server: Starting Socket Reader #5 for port 50020 11/05/18 07:49:01 INFO impl.MetricsSystemImpl: Registered source RpcActivityForPort50020 11/05/18 07:49:01 INFO impl.MetricsSystemImpl: Registered source RpcDetailedActivityForPort50020 11/05/18 07:49:01 INFO impl.MetricsSystemImpl: Registered source JvmMetrics 11/05/18 07:49:01 INFO impl.MetricsSystemImpl: Registered source DataNodeActivity-hadooplab40.yst.corp.yahoo.com-50010 11/05/18 07:49:01 INFO datanode.DataNode: DatanodeRegistration(hadooplab40.yst.corp.yahoo.com:50010, storageID=, infoPort=50075, ipcPort=50020, storageInfo=lv=0;cid=;nsid=0;c=0)In
[jira] [Updated] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1905: - Attachment: HDFS-1905-2.patch Attaching a patch based on comments. Preserved the previous semantics. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 Attachments: HDFS-1905-1.patch, HDFS-1905-2.patch While setting up a 0.23-based cluster, I ran into this issue. When I issue a format namenode command, which changed in 23, it should let the user know how to use the command when the complete options are not specified. ./hdfs namenode -format I get the following error msg, but it is still not clear how the user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
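[Editor's note] A usage message at the point of failure is the simple fix this issue asks for. A hedged sketch of such an argument check follows; the option handling, usage text, and class name are illustrative assumptions, not the committed wording:

```java
import java.io.PrintStream;

public class FormatOptionCheck {
    static final String USAGE =
        "Usage: hdfs namenode -format [-clusterid <cid>] [-force] [-nonInteractive]";

    // Returns the clusterid argument, an empty string when -clusterid is
    // absent (the caller may then auto-generate one), or null after printing
    // usage when the option is malformed (e.g. -clusterid given without a value).
    public static String parseClusterId(String[] args, PrintStream err) {
        for (int i = 0; i < args.length; i++) {
            if ("-clusterid".equalsIgnoreCase(args[i])) {
                if (i + 1 >= args.length || args[i + 1].startsWith("-")) {
                    err.println("Must specify a valid cluster ID after -clusterid");
                    err.println(USAGE);
                    return null;
                }
                return args[i + 1];
            }
        }
        return "";
    }
}
```

The point is that a malformed invocation produces a human-readable usage line rather than a bare IllegalArgumentException stack trace.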
[jira] [Commented] (HDFS-1941) Remove -genclusterid from NameNode startup options
[ https://issues.apache.org/jira/browse/HDFS-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035980#comment-13035980 ] Bharath Mundlapudi commented on HDFS-1941: -- Failed tests are not related to this patch. Remove -genclusterid from NameNode startup options -- Key: HDFS-1941 URL: https://issues.apache.org/jira/browse/HDFS-1941 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Attachments: HDFS-1941-1.patch Currently, namenode -genclusterid is a helper utility to generate unique clusterid. This option is useless once namenode -format automatically generates the clusterid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1592: - Attachment: HDFS-1592-3.patch Attaching a patch which addresses Eli's comments. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-3.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035990#comment-13035990 ] Bharath Mundlapudi commented on HDFS-1592: -- First, Thank you for identifying this issue, Eli. Great job! Couple of comments, 1. We did test couple of things like masking permissions still dfs level. That didn't catch this issue. You pointed in making specific directory permissions helped us to reproduce this case. Thanks again. 2. We tested by unmounting disks also. 3. Then we tested with injecting failures at kernel level. Regarding testcases, I agree with you that we need more tests, But I think, we should do that in another jira. Since, we have already spent lot of effort in manual testing. Can we file another Jira to track this? With this new patch, i have tested following new cases. Can you please review and provide your feedback? case 1: All four good volumes, Vol Tolerated=1, expected outcome = BPservice should start 11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - /grid/0/testing/hadoop-logs/dfs/data/current 11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - /grid/1/testing/hadoop-logs/dfs/data/current 11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - /grid/2/testing/hadoop-logs/dfs/data/current 11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - /grid/3/testing/hadoop-logs/dfs/data/current 11/05/19 04:57:51 INFO datanode.DataNode: Registered FSDatasetState MBean 11/05/19 04:57:51 INFO datanode.DataNode: Adding block pool BP-1694914230-10.72.86.55-1305704227822 11/05/19 04:57:51 INFO datanode.DirectoryScanner: Periodic Directory Tree Verification scan starting at 1305782678947 with interval 2160 11/05/19 04:57:51 INFO datanode.DataNode: in register: sid=DS-340618566-10.72.86.55-50010-1305704313207;SI=lv=-35;cid=test;nsid=413952175;c=0 11/05/19 04:57:51 INFO datanode.DataNode: bpReg after 
=lv=-35;cid=test;nsid=413952175;c=0;sid=DS-340618566-10.72.86.55-50010-1305704313207;name=127.0.0.1:50010 11/05/19 04:57:51 INFO datanode.DataNode: in register:;bpDNR=lv=-35;cid=test;nsid=413952175;c=0 11/05/19 04:57:51 INFO datanode.DataNode: For namenode localhost/127.0.0.1:8020 using BLOCKREPORT_INTERVAL of 2160msec Initial delay: 0msec; heartBeatInterval=3000 11/05/19 04:57:51 INFO datanode.DataNode: BlockReport of 0 blocks got processed in 3 msecs 11/05/19 04:57:51 INFO datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.DatanodeCommand$Finalize@3e5a91 11/05/19 04:57:51 INFO datanode.BlockPoolSliceScanner: Periodic Block Verification scan initialized with interval 181440. 11/05/19 04:57:51 INFO datanode.DataBlockScanner: Added bpid=BP-1694914230-10.72.86.55-1305704227822 to blockPoolScannerMap, new size=1 11/05/19 04:57:56 INFO datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00% case 2: One failed volume(/grid/2), three good volumes, Vol Tolerated=1, expected outcome = BPService should start 11/05/19 05:01:27 INFO common.Storage: Storage directory /grid/2/testing/hadoop-logs/dfs/data is not formatted. 11/05/19 05:01:27 INFO common.Storage: Formatting ... 11/05/19 05:01:27 WARN common.Storage: Invalid directory in: /grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822: File file:/grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822 does not exist. 11/05/19 05:01:27 INFO common.Storage: Locking is disabled 11/05/19 05:01:27 INFO common.Storage: Locking is disabled 11/05/19 05:01:27 INFO common.Storage: Storage directory /grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822 does not exist. 11/05/19 05:01:27 INFO common.Storage: Storage directory /grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822 does not exist. 
11/05/19 05:01:27 INFO common.Storage: Locking is disabled 11/05/19 05:01:27 INFO datanode.DataNode: setting up storage: nsid=0;bpid=BP-1694914230-10.72.86.55-1305704227822;lv=-35;nsInfo=lv=-35;cid=test;nsid=413952175;c=0;bpid=BP-1694914230-10.72.86.55-1305704227822 11/05/19 05:01:27 INFO datanode.DataNode: FSDataset added volume - /grid/0/testing/hadoop-logs/dfs/data/current 11/05/19 05:01:27 INFO datanode.DataNode: FSDataset added volume - /grid/1/testing/hadoop-logs/dfs/data/current 11/05/19 05:01:27 INFO datanode.DataNode: FSDataset added volume - /grid/3/testing/hadoop-logs/dfs/data/current 11/05/19 05:01:27 INFO datanode.DataNode: Registered FSDatasetState MBean 11/05/19 05:01:27 INFO datanode.DataNode: Adding block pool BP-1694914230-10.72.86.55-1305704227822 11/05/19 05:01:27 INFO datanode.DirectoryScanner: Periodic Directory Tree Verification scan starting at 1305789604425 with interval 2160 11/05/19 05:01:27 INFO datanode.DataNode: in register:
[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033885#comment-13033885 ] Bharath Mundlapudi commented on HDFS-1592: -- Yes, what you mentioned w.r.t. the use cases is right:
* A DN will successfully start with a failed volume as long as it's configured to tolerate a failed volume
* A DN will fail to start if more than the tolerated number of volumes have failed
This is the expected behavior with this patch. I had some difficulty failing the disks through unit tests: if we set the directory permissions to non-writable, then once we run the datanode it resets the directory permissions and the test always succeeds. These tests were instead done outside of unit tests, through umount -l etc. All the above mentioned cases were manually tested. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
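The startup behavior described in the comment above (start when the number of failed volumes is within the tolerated count, abort otherwise) can be sketched as follows. This is an illustrative sketch, not the actual DataNode source; the class and method names are hypothetical:

```java
// Illustrative sketch of the volumes-tolerated startup check.
// VolumeCheck and canStart are hypothetical names, not HDFS code.
public class VolumeCheck {
    /**
     * Returns true if the datanode may start: at least one volume must be
     * usable, and the failed-volume count must not exceed the configured
     * tolerated count (dfs.datanode.failed.volumes.tolerated).
     */
    public static boolean canStart(int totalVolumes, int failedVolumes,
                                   int volsTolerated) {
        int validVolumes = totalVolumes - failedVolumes;
        return validVolumes > 0 && failedVolumes <= volsTolerated;
    }
}
```

With these semantics, "case 2" in the attached log (one failed volume out of four, tolerated = 1) starts, while two failed volumes with one tolerated would abort.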
[jira] [Created] (HDFS-1942) If all Block Pool service threads exit then datanode should exit.
If all Block Pool service threads exit then datanode should exit. - Key: HDFS-1942 URL: https://issues.apache.org/jira/browse/HDFS-1942 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Currently, if all block pool service threads exit, the Datanode continues to run. This should be fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
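The proposed fix can be sketched as a liveness check over the block pool service threads: the datanode keeps running only while at least one such thread is alive. This is a hypothetical sketch; the class and method names are not the actual DataNode code:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: the datanode polls its block pool service
// threads and shuts itself down once none of them are alive.
public class BPServiceMonitor {
    private final List<Thread> bpServiceThreads = new CopyOnWriteArrayList<>();

    public void register(Thread t) {
        bpServiceThreads.add(t);
    }

    /** True while at least one block pool service thread is running. */
    public boolean shouldRun() {
        for (Thread t : bpServiceThreads) {
            if (t.isAlive()) {
                return true;
            }
        }
        return false; // all BP service threads exited: datanode should exit
    }
}
```

A main datanode loop would call shouldRun() periodically and invoke its shutdown path when it returns false.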
[jira] [Updated] (HDFS-1941) Remove -genclusterid from NameNode startup options
[ https://issues.apache.org/jira/browse/HDFS-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1941: - Attachment: HDFS-1941-1.patch Attaching the patch which addresses this jira. Remove -genclusterid from NameNode startup options -- Key: HDFS-1941 URL: https://issues.apache.org/jira/browse/HDFS-1941 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Attachments: HDFS-1941-1.patch Currently, namenode -genclusterid is a helper utility to generate a unique clusterid. This option becomes redundant once namenode -format automatically generates the clusterid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034301#comment-13034301 ] Bharath Mundlapudi commented on HDFS-1905: -- Thanks for the review, Suresh. My comments on points 1 and 3: From a high level, format returns a boolean. Semantically, if the operation was successful we should return true, else false. The previous code had this backwards: if format was successful, it returned false. Also, even if the user opts not to format, the format operation as such was not successful, so we should return false. So I changed this part as well. Let me know if these assumptions are not correct and I will revert to the previous semantics. I will fix the doc and tests; sorry, I missed that part. Regarding upgrade, do you want me to do it in another jira, since this one was filed just for format usability? Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 Attachments: HDFS-1905-1.patch While setting up 0.23 version based cluster, i ran into this issue. When i issue a format namenode command, which got changed in 23, it should let the user know to how to use this command in case where complete options were not specified. ./hdfs namenode -format I get the following error msg, still its not clear what and how user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. 
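The return-value semantics argued for in the comment above (format() returns true only when a format actually happened) can be condensed into a small sketch. The names are illustrative, not the NameNode source:

```java
// Illustrative sketch of the proposed format() return semantics:
// true only when the user confirmed AND the on-disk format succeeded.
public class FormatSemantics {
    public static boolean format(boolean userConfirmed, boolean ioSucceeded) {
        if (!userConfirmed) {
            return false; // user declined: no format was performed
        }
        return ioSucceeded; // true only if the format actually happened
    }
}
```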
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034320#comment-13034320 ] Bharath Mundlapudi commented on HDFS-1905: -- For Comment 2: Let's say I want to format a namenode that is part of a particular cluster. Reusing the cluster id is useful here: I just want to format this namenode and have it remain part of the same cluster. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 Attachments: HDFS-1905-1.patch While setting up 0.23 version based cluster, i ran into this issue. When i issue a format namenode command, which got changed in 23, it should let the user know to how to use this command in case where complete options were not specified. ./hdfs namenode -format I get the following error msg, still its not clear what and how user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1905: - Attachment: HDFS-1905-1.patch Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 Attachments: HDFS-1905-1.patch While setting up 0.23 version based cluster, i ran into this issue. When i issue a format namenode command, which got changed in 23, it should let the user know to how to use this command in case where complete options were not specified. ./hdfs namenode -format I get the following error msg, still its not clear what and how user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033655#comment-13033655 ] Bharath Mundlapudi commented on HDFS-1905: -- Attached the patch, which addresses the following: 1. The clusterid will be automatically generated if the user doesn't provide one. 2. Admins can specify a clusterid with the -clusterid option. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 Attachments: HDFS-1905-1.patch While setting up 0.23 version based cluster, i ran into this issue. When i issue a format namenode command, which got changed in 23, it should let the user know to how to use this command in case where complete options were not specified. ./hdfs namenode -format I get the following error msg, still its not clear what and how user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
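The two behaviors in that patch (auto-generate a clusterid when none is given, honor -clusterid when one is) reduce to a simple resolution rule. The sketch below is hypothetical: newClusterID() here is a stand-in for the real generator inside the NameNode code:

```java
import java.util.UUID;

// Hypothetical sketch of clusterid resolution for "namenode -format":
// use the admin-supplied -clusterid if present, otherwise generate one.
public class ClusterIdResolver {
    // Stand-in for the NameNode's actual clusterid generator.
    static String newClusterID() {
        return "CID-" + UUID.randomUUID();
    }

    static String resolve(String suppliedClusterId) {
        if (suppliedClusterId == null || suppliedClusterId.isEmpty()) {
            return newClusterID(); // auto-generate when not provided
        }
        return suppliedClusterId;  // honor the -clusterid option
    }
}
```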
[jira] [Commented] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.
[ https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033868#comment-13033868 ] Bharath Mundlapudi commented on HDFS-1692: -- Yes, I will be porting this one to trunk. We run our clusters in secure mode. When the tolerated-volumes threshold is reached, shutdown is called but the datanode continues to run and doesn't exit. This change addresses only secure mode; non-secure mode shouldn't have this problem. In secure mode, Datanode process doesn't exit when disks fail. -- Key: HDFS-1692 URL: https://issues.apache.org/jira/browse/HDFS-1692 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0 Attachments: HDFS-1692-1.patch In secure mode, when more disks fail than the volumes tolerated, the datanode process doesn't exit properly; it just hangs even though the shutdown method is called. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.
[ https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033880#comment-13033880 ] Bharath Mundlapudi commented on HDFS-1692: -- As I was tracking this hang issue, I cleaned up some threads which were not exiting; hence the change to ipc/Server.java. But yes, we can move that particular code to another jira. For 0.23, we can do it separately. In secure mode, Datanode process doesn't exit when disks fail. -- Key: HDFS-1692 URL: https://issues.apache.org/jira/browse/HDFS-1692 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0 Attachments: HDFS-1692-1.patch In secure mode, when disks fail more than volumes tolerated, datanode process doesn't exit properly and it just hangs even though shutdown method is called. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-1941) Remove -genclusterid from NameNode startup options
Remove -genclusterid from NameNode startup options -- Key: HDFS-1941 URL: https://issues.apache.org/jira/browse/HDFS-1941 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Currently, namenode -genclusterid is a helper utility to generate unique clusterid. This option is useless once namenode -format automatically generates the clusterid. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033343#comment-13033343 ] Bharath Mundlapudi commented on HDFS-1592: -- These failing tests are not related to this patch. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-1940) Datanode can have more than one copy of same block when a failed disk is coming back in datanode
[ https://issues.apache.org/jira/browse/HDFS-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi reassigned HDFS-1940: Assignee: Bharath Mundlapudi Datanode can have more than one copy of same block when a failed disk is coming back in datanode Key: HDFS-1940 URL: https://issues.apache.org/jira/browse/HDFS-1940 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.204.0 Reporter: Rajit Assignee: Bharath Mundlapudi There is a situation where one datanode can have more than one copy of the same block, because a disk fails and comes back after some time in a datanode. These duplicate blocks are not deleted even after datanode and namenode restarts. This can only happen in a corner case: due to the disk failure, the data block is replicated to another disk of the same datanode. To simulate this scenario I copied a data block and the associated .meta file from one disk to another disk of the same datanode, so the datanode has 2 copies of the same replica. Then I restarted the datanode and namenode. The extra data block and meta file are still not deleted from the datanode:
[hdfs@gsbl90192 rajsaha]$ ls -l `find /grid/{0,1,2,3}/hadoop/var/hdfs/data/current -name blk_*`
-rw-r--r-- 1 hdfs users 7814 May 13 21:05 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376
-rw-r--r-- 1 hdfs users 71 May 13 21:05 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta
-rw-r--r-- 1 hdfs users 7814 May 13 21:14 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376
-rw-r--r-- 1 hdfs users 71 May 13 21:14 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1940) Datanode can have more than one copy of same block when a failed disk is coming back in datanode
[ https://issues.apache.org/jira/browse/HDFS-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1940: - Description: There is a situation where one datanode can have more than one copy of same block due to a disk fails and comes back after sometime in a datanode. And these duplicate blocks are not getting deleted even after datanode and namenode restart. This situation can only happen in a corner case , when due to disk failure, the data block is replicated to other disk of the same datanode. To simulate this scenario I copied a datablock and the associated .meta file from one disk to another disk of same datanode, so the datanode is having 2 copy of same replica. Now I restarted datanode and namenode. Still the extra data block and meta file is not deleted from the datanode ls -l `find /grid/{0,1,2,3}/hadoop/var/hdfs/data/current -name blk_*` -rw-r--r-- 1 hdfs users 7814 May 13 21:05 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376 -rw-r--r-- 1 hdfs users 71 May 13 21:05 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta -rw-r--r-- 1 hdfs users 7814 May 13 21:14 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376 -rw-r--r-- 1 hdfs users 71 May 13 21:14 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta was: There is a situation where one datanode can have more than one copy of same block due to a disk fails and comes back after sometime in a datanode. And these duplicate blocks are not getting deleted even after datanode and namenode restart. This situation can only happen in a corner case , when due to disk failure, the data block is replicated to other disk of the same datanode. To simulate this scenario I copied a datablock and the associated .meta file from one disk to another disk of same datanode, so the datanode is having 2 copy of same replica. Now I restarted datanode and namenode. 
Still the extra data block and meta file is not deleted from the datanode [hdfs@gsbl90192 rajsaha]$ ls -l `find /grid/{0,1,2,3}/hadoop/var/hdfs/data/current -name blk_*` -rw-r--r-- 1 hdfs users 7814 May 13 21:05 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376 -rw-r--r-- 1 hdfs users 71 May 13 21:05 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta -rw-r--r-- 1 hdfs users 7814 May 13 21:14 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376 -rw-r--r-- 1 hdfs users 71 May 13 21:14 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta Datanode can have more than one copy of same block when a failed disk is coming back in datanode Key: HDFS-1940 URL: https://issues.apache.org/jira/browse/HDFS-1940 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.204.0 Reporter: Rajit Assignee: Bharath Mundlapudi There is a situation where one datanode can have more than one copy of same block due to a disk fails and comes back after sometime in a datanode. And these duplicate blocks are not getting deleted even after datanode and namenode restart. This situation can only happen in a corner case , when due to disk failure, the data block is replicated to other disk of the same datanode. To simulate this scenario I copied a datablock and the associated .meta file from one disk to another disk of same datanode, so the datanode is having 2 copy of same replica. Now I restarted datanode and namenode. 
Still the extra data block and meta file is not deleted from the datanode ls -l `find /grid/{0,1,2,3}/hadoop/var/hdfs/data/current -name blk_*` -rw-r--r-- 1 hdfs users 7814 May 13 21:05 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376 -rw-r--r-- 1 hdfs users 71 May 13 21:05 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta -rw-r--r-- 1 hdfs users 7814 May 13 21:14 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376 -rw-r--r-- 1 hdfs users 71 May 13 21:14 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
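The duplicate replicas in the listing above (the same blk_ file on /grid/1 and /grid/3) could be detected by scanning every data directory and flagging block names that appear on more than one volume. This is an illustrative sketch, not DataNode code; the findDuplicates helper and its input shape are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical duplicate-replica scan: given the block file names found
// on each volume, report every block that appears on 2+ volumes.
public class DuplicateBlockScan {
    static Map<String, List<String>> findDuplicates(Map<String, String[]> blocksPerVolume) {
        Map<String, List<String>> locations = new HashMap<>();
        for (Map.Entry<String, String[]> e : blocksPerVolume.entrySet()) {
            for (String blk : e.getValue()) {
                locations.computeIfAbsent(blk, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // keep only blocks present on more than one volume
        locations.values().removeIf(v -> v.size() < 2);
        return locations;
    }
}
```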
[jira] [Commented] (HDFS-1836) Thousand of CLOSE_WAIT socket
[ https://issues.apache.org/jira/browse/HDFS-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033466#comment-13033466 ] Bharath Mundlapudi commented on HDFS-1836: -- That's correct; this code is already part of trunk. Todd, one minor comment: 1. Can we also pass the LOG object to this method? Users who want to debug can then enable the debug option: IOUtils.cleanup(LOG, blockStream, blockReplyStream); Otherwise, the patch looks good. Thank you. Thousand of CLOSE_WAIT socket -- Key: HDFS-1836 URL: https://issues.apache.org/jira/browse/HDFS-1836 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.20.2 Environment: Linux 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_23 Java(TM) SE Runtime Environment (build 1.6.0_23-b05) Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode) Reporter: Dennis Cheung Attachments: hdfs-1836-0.20.txt, patch-draft-1836.patch $ /usr/sbin/lsof -i TCP:50010 | grep -c CLOSE_WAIT 4471 Everything is fine while the cluster runs normally. However, from time to time DataStreamer Exception: java.net.SocketTimeoutException and DFSClient.processDatanodeError(2507) | Error Recovery for entries can be found in the log file, and the number of CLOSE_WAIT sockets just keeps increasing. The CLOSE_WAIT handles may remain for hours or days; eventually the process hits a "Too many open files" error. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
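The suggested call, IOUtils.cleanup(LOG, blockStream, blockReplyStream), closes each stream and logs (rather than propagates) any close() failure. A minimal stand-alone sketch of that behavior is below; it uses java.util.logging in place of the commons-logging Log that Hadoop's org.apache.hadoop.io.IOUtils takes, so it mirrors the idea rather than the actual Hadoop source:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of an IOUtils.cleanup-style helper: close every stream,
// swallow close() failures, and log them only when a logger is given.
public class CleanupUtil {
    public static void cleanup(Logger log, Closeable... closeables) {
        for (Closeable c : closeables) {
            if (c == null) {
                continue; // tolerate streams that were never opened
            }
            try {
                c.close();
            } catch (IOException e) {
                if (log != null) {
                    log.log(Level.FINE, "Exception in closing " + c, e);
                }
            }
        }
    }
}
```

Passing the logger makes close() failures visible at debug level, which is the point of the review comment: cleanup never throws, but it shouldn't silently hide errors either.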
[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032268#comment-13032268 ] Bharath Mundlapudi commented on HDFS-1905: -- The cluster ID is displayed on the dfshealth web page. If we have multiple clusters, then a proper cluster name defined by admins is useful. If the user executes the following command, the correct usage is indeed displayed: ./hdfs namenode -format -help This should be corrected for all the code paths. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up 0.23 version based cluster, i ran into this issue. When i issue a format namenode command, which got changed in 23, it should let the user know to how to use this command in case where complete options were not specified. ./hdfs namenode -format I get the following error msg, still its not clear what and how user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032701#comment-13032701 ] Bharath Mundlapudi commented on HDFS-1592: -- Thanks for the review, Jitendra. 1. The conditions are there for better readability. Yes, we can change this into one condition. 2. Error is logged where the exception is caught. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-1934) Fix NullPointerException when certain File APIs return null
Fix NullPointerException when certain File APIs return null --- Key: HDFS-1934 URL: https://issues.apache.org/jira/browse/HDFS-1934 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.205.0, 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.205.0, 0.23.0 While testing Disk Fail Inplace, we encountered an NPE from this part of the code: {code} File[] files = dir.listFiles(); for (File f : files) { ... } {code} This is kind of an API issue: when a disk is bad (or the name is not a directory), these APIs (listFiles, list) return null rather than throwing an exception, so the 'for' loop throws an NPE. The same applies to the dir.list() API. Fix all the places where the null condition was not checked. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
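The null-safe pattern this jira calls for can be sketched as a small wrapper: File.listFiles() returns null (instead of throwing) when the path is not a readable directory, so the result must be checked before iterating. The helper name listFilesOrThrow is illustrative; the actual patch touches many individual call sites:

```java
import java.io.File;
import java.io.IOException;

// Sketch of the null-check fix: turn listFiles()'s null return
// (bad disk, not a directory, or I/O error) into an explicit exception
// instead of letting the for-each loop throw an NPE.
public class SafeListFiles {
    static File[] listFilesOrThrow(File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("Failed to list contents of " + dir);
        }
        return files;
    }
}
```

Callers then iterate over a guaranteed non-null array, and a failing disk surfaces as an IOException at the point of listing rather than as an NPE later.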
[jira] [Updated] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated
[ https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1592: - Attachment: HDFS-1592-1.patch Attaching the patch for 0.23 version. Datanode startup doesn't honor volumes.tolerated - Key: HDFS-1592 URL: https://issues.apache.org/jira/browse/HDFS-1592 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.204.0, 0.23.0 Attachments: HDFS-1592-1.patch, HDFS-1592-rel20.patch Datanode startup doesn't honor volumes.tolerated for hadoop 20 version. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-1905) Improve the usability of namenode -format
Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up 0.23 version based cluster, i ran into this issue. When i issue a format namenode command, which got changed in 23, it should let the user know to ./hdfs namenode -format I get the following error msg, still its not clear what and how user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1905) Improve the usability of namenode -format
[ https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1905: - Description: While setting up 0.23 version based cluster, i ran into this issue. When i issue a format namenode command, which got changed in 23, it should let the user know to how to use this command in case where complete options were not specified. ./hdfs namenode -format I get the following error msg, still its not clear what and how user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. was: While setting up 0.23 version based cluster, i ran into this issue. When i issue a format namenode command, which got changed in 23, it should let the user know to ./hdfs namenode -format I get the following error msg, still its not clear what and how user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. Improve the usability of namenode -format -- Key: HDFS-1905 URL: https://issues.apache.org/jira/browse/HDFS-1905 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Priority: Minor Fix For: 0.23.0 While setting up 0.23 version based cluster, i ran into this issue. 
When i issue a format namenode command, which got changed in 23, it should let the user know to how to use this command in case where complete options were not specified. ./hdfs namenode -format I get the following error msg, still its not clear what and how user should use this command. 11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: Format must be provided with clusterid at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689) The usability of this command can be improved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-664) Add a way to efficiently replace a disk in a live datanode
[ https://issues.apache.org/jira/browse/HDFS-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029786#comment-13029786 ] Bharath Mundlapudi commented on HDFS-664: - Is this Jira similar to this: https://issues.apache.org/jira/browse/HDFS-1362 Add a way to efficiently replace a disk in a live datanode -- Key: HDFS-664 URL: https://issues.apache.org/jira/browse/HDFS-664 Project: Hadoop HDFS Issue Type: New Feature Components: data-node Affects Versions: 0.22.0 Reporter: Steve Loughran Attachments: HDFS-664.0-20-3-rc2.patch.1, HDFS-664.patch In clusters where the datanode disks are hot swappable, you need to be able to swap out a disk on a live datanode without taking down the datanode. You don't want to decommission the whole node as that is overkill. on a system with 4 1TB HDDs, giving 3 TB of datanode storage, a decommissioning and restart will consume up to 6 TB of bandwidth. If a single disk were swapped in then there would only be 1TB of data to recover over the network. More importantly, if that data could be moved to free space on the same machine, the recommissioning could take place at disk rates, not network speeds. # Maybe have a way of decommissioning a single disk on the DN; the files could be moved to space on the other disks or the other machines in the rack. # There may not be time to use that option, in which case pulling out the disk would be done with no warning, a new disk inserted. # The DN needs to see that a disk has been replaced (or react to some ops request telling it this), and start using the new disk again -pushing back data, rebuilding the balance. To complicate the process, assume there is a live TT on the system, running jobs against the data. The TT would probably need to be paused while the work takes place, any ongoing work handled somehow. Halting the TT and then restarting it after the replacement disk went in is probably simplest. 
The more disks you add to a node, the more this scenario becomes a need. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira