[jira] Commented: (HDFS-1024) SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException
[ https://issues.apache.org/jira/browse/HDFS-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852246#action_12852246 ]

dhruba borthakur commented on HDFS-1024:
----------------------------------------

I am going to commit it to the 0.20 branch now.

> SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-1024
>                 URL: https://issues.apache.org/jira/browse/HDFS-1024
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>            Priority: Blocker
>             Fix For: 0.22.0
>
>         Attachments: HDFS-1024.patch, HDFS-1024.patch.1, HDFS-1024.patch.1-0.20.txt
>
> The secondary namenode fails to retrieve the entire fsimage from the Namenode. It fetches a part of the fsimage but believes that it has fetched the entire fsimage file and proceeds ahead with the checkpointing. Stack traces will be attached below.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
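The failure mode described in this issue — the secondary fetching only part of the fsimage yet proceeding as if the transfer were complete — is the kind of bug a byte-count check on the transfer would catch. A minimal, hypothetical sketch of such a guard; the class and method names below are illustrative and are not the actual TransferFsImage code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ImageTransferCheck {
  /** Copies in to out, returning the number of bytes actually transferred. */
  public static long copyStream(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[4096];
    long total = 0;
    for (int n; (n = in.read(buf)) > 0; ) {
      out.write(buf, 0, n);
      total += n;
    }
    return total;
  }

  /** False when fewer bytes arrived than the server advertised (-1 = length unknown). */
  public static boolean isComplete(long received, long advertised) {
    return advertised < 0 || received == advertised;
  }

  /** Copy and fail loudly on a short read instead of checkpointing a truncated image. */
  public static void fetchImage(InputStream in, OutputStream out, long advertised)
      throws IOException {
    long received = copyStream(in, out);
    if (!isComplete(received, advertised)) {
      throw new IOException("Truncated image: got " + received
          + " bytes, expected " + advertised);
    }
  }
}
```

A real fix would also need the namenode to advertise the image length (for example via an HTTP header) so the secondary has something to compare against.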
[jira] Commented: (HDFS-1024) SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException
[ https://issues.apache.org/jira/browse/HDFS-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852245#action_12852245 ]

Todd Lipcon commented on HDFS-1024:
-----------------------------------

Not sure if we need a vote, but I do think we should put it in the branch.
[jira] Updated: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Booth updated HDFS-918:
---------------------------

    Attachment: hdfs-918-TRUNK.patch

Trunk patch with previous fixes.

> Use single Selector and small thread pool to replace many instances of BlockSender for reads
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>             Fix For: 0.22.0
>
>         Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-918-20100309.patch, hdfs-918-branch20.2.patch, hdfs-918-TRUNK.patch, hdfs-multiplex.patch
>
> Currently, on read requests, the DataXCeiver server allocates a new thread per request, which must allocate its own buffers and leads to higher-than-optimal CPU and memory usage by the sending threads. If we had a single selector and a small threadpool to multiplex request packets, we could theoretically achieve higher performance while taking up fewer resources and leaving more CPU on datanodes available for mapred, hbase or whatever. This can be done without changing any wire protocols.
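For readers unfamiliar with the pattern this issue proposes, the single-selector-plus-small-thread-pool idea can be sketched with plain java.nio: one selector thread detects readable channels and hands them to a fixed pool, instead of dedicating a thread per read request. This toy version multiplexes over an in-process Pipe rather than real datanode sockets; nothing below is taken from the patch itself:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SelectorPoolDemo {
  /** One selector thread detects readiness; a small pool performs the reads. */
  public static String readOnceViaSelector() {
    try (Selector selector = Selector.open()) {
      Pipe pipe = Pipe.open();
      pipe.source().configureBlocking(false);
      SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
      ExecutorService pool = Executors.newFixedThreadPool(2);

      // A "client request" arrives on the channel.
      pipe.sink().write(ByteBuffer.wrap("block-data".getBytes()));

      selector.select();       // selector thread: wait for a readable channel
      key.interestOps(0);      // park the key while a pool thread services it
      Future<String> served = pool.submit(() -> {  // pool thread: do the read
        ByteBuffer buf = ByteBuffer.allocate(64);
        pipe.source().read(buf);
        buf.flip();
        return new String(buf.array(), 0, buf.limit());
      });
      String data = served.get();
      pool.shutdown();
      return data;
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
```

The resource win comes from bounding both the thread count and the number of live buffers to the pool size, rather than to the number of concurrent readers.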
[jira] Commented: (HDFS-1024) SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException
[ https://issues.apache.org/jira/browse/HDFS-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852238#action_12852238 ]

stack commented on HDFS-1024:
-----------------------------

All tests pass except the expected (?) failure of org.apache.hadoop.streaming.TestStreamingExitStatus.

I'd like to commit this to the 0.20 branch. Do we have to run a vote, or is evidence of fsimage corruption reason enough?
[jira] Updated: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Booth updated HDFS-918:
---------------------------

    Attachment: (was: hdfs-918-branch20.2.patch)
[jira] Updated: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Booth updated HDFS-918:
---------------------------

    Attachment: hdfs-918-branch20.2.patch

Straightened out the block-not-found issue with Andrew; that was on his end, but he then found a resource leak, which is fixed here. I'll post a trunk patch incorporating this fix and the previous one shortly.
[jira] Updated: (HDFS-185) Chown, chgrp, chmod operations allowed when namenode is in safemode
[ https://issues.apache.org/jira/browse/HDFS-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Phulari updated HDFS-185:
------------------------------

    Attachment: SafeMode-Y20.100.patch

Attaching patch for the Yahoo! Hadoop 20.100 branch.

> Chown, chgrp, chmod operations allowed when namenode is in safemode
> -------------------------------------------------------------------
>
>                 Key: HDFS-185
>                 URL: https://issues.apache.org/jira/browse/HDFS-185
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ravi Phulari
>            Assignee: Ravi Phulari
>             Fix For: 0.20.2
>
>         Attachments: HADOOP-5942v2.patch, HADOOPv20-5942.patch, HDFS-185-1.patch, HDFS-5942.patch, SafeMode-Y20.100.patch
>
> Chown, chgrp, and chmod operations are allowed when the namenode is in safemode.
[jira] Commented: (HDFS-1072) TestReadWhileWriting may fail
[ https://issues.apache.org/jira/browse/HDFS-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852169#action_12852169 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1072:
----------------------------------------------

> ... Namenode should never throw AlreadyBeingCreatedException with HDFS_NameNode as the lease holder.

- If the lease is being recovered, then RecoveryInProgressException should be thrown.
- If the file is being created, then it should throw AlreadyBeingCreatedException with DFSClient as the lease holder.

> TestReadWhileWriting may fail
> -----------------------------
>
>                 Key: HDFS-1072
>                 URL: https://issues.apache.org/jira/browse/HDFS-1072
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Erik Steffl
>
> If the lease recovery is taking a long time, TestReadWhileWriting may fail with AlreadyBeingCreatedException.
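The two bullets above amount to a simple decision rule on who holds the lease. A hypothetical sketch of that rule follows; the string-returning helper is purely illustrative (the real FSNamesystem throws typed exceptions, and the holder names here are only examples):

```java
public class LeaseHolderRule {
  // The namenode itself holds a lease only while it is recovering that lease.
  public static final String NAMENODE_HOLDER = "HDFS_NameNode";

  /** Names the exception a create/append on a held lease should surface. */
  public static String exceptionFor(String currentHolder) {
    if (NAMENODE_HOLDER.equals(currentHolder)) {
      // Lease recovery is underway: the client should retry, not be told
      // that "HDFS_NameNode" is creating the file.
      return "RecoveryInProgressException";
    }
    // Some DFSClient still holds the lease: the file is genuinely being
    // created by another client, so name that client as the holder.
    return "AlreadyBeingCreatedException(holder=" + currentHolder + ")";
  }
}
```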
[jira] Updated: (HDFS-1072) TestReadWhileWriting may fail
[ https://issues.apache.org/jira/browse/HDFS-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-1072:
-----------------------------------------

    Component/s: (was: test)
                 name-node
                 hdfs client

I was thinking that this was just a test problem. However, that seems not to be the case, since the Namenode should never throw AlreadyBeingCreatedException with HDFS_NameNode as the lease holder.
[jira] Assigned: (HDFS-1072) TestReadWhileWriting may fail
[ https://issues.apache.org/jira/browse/HDFS-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE reassigned HDFS-1072:
--------------------------------------------

    Assignee: Erik Steffl  (was: Tsz Wo (Nicholas), SZE)
[jira] Commented: (HDFS-1072) TestReadWhileWriting may fail
[ https://issues.apache.org/jira/browse/HDFS-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852130#action_12852130 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1072:
----------------------------------------------

Here is the stack trace.
{noformat}
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /TestReadWhileWriting/file1 for DFSClient_-1436751315 on client 127.0.0.1, because this file is already being created by HDFS_NameNode on 127.0.0.1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1210)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1305)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:636)
	at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1253)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1249)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:706)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1247)
	at org.apache.hadoop.ipc.Client.call(Client.java:895)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
	at $Proxy6.append(Unknown Source)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy6.append(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:701)
	at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:212)
	at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:793)
	at org.apache.hadoop.hdfs.TestReadWhileWriting.append(TestReadWhileWriting.java:111)
	at org.apache.hadoop.hdfs.TestReadWhileWriting.pipeline_02_03(TestReadWhileWriting.java:95)
{noformat}
[jira] Created: (HDFS-1072) TestReadWhileWriting may fail
TestReadWhileWriting may fail
-----------------------------

                 Key: HDFS-1072
                 URL: https://issues.apache.org/jira/browse/HDFS-1072
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: test
            Reporter: Tsz Wo (Nicholas), SZE
            Assignee: Tsz Wo (Nicholas), SZE

If the lease recovery is taking a long time, TestReadWhileWriting may fail with AlreadyBeingCreatedException.
[jira] Resolved: (HDFS-291) combine FsShell.copyToLocal to ChecksumFileSystem.copyToLocalFile
[ https://issues.apache.org/jira/browse/HDFS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE resolved HDFS-291.
-----------------------------------------

    Resolution: Won't Fix

Closing this minor issue as "won't fix".

> combine FsShell.copyToLocal to ChecksumFileSystem.copyToLocalFile
> -----------------------------------------------------------------
>
>                 Key: HDFS-291
>                 URL: https://issues.apache.org/jira/browse/HDFS-291
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Tsz Wo (Nicholas), SZE
>            Priority: Minor
>
> - Two methods provide similar functions
> - ChecksumFileSystem.copyToLocalFile(Path src, Path dst, boolean copyCrc) is no longer used anywhere in the system
> - It is better to use ChecksumFileSystem.getRawFileSystem() for copying crc in FsShell.copyToLocal
> - FileSystem.isDirectory(Path) used in FsShell.copyToLocal is deprecated.
[jira] Updated: (HDFS-481) Bug Fixes + HdfsProxy to use proxy user to impersonate the real user
[ https://issues.apache.org/jira/browse/HDFS-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-481:
----------------------------------------

    Status: Open  (was: Patch Available)

Srikanth, thank you for the update. I am looking forward to reviewing your new patch.

> Bug Fixes + HdfsProxy to use proxy user to impersonate the real user
> --------------------------------------------------------------------
>
>                 Key: HDFS-481
>                 URL: https://issues.apache.org/jira/browse/HDFS-481
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: contrib/hdfsproxy
>    Affects Versions: 0.21.0
>            Reporter: zhiyong zhang
>            Assignee: Srikanth Sundarrajan
>         Attachments: HDFS-481-bp-y20.patch, HDFS-481-bp-y20s.patch, HDFS-481.out, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch
>
> Bugs:
> 1. hadoop-version is not recognized when running ant from src/contrib/ or src/contrib/hdfsproxy. If ant is run from $HADOOP_HDFS_HOME, hadoop-version is passed to contrib builds through subant, but when run from src/contrib or src/contrib/hdfsproxy it is not recognized.
> 2. LdapIpDirFilter.java is not thread safe. userName, Group & Paths are per-request values and can't be class members.
> 3. Addressed the following StackOverflowError:
>    ERROR [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/].[proxyForward]] Servlet.service() for servlet proxyForward threw exception java.lang.StackOverflowError
>    at org.apache.catalina.core.ApplicationHttpRequest.getAttribute(ApplicationHttpRequest.java:229)
>    This occurs when the target war (/target.war) does not exist: the forwarding war forwards to its parent context path /, which defines the forwarding war itself, causing an infinite loop. Added "HDFS Proxy Forward".equals(dstContext.getServletContextName()) to the if logic to break the loop.
> 4. Kerberos credentials of the remote user aren't available. HdfsProxy needs to act on behalf of the real user to service the requests.
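Bug 2 above (per-request state kept in filter fields) reflects a general servlet rule: one filter instance serves all request threads, so request-scoped values must live in local variables, never in instance fields. A minimal illustration of the safe shape, using hypothetical names rather than the actual LdapIpDirFilter code:

```java
public class PerRequestState {
  // Racy (the bug): a field like `private String userName;` written on each
  // request would be clobbered by concurrent requests sharing this instance.

  /** Thread-safe: each call derives its state locally and passes it along. */
  public static String handleRequest(String remoteUser) {
    String userName = remoteUser.trim();   // local to this request's thread
    return "proxying for " + userName;
  }
}
```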
[jira] Commented: (HDFS-955) FSImage.saveFSImage can lose edits
[ https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852100#action_12852100 ]

Hadoop QA commented on HDFS-955:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12440371/saveNamespace.patch
against trunk revision 929406.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 11 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed core unit tests.
    -1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/289/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/289/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/289/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/289/console

> FSImage.saveFSImage can lose edits
> ----------------------------------
>
>                 Key: HDFS-955
>                 URL: https://issues.apache.org/jira/browse/HDFS-955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Konstantin Shvachko
>            Priority: Blocker
>         Attachments: FSStateTransition7.htm, hdfs-955-moretests.txt, hdfs-955-unittest.txt, PurgeEditsBeforeImageSave.patch, saveNamespace-0.20.patch, saveNamespace-0.21.patch, saveNamespace.patch, saveNamespace.patch, saveNamespace.txt
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage function (implementing dfsadmin -saveNamespace) can corrupt the NN storage such that all current edits are lost.
[jira] Updated: (HDFS-955) FSImage.saveFSImage can lose edits
[ https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-955:
-------------------------------------

    Assignee: Konstantin Shvachko  (was: Todd Lipcon)
      Status: Patch Available  (was: Open)
[jira] Updated: (HDFS-955) FSImage.saveFSImage can lose edits
[ https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-955:
-------------------------------------

    Status: Open  (was: Patch Available)
[jira] Updated: (HDFS-955) FSImage.saveFSImage can lose edits
[ https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-955:
-------------------------------------

    Attachment: saveNamespace.patch

New patch addresses Suresh's comments.
[jira] Commented: (HDFS-218) name node should provide status of dfs.name.dir's
[ https://issues.apache.org/jira/browse/HDFS-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851944#action_12851944 ]

dhruba borthakur commented on HDFS-218:
---------------------------------------

Showing the space left in the web UI and the dfsadmin report sounds like a good idea to me. Also, as long as the 10% threshold is configurable (the default could be a very small number), +1.

> name node should provide status of dfs.name.dir's
> -------------------------------------------------
>
>                 Key: HDFS-218
>                 URL: https://issues.apache.org/jira/browse/HDFS-218
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Allen Wittenauer
>
> We've had several reports of people letting their dfs.name.dir fill up. To help prevent this, the name node web UI, and perhaps dfsadmin -report or another command, should give a disk space report for all dfs.name.dir's, as well as whether the contents of each dir are actually being used, whether the copy is "good", the last secondary name node update, and anything else that might be useful.
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851869#action_12851869 ]

Jay Booth commented on HDFS-918:
--------------------------------

Weird. We've been running an almost identical version of this patch on our dev cluster for a week, and this version passed TestPRead and TestDataTransferProtocol. Admittedly, it isn't the exact version we ran on our cluster, so there could be a difference, but since it passes tests I'm a little stymied. Weren't there any exceptions or anything in the datanode log? That error typically happens when the datanode tries and fails to read the block from where it should be, so hopefully there will be some errors in the DN log.
[jira] Commented: (HDFS-218) name node should provide status of dfs.name.dir's
[ https://issues.apache.org/jira/browse/HDFS-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851749#action_12851749 ]

Ravi Phulari commented on HDFS-218:
-----------------------------------

Is it a good idea to show how much disk space is remaining for *dfs.name.dir* in the web UI and in the *dfsadmin -report* output?

How about automatically activating Namenode safemode when the remaining space in *dfs.name.dir* drops below a certain percentage (say 10%)?
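The threshold idea in the comment above could be measured with the standard java.io.File space queries. A hypothetical sketch of the check (the 10% figure and all names here are this comment's suggestion, not shipped behavior):

```java
import java.io.File;

public class NameDirSpaceCheck {
  /** True when the remaining fraction of space is below the threshold. */
  public static boolean belowThreshold(long usable, long total, double minFreeFraction) {
    if (total <= 0) {
      return true;  // treat an unreadable or zero-size volume as full
    }
    return ((double) usable / total) < minFreeFraction;
  }

  /** e.g. shouldEnterSafeMode(new File("/data/dfs/name"), 0.10) */
  public static boolean shouldEnterSafeMode(File nameDir, double minFreeFraction) {
    return belowThreshold(nameDir.getUsableSpace(), nameDir.getTotalSpace(),
        minFreeFraction);
  }
}
```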