[jira] Commented: (HDFS-1024) SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException

2010-03-31 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852246#action_12852246
 ] 

dhruba borthakur commented on HDFS-1024:


I am going to commit it to 0.20 branch now.

> SecondaryNamenode fails to checkpoint because namenode fails with 
> CancelledKeyException
> ---
>
> Key: HDFS-1024
> URL: https://issues.apache.org/jira/browse/HDFS-1024
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Dmytro Molkov
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: HDFS-1024.patch, HDFS-1024.patch.1, 
> HDFS-1024.patch.1-0.20.txt
>
>
> The secondary namenode fails to retrieve the entire fsimage from the 
> Namenode. It fetches a part of the fsimage but believes that it has fetched 
> the entire fsimage file and proceeds ahead with the checkpointing. Stack 
> traces will be attached below.
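The truncation described above can be guarded against by comparing the bytes actually received with the length the sender advertises. A minimal, hypothetical sketch of that idea (class and method names are invented here, not taken from the HDFS-1024 patch):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration of the guard this bug calls for: copy a stream
// and fail loudly if fewer bytes arrive than the sender advertised, instead
// of silently treating a truncated fsimage as complete.
public class ImageFetchCheck {
    public static long copyAndVerify(InputStream in, OutputStream out,
                                     long advertisedLength) throws IOException {
        byte[] buf = new byte[4096];
        long received = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            received += n;
        }
        // A negative advertised length means "unknown"; otherwise a mismatch
        // indicates a truncated transfer and must abort the checkpoint.
        if (advertisedLength >= 0 && received != advertisedLength) {
            throw new IOException("Received " + received
                + " bytes, expected " + advertisedLength);
        }
        return received;
    }
}
```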

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1024) SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException

2010-03-31 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852245#action_12852245
 ] 

Todd Lipcon commented on HDFS-1024:
---

Not sure if we need a vote, but I do think we should put it in the branch.

> SecondaryNamenode fails to checkpoint because namenode fails with 
> CancelledKeyException
> ---
>
> Key: HDFS-1024
> URL: https://issues.apache.org/jira/browse/HDFS-1024
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Dmytro Molkov
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: HDFS-1024.patch, HDFS-1024.patch.1, 
> HDFS-1024.patch.1-0.20.txt
>
>
> The secondary namenode fails to retrieve the entire fsimage from the 
> Namenode. It fetches a part of the fsimage but believes that it has fetched 
> the entire fsimage file and proceeds ahead with the checkpointing. Stack 
> traces will be attached below.




[jira] Updated: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-03-31 Thread Jay Booth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Booth updated HDFS-918:
---

Attachment: hdfs-918-TRUNK.patch

Trunk patch with previous fixes.  

> Use single Selector and small thread pool to replace many instances of 
> BlockSender for reads
> 
>
> Key: HDFS-918
> URL: https://issues.apache.org/jira/browse/HDFS-918
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Jay Booth
> Fix For: 0.22.0
>
> Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, 
> hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-918-20100309.patch, 
> hdfs-918-branch20.2.patch, hdfs-918-TRUNK.patch, hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXCeiver server allocates a new thread 
> per request, which must allocate its own buffers and leads to 
> higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
> single selector and a small threadpool to multiplex request packets, we could 
> theoretically achieve higher performance while taking up fewer resources and 
> leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
> can be done without changing any wire protocols.
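The resource-sharing idea in the description above can be roughly illustrated with a bounded buffer pool (a hypothetical sketch, not the attached patch; all names are invented): memory use stays capped no matter how many read requests are in flight.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a small fixed pool of ByteBuffers that sending
// threads check out and return, instead of each request thread
// allocating its own buffers.
public class BufferPool {
    private final BlockingQueue<ByteBuffer> pool;

    public BufferPool(int count, int bufferSize) {
        pool = new ArrayBlockingQueue<>(count);
        for (int i = 0; i < count; i++) {
            pool.offer(ByteBuffer.allocate(bufferSize));
        }
    }

    // Returns a cleared buffer, or null if all buffers are currently in use
    // (callers would then park the request until a buffer is released).
    public ByteBuffer acquire() {
        ByteBuffer b = pool.poll();
        if (b != null) {
            b.clear();
        }
        return b;
    }

    public void release(ByteBuffer b) {
        pool.offer(b);
    }

    public int available() {
        return pool.size();
    }
}
```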




[jira] Commented: (HDFS-1024) SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException

2010-03-31 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852238#action_12852238
 ] 

stack commented on HDFS-1024:
-

All tests pass except the expected (?) failure of 
"org.apache.hadoop.streaming.TestStreamingExitStatus".

I'd like to commit to the 0.20 branch.  Do we have to run a vote, or is 
evidence of fsimage corruption reason enough?

> SecondaryNamenode fails to checkpoint because namenode fails with 
> CancelledKeyException
> ---
>
> Key: HDFS-1024
> URL: https://issues.apache.org/jira/browse/HDFS-1024
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Dmytro Molkov
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: HDFS-1024.patch, HDFS-1024.patch.1, 
> HDFS-1024.patch.1-0.20.txt
>
>
> The secondary namenode fails to retrieve the entire fsimage from the 
> Namenode. It fetches a part of the fsimage but believes that it has fetched 
> the entire fsimage file and proceeds ahead with the checkpointing. Stack 
> traces will be attached below.




[jira] Updated: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-03-31 Thread Jay Booth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Booth updated HDFS-918:
---

Attachment: (was: hdfs-918-branch20.2.patch)

> Use single Selector and small thread pool to replace many instances of 
> BlockSender for reads
> 
>
> Key: HDFS-918
> URL: https://issues.apache.org/jira/browse/HDFS-918
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Jay Booth
> Fix For: 0.22.0
>
> Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, 
> hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-918-20100309.patch, 
> hdfs-918-branch20.2.patch, hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXCeiver server allocates a new thread 
> per request, which must allocate its own buffers and leads to 
> higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
> single selector and a small threadpool to multiplex request packets, we could 
> theoretically achieve higher performance while taking up fewer resources and 
> leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
> can be done without changing any wire protocols.




[jira] Updated: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-03-31 Thread Jay Booth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Booth updated HDFS-918:
---

Attachment: hdfs-918-branch20.2.patch

Straightened out the block-not-found issue with Andrew; that was on his end, 
but he then found a resource leak, which is fixed here -- I'll post a trunk 
patch shortly that incorporates this fix and the previous one.

> Use single Selector and small thread pool to replace many instances of 
> BlockSender for reads
> 
>
> Key: HDFS-918
> URL: https://issues.apache.org/jira/browse/HDFS-918
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Jay Booth
> Fix For: 0.22.0
>
> Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, 
> hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-918-20100309.patch, 
> hdfs-918-branch20.2.patch, hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXCeiver server allocates a new thread 
> per request, which must allocate its own buffers and leads to 
> higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
> single selector and a small threadpool to multiplex request packets, we could 
> theoretically achieve higher performance while taking up fewer resources and 
> leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
> can be done without changing any wire protocols.




[jira] Updated: (HDFS-185) Chown , chgrp , chmod operations allowed when namenode is in safemode .

2010-03-31 Thread Ravi Phulari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Phulari updated HDFS-185:
--

Attachment: SafeMode-Y20.100.patch

Attaching patch for Yahoo! Hadoop 20.100 branch.

> Chown , chgrp , chmod operations allowed when namenode is in safemode .
> ---
>
> Key: HDFS-185
> URL: https://issues.apache.org/jira/browse/HDFS-185
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ravi Phulari
>Assignee: Ravi Phulari
> Fix For: 0.20.2
>
> Attachments: HADOOP-5942v2.patch, HADOOPv20-5942.patch, 
> HDFS-185-1.patch, HDFS-5942.patch, SafeMode-Y20.100.patch
>
>
> Chown , chgrp , chmod operations allowed when namenode is in safemode .




[jira] Commented: (HDFS-1072) TestReadWhileWriting may fail

2010-03-31 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852169#action_12852169
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1072:
--

> ... Namenode should never throw AlreadyBeingCreatedException with 
> HDFS_NameNode as the lease holder. 
- If the lease is being recovered, then RecoveryInProgressException should be 
thrown.
- If the file is being created, then it should throw 
AlreadyBeingCreatedException with DFSClient as the lease holder. 
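The expected behavior spelled out in the two bullets above can be restated as a small decision function. This is only a hypothetical restatement for clarity (the names are invented, not HDFS code):

```java
// Hypothetical sketch of the expected namenode behavior on append when the
// target file's lease is held: recovery in progress takes precedence, and
// otherwise the conflict is reported as a genuine concurrent creation.
public class LeaseDecision {
    public enum Outcome { PROCEED, RECOVERY_IN_PROGRESS, ALREADY_BEING_CREATED }

    public static Outcome forAppend(boolean leaseHeld, boolean recovering) {
        if (!leaseHeld) {
            return Outcome.PROCEED;              // no conflicting writer
        }
        if (recovering) {
            return Outcome.RECOVERY_IN_PROGRESS; // lease is being recovered
        }
        // A real client holds the lease: the file is genuinely being
        // created, and the exception should name DFSClient as the holder.
        return Outcome.ALREADY_BEING_CREATED;
    }
}
```

Seeing AlreadyBeingCreatedException with HDFS_NameNode as the holder falls outside both branches, which is why the issue is reclassified as more than a test problem.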

> TestReadWhileWriting may fail
> -
>
> Key: HDFS-1072
> URL: https://issues.apache.org/jira/browse/HDFS-1072
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Erik Steffl
>
> If the lease recovery is taking a long time, TestReadWhileWriting may fail by 
> AlreadyBeingCreatedException.




[jira] Updated: (HDFS-1072) TestReadWhileWriting may fail

2010-03-31 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1072:
-

Component/s: (was: test)
 name-node
 hdfs client

I was thinking that this was just a test problem.  However, that seems not to 
be the case, since the Namenode should never throw AlreadyBeingCreatedException 
with HDFS_NameNode as the lease holder.

> TestReadWhileWriting may fail
> -
>
> Key: HDFS-1072
> URL: https://issues.apache.org/jira/browse/HDFS-1072
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Erik Steffl
>
> If the lease recovery is taking a long time, TestReadWhileWriting may fail by 
> AlreadyBeingCreatedException.




[jira] Assigned: (HDFS-1072) TestReadWhileWriting may fail

2010-03-31 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE reassigned HDFS-1072:


Assignee: Erik Steffl  (was: Tsz Wo (Nicholas), SZE)

> TestReadWhileWriting may fail
> -
>
> Key: HDFS-1072
> URL: https://issues.apache.org/jira/browse/HDFS-1072
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Erik Steffl
>
> If the lease recovery is taking a long time, TestReadWhileWriting may fail by 
> AlreadyBeingCreatedException.




[jira] Commented: (HDFS-1072) TestReadWhileWriting may fail

2010-03-31 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852130#action_12852130
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1072:
--

Here is the stack trace.
{noformat}
org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
 failed to create file /TestReadWhileWriting/file1 for DFSClient_-1436751315 on 
client 127.0.0.1,
 because this file is already being created by HDFS_NameNode on 127.0.0.1
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1210)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1305)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:636)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1253)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1249)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:706)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1247)

at org.apache.hadoop.ipc.Client.call(Client.java:895)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at $Proxy6.append(Unknown Source)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy6.append(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:701)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:212)
at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:793)
at 
org.apache.hadoop.hdfs.TestReadWhileWriting.append(TestReadWhileWriting.java:111)
at 
org.apache.hadoop.hdfs.TestReadWhileWriting.pipeline_02_03(TestReadWhileWriting.java:95)
{noformat}

> TestReadWhileWriting may fail
> -
>
> Key: HDFS-1072
> URL: https://issues.apache.org/jira/browse/HDFS-1072
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>
> If the lease recovery is taking a long time, TestReadWhileWriting may fail by 
> AlreadyBeingCreatedException.




[jira] Created: (HDFS-1072) TestReadWhileWriting may fail

2010-03-31 Thread Tsz Wo (Nicholas), SZE (JIRA)
TestReadWhileWriting may fail
-

 Key: HDFS-1072
 URL: https://issues.apache.org/jira/browse/HDFS-1072
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


If the lease recovery is taking a long time, TestReadWhileWriting may fail by 
AlreadyBeingCreatedException.




[jira] Resolved: (HDFS-291) combine FsShell.copyToLocal to ChecksumFileSystem.copyToLocalFile

2010-03-31 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-291.
-

Resolution: Won't Fix

Closing this minor issue as "won't fix".

> combine FsShell.copyToLocal to ChecksumFileSystem.copyToLocalFile
> -
>
> Key: HDFS-291
> URL: https://issues.apache.org/jira/browse/HDFS-291
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Tsz Wo (Nicholas), SZE
>Priority: Minor
>
> - Two methods provide similar functions
> - ChecksumFileSystem.copyToLocalFile(Path src, Path dst, boolean copyCrc) is 
> no longer used anywhere in the system
> - It is better to use ChecksumFileSystem.getRawFileSystem() for copying crc 
> in FsShell.copyToLocal
> - FileSystem.isDirectory(Path) used in FsShell.copyToLocal is deprecated.




[jira] Updated: (HDFS-481) Bug Fixes + HdfsProxy to use proxy user to impersonate the real user

2010-03-31 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-481:


Status: Open  (was: Patch Available)

Srikanth, thank you for the update.  I am looking forward to reviewing your new 
patch.

> Bug Fixes + HdfsProxy to use proxy user to impersonate the real user
> 
>
> Key: HDFS-481
> URL: https://issues.apache.org/jira/browse/HDFS-481
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/hdfsproxy
>Affects Versions: 0.21.0
>Reporter: zhiyong zhang
>Assignee: Srikanth Sundarrajan
> Attachments: HDFS-481-bp-y20.patch, HDFS-481-bp-y20s.patch, 
> HDFS-481.out, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, HDFS-481.patch, 
> HDFS-481.patch, HDFS-481.patch
>
>
> Bugs:
> 1. hadoop-version is not recognized if the ant command is run from 
> src/contrib/ or from src/contrib/hdfsproxy. 
> If the ant command is run from $HADOOP_HDFS_HOME, hadoop-version is passed 
> to contrib's build through subant. But if it is run from src/contrib or 
> src/contrib/hdfsproxy, hadoop-version is not recognized. 
> 2. LdapIpDirFilter.java is not thread safe. userName, Group & Paths are 
> per-request and can't be class members.
> 3. Addressed the following StackOverflowError: 
> ERROR [org.apache.catalina.core.ContainerBase.[Catalina].[localh
> ost].[/].[proxyForward]] Servlet.service() for servlet proxyForward threw 
> exception
> java.lang.StackOverflowError
> at 
> org.apache.catalina.core.ApplicationHttpRequest.getAttribute(ApplicationHttpR
> equest.java:229)
> This happens when the target war (/target.war) does not exist: the 
> forwarding war forwards to its parent context path /, which defines the 
> forwarding war itself, causing an infinite loop. Added "HDFS Proxy 
> Forward".equals(dstContext.getServletContextName()) to the if logic to break 
> the loop.
> 4. Kerberos credentials of the remote user aren't available. HdfsProxy needs 
> to act on behalf of the real user to service the requests.




[jira] Commented: (HDFS-955) FSImage.saveFSImage can lose edits

2010-03-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852100#action_12852100
 ] 

Hadoop QA commented on HDFS-955:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12440371/saveNamespace.patch
  against trunk revision 929406.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 11 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/289/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/289/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/289/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/289/console

This message is automatically generated.

> FSImage.saveFSImage can lose edits
> --
>
> Key: HDFS-955
> URL: https://issues.apache.org/jira/browse/HDFS-955
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Attachments: FSStateTransition7.htm, hdfs-955-moretests.txt, 
> hdfs-955-unittest.txt, PurgeEditsBeforeImageSave.patch, 
> saveNamespace-0.20.patch, saveNamespace-0.21.patch, saveNamespace.patch, 
> saveNamespace.patch, saveNamespace.txt
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.




[jira] Updated: (HDFS-955) FSImage.saveFSImage can lose edits

2010-03-31 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-955:
-

Assignee: Konstantin Shvachko  (was: Todd Lipcon)
  Status: Patch Available  (was: Open)

> FSImage.saveFSImage can lose edits
> --
>
> Key: HDFS-955
> URL: https://issues.apache.org/jira/browse/HDFS-955
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Attachments: FSStateTransition7.htm, hdfs-955-moretests.txt, 
> hdfs-955-unittest.txt, PurgeEditsBeforeImageSave.patch, 
> saveNamespace-0.20.patch, saveNamespace-0.21.patch, saveNamespace.patch, 
> saveNamespace.patch, saveNamespace.txt
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.




[jira] Updated: (HDFS-955) FSImage.saveFSImage can lose edits

2010-03-31 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-955:
-

Status: Open  (was: Patch Available)

> FSImage.saveFSImage can lose edits
> --
>
> Key: HDFS-955
> URL: https://issues.apache.org/jira/browse/HDFS-955
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: FSStateTransition7.htm, hdfs-955-moretests.txt, 
> hdfs-955-unittest.txt, PurgeEditsBeforeImageSave.patch, 
> saveNamespace-0.20.patch, saveNamespace-0.21.patch, saveNamespace.patch, 
> saveNamespace.patch, saveNamespace.txt
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.




[jira] Updated: (HDFS-955) FSImage.saveFSImage can lose edits

2010-03-31 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-955:
-

Attachment: saveNamespace.patch

New patch addresses Suresh's comments.

> FSImage.saveFSImage can lose edits
> --
>
> Key: HDFS-955
> URL: https://issues.apache.org/jira/browse/HDFS-955
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: FSStateTransition7.htm, hdfs-955-moretests.txt, 
> hdfs-955-unittest.txt, PurgeEditsBeforeImageSave.patch, 
> saveNamespace-0.20.patch, saveNamespace-0.21.patch, saveNamespace.patch, 
> saveNamespace.patch, saveNamespace.txt
>
>
> This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
> function (implementing dfsadmin -saveNamespace) can corrupt the NN storage 
> such that all current edits are lost.




[jira] Commented: (HDFS-218) name node should provide status of dfs.name.dir's

2010-03-31 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851944#action_12851944
 ] 

dhruba borthakur commented on HDFS-218:
---

Sounds like a good idea to me to show space left in the webUI and dfsadmin 
report.

Also, as long as the 10% threshold is configurable (default could be a very 
small number), +1

> name node should provide status of dfs.name.dir's
> -
>
> Key: HDFS-218
> URL: https://issues.apache.org/jira/browse/HDFS-218
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Allen Wittenauer
>
> We've had several reports of people letting their dfs.name.dir fill up.  To 
> help prevent this, the name node web ui and perhaps dfsadmin -report or 
> another command should give a disk space report of all dfs.name.dir's as well 
> as whether or not the contents of that dir are actually being used, if the 
> copy is "good", last 2ndary name node update, and anything else that might 
> be useful.




[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-03-31 Thread Jay Booth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851869#action_12851869
 ] 

Jay Booth commented on HDFS-918:


Weird.  We've been running an almost-identical version of the patch on our dev 
cluster for a week, and this version passed TestPRead and 
TestDataTransferProtocol.  Admittedly, this isn't the exact version we ran on 
our cluster, so there could be a difference, but it passes tests, so I'm a 
little stymied.  

There weren't any exceptions or anything in the datanode log?  That error 
typically happens when the datanode tries and fails to read the block from 
where it should be, so hopefully there will be some errors in the DN log.  



> Use single Selector and small thread pool to replace many instances of 
> BlockSender for reads
> 
>
> Key: HDFS-918
> URL: https://issues.apache.org/jira/browse/HDFS-918
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Jay Booth
> Fix For: 0.22.0
>
> Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, 
> hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-918-20100309.patch, 
> hdfs-918-branch20.2.patch, hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXCeiver server allocates a new thread 
> per request, which must allocate its own buffers and leads to 
> higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
> single selector and a small threadpool to multiplex request packets, we could 
> theoretically achieve higher performance while taking up fewer resources and 
> leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
> can be done without changing any wire protocols.




[jira] Commented: (HDFS-218) name node should provide status of dfs.name.dir's

2010-03-31 Thread Ravi Phulari (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851749#action_12851749
 ] 

Ravi Phulari commented on HDFS-218:
---

Is it a good idea to show how much disk space remains for *dfs.name.dir* in 
the web UI and in the *dfsadmin -report* output?
How about automatically activating namenode safemode when the remaining space 
for *dfs.name.dir* falls below a certain percentage? (say 10%)
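The proposed percentage check could look something like the following. This is a hypothetical sketch only (the class and method names are invented, and in a real patch the threshold would come from a configuration key), meant to show the shape of the decision, not actual namenode code:

```java
// Hypothetical illustration of the suggested threshold check: given the
// usable bytes remaining on the volume holding a dfs.name.dir and its total
// capacity, decide whether the configurable percentage floor is crossed.
public class NameDirSpaceCheck {
    public static boolean belowThreshold(long usableBytes, long totalBytes,
                                         double thresholdPercent) {
        if (totalBytes <= 0) {
            // Conservatively treat an unreadable volume as low on space.
            return true;
        }
        double remainingPercent = 100.0 * usableBytes / totalBytes;
        return remainingPercent < thresholdPercent;
    }
}
```

The inputs could be obtained per directory with java.io.File's getUsableSpace() and getTotalSpace(), and a true result would trigger safemode.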

> name node should provide status of dfs.name.dir's
> -
>
> Key: HDFS-218
> URL: https://issues.apache.org/jira/browse/HDFS-218
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Allen Wittenauer
>
> We've had several reports of people letting their dfs.name.dir fill up.  To 
> help prevent this, the name node web ui and perhaps dfsadmin -report or 
> another command should give a disk space report of all dfs.name.dir's as well 
> as whether or not the contents of that dir are actually being used, if the 
> copy is "good", last 2ndary name node update, and anything else that might 
> be useful.
