[jira] Commented: (HDFS-779) Automatic move to safe-mode when cluster size drops

2009-11-19 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780404#action_12780404
 ] 

Hong Tang commented on HDFS-779:


+1 on the direction. In my previous job, we had a similar feature in our 
internal distributed storage system. 

N% would depend on a number of factors:
- How well is the cluster maintained? A well-maintained cluster should set N% 
low; 10% sounds right to me.
- What is the expected capacity utilization of the cluster? For a cluster that 
is expected to run at 90% full, we have to declare an emergency when 10% of 
the capacity is lost.
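The thresholding being discussed could be sketched roughly as below. This is an illustration only, not NameNode code; the class name `SafeModeCheck`, the method name, and the idea of passing in a tracked peak count are all hypothetical.

```java
// Hypothetical sketch: decide whether to enter safe mode when the live
// DataNode count drops more than N% below the tracked peak.
public class SafeModeCheck {
    // threshold is the fractional drop (e.g. 0.10 for 10%) that trips safe mode.
    public static boolean shouldEnterSafeMode(int peakDataNodes, int liveDataNodes,
                                              double threshold) {
        if (peakDataNodes <= 0) {
            return false; // no history yet, nothing to compare against
        }
        double lostFraction = (double) (peakDataNodes - liveDataNodes) / peakDataNodes;
        return lostFraction > threshold;
    }

    public static void main(String[] args) {
        // 1000-node peak, 10% threshold: losing 120 nodes trips safe mode,
        // losing 50 does not.
        System.out.println(shouldEnterSafeMode(1000, 880, 0.10)); // true
        System.out.println(shouldEnterSafeMode(1000, 950, 0.10)); // false
    }
}
```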

> Automatic move to safe-mode when cluster size drops
> ---
>
> Key: HDFS-779
> URL: https://issues.apache.org/jira/browse/HDFS-779
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Reporter: Owen O'Malley
>
> As part of looking at using Kerberos, we want to avoid the case where both 
> the primary (and optional secondary) KDC go offline causing a replication 
> storm as the DataNodes' service tickets time out and they lose the ability to 
> connect to the NameNode. However, this is a specific case of a more general 
> problem of losing too many nodes too quickly. I think we should have an 
> option to go into safe mode if the cluster size goes down more than N% in 
> terms of DataNodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-756) libhdfs unit tests do not run

2009-11-19 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780391#action_12780391
 ] 

Konstantin Boudnik commented on HDFS-756:
-

bq. You need to apply the patch from HDFS-727 first

Yup, my bad. Everything seems to be working.

+1 patch looks good

> libhdfs unit tests do not run 
> --
>
> Key: HDFS-756
> URL: https://issues.apache.org/jira/browse/HDFS-756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/libhdfs
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hdfs-756-2.patch, hdfs-756.patch, log
>
>
> The libhdfs unit tests (ant test-c++-libhdfs -Dislibhdfs=1) do not run yet 
> because the scripts are in the common subproject,

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-778) DistributedFileSystem.getFileBlockLocations() may occasionally return numeric ips as hostnames.

2009-11-19 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780386#action_12780386
 ] 

Suresh Srinivas commented on HDFS-778:
--

Looks like DataNodes are registering with the IP address as the name in the 
registration instead of the host name. I suspect that in o.a.h.net.DNS, 
{{InetAddress.getLocalHost().getCanonicalHostName()}} is returning the IP 
address. Not sure if this is due to some Java SecurityManager issue.
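The suspected fallback can be demonstrated with a small standalone probe (illustrative only, not Hadoop code): when reverse DNS is not configured, `getCanonicalHostName()` falls back to the textual IP address, which would then end up registered as the DataNode "name". The `looksLikeIp` heuristic here is made up for the demo.

```java
import java.net.InetAddress;

public class HostNameProbe {
    // Heuristic: does the returned "hostname" look like a bare IPv4 or IPv6
    // address rather than a real DNS name?
    public static boolean looksLikeIp(String name) {
        return name.matches("\\d{1,3}(\\.\\d{1,3}){3}") || name.contains(":");
    }

    public static void main(String[] args) throws Exception {
        // If reverse lookup fails, canonical will be the numeric IP itself.
        String canonical = InetAddress.getLocalHost().getCanonicalHostName();
        System.out.println(canonical + " looksLikeIp=" + looksLikeIp(canonical));
    }
}
```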

> DistributedFileSystem.getFileBlockLocations() may occasionally return numeric 
> ips as hostnames.
> ---
>
> Key: HDFS-778
> URL: https://issues.apache.org/jira/browse/HDFS-778
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hong Tang
>
> DistributedFileSystem.getFileBlockLocations() may occasionally return numeric 
> ips as hostnames. This seems to be a breach of the 
> FileSystem.getFileBlockLocation() contract:
> {noformat}
>   /**
>* Return an array containing hostnames, offset and size of 
>* portions of the given file.  For a nonexistent 
>* file or regions, null will be returned.
>*
>* This call is most helpful with DFS, where it returns 
>* hostnames of machines that contain the given file.
>*
>* The FileSystem will simply return an elt containing 'localhost'.
>*/
>   public BlockLocation[] getFileBlockLocations(FileStatus file, 
>   long start, long len) throws IOException
> {noformat}
> One (maybe minor) consequence of this issue is: When a job includes such 
> numeric ips in in its splits' locations, JobTracker would not be able to 
> assign the job's map tasks local to the file blocks.
> We should either fix the implementation or change the contract. In the latter 
> case, JobTracker needs to be fixed to maintain both the hostnames and ips of 
> the TaskTrackers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-767) Job failure due to BlockMissingException

2009-11-19 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HDFS-767:


Attachment: HDFS-767.patch

HDFS-767.patch contains the following changes:
 1) a code change in DFSClient.java to add the random backoff discussed in this JIRA;
 2) a unit test in TestDFSClientRetries to test the effectiveness and performance 
of different parameter settings.

> Job failure due to BlockMissingException
> 
>
> Key: HDFS-767
> URL: https://issues.apache.org/jira/browse/HDFS-767
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ning Zhang
> Attachments: HDFS-767.patch
>
>
> If a block is requested by too many mappers/reducers (say, 3000) at the same 
> time, a BlockMissingException is thrown because it exceeds the upper limit (I 
> think 256 by default) on the number of threads accessing the same block at the 
> same time. The DFSClient will catch that exception and retry 3 times after 
> waiting for 3 seconds. Since the wait time is a fixed value, a lot of clients 
> will retry at about the same time and a large portion of them will get another 
> failure. After 3 retries, about 256*4 = 1024 clients will have got the block. 
> If the number of clients is larger than that, the job will fail. 
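The fixed 3-second wait above keeps the clients synchronized, so they stampede again on every retry. The randomized backoff idea could be sketched as below; the actual change lives in DFSClient.java, and this class, its method name, and its parameters are invented for illustration.

```java
import java.util.Random;

// Illustrative sketch of randomized backoff: instead of every client sleeping
// a fixed 3 seconds, each retry sleeps a random amount within a growing
// window, de-synchronizing the thundering herd across clients.
public class RandomizedBackoff {
    private static final Random RAND = new Random();

    // Retry attempt i (0-based) waits a uniformly random delay in
    // [0, baseMillis * 2^i), so later attempts spread out over a wider window.
    public static long backoffMillis(int attempt, long baseMillis) {
        long window = baseMillis << attempt; // window doubles each attempt
        return (long) (RAND.nextDouble() * window);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 3; attempt++) {
            System.out.println("attempt " + attempt + ": sleep "
                + backoffMillis(attempt, 3000) + " ms");
        }
    }
}
```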

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-727) bug setting block size hdfsOpenFile

2009-11-19 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780373#action_12780373
 ] 

Konstantin Boudnik commented on HDFS-727:
-

Clearly, the problem with the test is unrelated. However, I have executed the 
failing test case from the test-patch run with this patch in place and the test 
is passing OK. 

> bug setting block size hdfsOpenFile 
> 
>
> Key: HDFS-727
> URL: https://issues.apache.org/jira/browse/HDFS-727
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 0.20.2, 0.21.0
>
> Attachments: hdfs727.patch, hdfs727.patch
>
>
> In hdfsOpenFile in libhdfs invokeMethod needs to cast the block size argument 
> to a jlong so a full 8 bytes are passed (rather than 4 plus some garbage 
> which causes writes to fail due to a bogus block size). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-727) bug setting block size hdfsOpenFile

2009-11-19 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780372#action_12780372
 ] 

Konstantin Boudnik commented on HDFS-727:
-

+1 patch looks good.

I've verified that it works for HDFS-756 and it seems to do the job. I'm 
going to wait till tomorrow in case anyone wants to comment on this, and will 
then commit it. 

> bug setting block size hdfsOpenFile 
> 
>
> Key: HDFS-727
> URL: https://issues.apache.org/jira/browse/HDFS-727
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 0.20.2, 0.21.0
>
> Attachments: hdfs727.patch, hdfs727.patch
>
>
> In hdfsOpenFile in libhdfs invokeMethod needs to cast the block size argument 
> to a jlong so a full 8 bytes are passed (rather than 4 plus some garbage 
> which causes writes to fail due to a bogus block size). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-758) Improve reporting of progress of decommissioning

2009-11-19 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-758:
--

Attachment: HDFS-758.3.patch

New patch uploaded.

> Improve reporting of progress of decommissioning
> 
>
> Key: HDFS-758
> URL: https://issues.apache.org/jira/browse/HDFS-758
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS-758.1.patch, HDFS-758.2.patch, HDFS-758.3.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-777) A zero size file is created when SpaceQuota exceeded

2009-11-19 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780353#action_12780353
 ] 

Raghu Angadi commented on HDFS-777:
---

To restate a comment from HDFS-172:

The file is created by the implementation of the "-put" command (not as a side 
effect of the DSQuotaExceededException).
I would think this is a bug in how "-put" (internally copyFromLocal) deals 
with errors.

> A zero size file is created when SpaceQuota exceeded
> 
>
> Key: HDFS-777
> URL: https://issues.apache.org/jira/browse/HDFS-777
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1
> Environment: Debian GNU/Linux 5.0 
> hadoop-0.20.1
> java version "1.6.0_12"
> Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
> Java HotSpot(TM) Server VM (build 11.2-b01, mixed mode)
>Reporter: freestyler
>
> The issue can be reproduced by the following steps:
> $ cd hadoop
> $ bin/hadoop fs -mkdir /tmp
> $ bin/hadoop dfsadmin -setSpaceQuota 1m /tmp
> $ bin/hadoop fs -count -q /tmp  
> none inf 1048576 1048576 1 0 0 hdfs://debian:9000/tmp
> $ ls -l hadoop-0.20.1-core.jar
> -rw-r--r-- 1 freestyler freestyler 2682112 2009-09-02 04:59 
> hadoop-0.20.1-core.jar
> $ bin/hadoop fs -put hadoop-0.20.1-core.jar /tmp/test.jar
> {quote}
> 09/11/19 12:09:35 WARN hdfs.DFSClient: DataStreamer Exception: 
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: 
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
> of /tmp is exceeded: quota=1048576 diskspace consumed=128.0m  
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)   
>
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   
>   
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
> 
> at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
>
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2906)
>  
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2786)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
> Caused by: org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
> of /tmp is exceeded: quota=1048576 diskspace consumed=128.0m
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:156)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.updateNumItemsInTree(INodeDirectoryWithQuota.java:127)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:859)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:265)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.allocateBlock(FSNamesystem.java:1436)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1285)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native 

[jira] Commented: (HDFS-756) libhdfs unit tests do not run

2009-11-19 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780322#action_12780322
 ] 

Eli Collins commented on HDFS-756:
--

You need to apply the patch from HDFS-727 first. That patch is ready to go; it 
just needs a committer to commit it.

> libhdfs unit tests do not run 
> --
>
> Key: HDFS-756
> URL: https://issues.apache.org/jira/browse/HDFS-756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/libhdfs
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hdfs-756-2.patch, hdfs-756.patch, log
>
>
> The libhdfs unit tests (ant test-c++-libhdfs -Dislibhdfs=1) do not run yet 
> because the scripts are in the common subproject,

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-756) libhdfs unit tests do not run

2009-11-19 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-756:


Attachment: log

The patch seems OK. However, the test is failing on my Linux box (see the log file).

> libhdfs unit tests do not run 
> --
>
> Key: HDFS-756
> URL: https://issues.apache.org/jira/browse/HDFS-756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/libhdfs
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hdfs-756-2.patch, hdfs-756.patch, log
>
>
> The libhdfs unit tests (ant test-c++-libhdfs -Dislibhdfs=1) do not run yet 
> because the scripts are in the common subproject,

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-423) libhdfs.so is pushed to a new location, hence fuse-dfs has to be updated to point to the new location of libhdfs

2009-11-19 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-423:
-

Attachment: hdfs-423-2.patch

Here's an updated patch that gets fuse-dfs building on trunk and 
fuse_dfs_wrapper.sh working post-project-split. I tested basic operations 
(read, write, creat, rm, etc.) manually on a fuse mount. TestFuseDFS no longer 
runs; I filed HDFS-780 to revive it, but let's do that as a separate checkin 
and get the code compiling first.

> libhdfs.so is pushed to a new location, hence fuse-dfs has to be updated to 
> point to the new location of libhdfs
> --
>
> Key: HDFS-423
> URL: https://issues.apache.org/jira/browse/HDFS-423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/fuse-dfs
>Reporter: Giridharan Kesavan
>Assignee: Eli Collins
> Attachments: hdfs-423-2.patch, hdfs423.patch, patch-4922.v1.txt
>
>
> fuse-dfs depends on libhdfs, and the fuse-dfs build.xml still points to the 
> libhdfs/libhdfs.so location, but libhdfs is now built in a different location. 
> Please take a look at this bug for the location details: 
> https://issues.apache.org/jira/browse/HADOOP-3344
> Thanks,
> Giri

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-780) Revive TestFuseDFS

2009-11-19 Thread Eli Collins (JIRA)
Revive TestFuseDFS
--

 Key: HDFS-780
 URL: https://issues.apache.org/jira/browse/HDFS-780
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Eli Collins
Assignee: Eli Collins


Looks like TestFuseDFS has bit-rotted. Let's revive it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-727) bug setting block size hdfsOpenFile

2009-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780243#action_12780243
 ] 

Hadoop QA commented on HDFS-727:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425186/hdfs727.patch
  against trunk revision 881695.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 126 release audit warnings 
(more than the trunk's current 0 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/78/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/78/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/78/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/78/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/78/console

This message is automatically generated.

> bug setting block size hdfsOpenFile 
> 
>
> Key: HDFS-727
> URL: https://issues.apache.org/jira/browse/HDFS-727
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 0.20.2, 0.21.0
>
> Attachments: hdfs727.patch, hdfs727.patch
>
>
> In hdfsOpenFile in libhdfs invokeMethod needs to cast the block size argument 
> to a jlong so a full 8 bytes are passed (rather than 4 plus some garbage 
> which causes writes to fail due to a bogus block size). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-756) libhdfs unit tests do not run

2009-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780238#action_12780238
 ] 

Hadoop QA commented on HDFS-756:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425498/hdfs-756-2.patch
  against trunk revision 881695.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/119/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/119/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/119/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/119/console

This message is automatically generated.

> libhdfs unit tests do not run 
> --
>
> Key: HDFS-756
> URL: https://issues.apache.org/jira/browse/HDFS-756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/libhdfs
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hdfs-756-2.patch, hdfs-756.patch
>
>
> The libhdfs unit tests (ant test-c++-libhdfs -Dislibhdfs=1) do not run yet 
> because the scripts are in the common subproject,

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-777) A zero size file is created when SpaceQuota exceeded

2009-11-19 Thread Ravi Phulari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Phulari resolved HDFS-777.
---

Resolution: Duplicate

 This JIRA is a duplicate of HDFS-172.

> A zero size file is created when SpaceQuota exceeded
> 
>
> Key: HDFS-777
> URL: https://issues.apache.org/jira/browse/HDFS-777
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1
> Environment: Debian GNU/Linux 5.0 
> hadoop-0.20.1
> java version "1.6.0_12"
> Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
> Java HotSpot(TM) Server VM (build 11.2-b01, mixed mode)
>Reporter: freestyler
>
> The issue can be reproduced by the following steps:
> $ cd hadoop
> $ bin/hadoop fs -mkdir /tmp
> $ bin/hadoop dfsadmin -setSpaceQuota 1m /tmp
> $ bin/hadoop fs -count -q /tmp  
> none inf 1048576 1048576 1 0 0 hdfs://debian:9000/tmp
> $ ls -l hadoop-0.20.1-core.jar
> -rw-r--r-- 1 freestyler freestyler 2682112 2009-09-02 04:59 
> hadoop-0.20.1-core.jar
> $ bin/hadoop fs -put hadoop-0.20.1-core.jar /tmp/test.jar
> {quote}
> 09/11/19 12:09:35 WARN hdfs.DFSClient: DataStreamer Exception: 
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: 
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
> of /tmp is exceeded: quota=1048576 diskspace consumed=128.0m  
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)   
>
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   
>   
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
> 
> at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
>
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2906)
>  
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2786)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
> Caused by: org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
> of /tmp is exceeded: quota=1048576 diskspace consumed=128.0m
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:156)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.updateNumItemsInTree(INodeDirectoryWithQuota.java:127)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:859)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:265)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.allocateBlock(FSNamesystem.java:1436)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1285)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> at org.apache.hadoop.ipc.Client.call(Client.java:739)
> at org.apache.ha

[jira] Commented: (HDFS-779) Automatic move to safe-mode when cluster size drops

2009-11-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780232#action_12780232
 ] 

Todd Lipcon commented on HDFS-779:
--

This is similar to HDFS-528, though that patch only applies this behavior at 
startup and doesn't track the "peak datanode count" as you're suggesting. I 
think we should try to kill both birds with one stone here. The top patch on 
that issue has been tested for a couple of months in our distribution.

> Automatic move to safe-mode when cluster size drops
> ---
>
> Key: HDFS-779
> URL: https://issues.apache.org/jira/browse/HDFS-779
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Reporter: Owen O'Malley
>
> As part of looking at using Kerberos, we want to avoid the case where both 
> the primary (and optional secondary) KDC go offline causing a replication 
> storm as the DataNodes' service tickets time out and they lose the ability to 
> connect to the NameNode. However, this is a specific case of a more general 
> problem of losing too many nodes too quickly. I think we should have an 
> option to go into safe mode if the cluster size goes down more than N% in 
> terms of DataNodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-779) Automatic move to safe-mode when cluster size drops

2009-11-19 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780228#action_12780228
 ] 

Owen O'Malley commented on HDFS-779:


To address Allen's concern about small clusters, I'd suggest a minimum 
threshold of 25 nodes being down from the peak.

> Automatic move to safe-mode when cluster size drops
> ---
>
> Key: HDFS-779
> URL: https://issues.apache.org/jira/browse/HDFS-779
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Reporter: Owen O'Malley
>
> As part of looking at using Kerberos, we want to avoid the case where both 
> the primary (and optional secondary) KDC go offline causing a replication 
> storm as the DataNodes' service tickets time out and they lose the ability to 
> connect to the NameNode. However, this is a specific case of a more general 
> problem of losing too many nodes too quickly. I think we should have an 
> option to go into safe mode if the cluster size goes down more than N% in 
> terms of DataNodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-779) Automatic move to safe-mode when cluster size drops

2009-11-19 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780226#action_12780226
 ] 

Owen O'Malley commented on HDFS-779:


I'd also point out that with the default 8-hour ticket life, both of the KDCs 
would need to be out for 8*60*0.10 = 48 minutes before you hit this limit.

However, this is a more general problem than just KDC failure, which is a very, 
very remote possibility.

> Automatic move to safe-mode when cluster size drops
> ---
>
> Key: HDFS-779
> URL: https://issues.apache.org/jira/browse/HDFS-779
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Reporter: Owen O'Malley
>
> As part of looking at using Kerberos, we want to avoid the case where both 
> the primary (and optional secondary) KDC go offline causing a replication 
> storm as the DataNodes' service tickets time out and they lose the ability to 
> connect to the NameNode. However, this is a specific case of a more general 
> problem of losing too many nodes too quickly. I think we should have an 
> option to go into safe mode if the cluster size goes down more than N% in 
> terms of DataNodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-779) Automatic move to safe-mode when cluster size drops

2009-11-19 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780215#action_12780215
 ] 

Kan Zhang commented on HDFS-779:


Just a comment on the case of connection failures due to unavailable Kerberos 
KDCs. We can remove the dependency on Kerberos by manually configuring a shared 
secret key between NN and DN and using that shared key for the authentication 
between DN and NN. We already plan to support two-party shared key 
authentication on both RPC (SASL DIGEST-MD5) and HTTP (HTTP-DIGEST) regardless.

> Automatic move to safe-mode when cluster size drops
> ---
>
> Key: HDFS-779
> URL: https://issues.apache.org/jira/browse/HDFS-779
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Reporter: Owen O'Malley
>
> As part of looking at using Kerberos, we want to avoid the case where both 
> the primary (and optional secondary) KDC go offline causing a replication 
> storm as the DataNodes' service tickets time out and they lose the ability to 
> connect to the NameNode. However, this is a specific case of a more general 
> problem of losing too many nodes too quickly. I think we should have an 
> option to go into safe mode if the cluster size goes down more than N% in 
> terms of DataNodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-779) Automatic move to safe-mode when cluster size drops

2009-11-19 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780198#action_12780198
 ] 

Owen O'Malley commented on HDFS-779:


Other points:
  1. Decommissioned nodes clearly shouldn't count in the totals
  2. The default of 10% DataNode lossage probably makes sense.
  3. The high water mark should be over the last day so that long term gradual 
losses don't cause problems.

Thoughts?
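The three points above could be sketched roughly as follows; the class and method names are hypothetical, not NameNode code, and the windowed high-water mark is simplified to a single running peak:

```java
// Hypothetical sketch of the proposed auto-safe-mode check; names are
// illustrative, not actual NameNode code. Decommissioned nodes are assumed
// to be excluded before observe() is called.
class SafeModeMonitor {
    private final double lossThreshold; // e.g. 0.10 for the suggested 10% default
    private int highWaterMark;          // peak live-DataNode count over the window

    SafeModeMonitor(double lossThreshold) {
        this.lossThreshold = lossThreshold;
        this.highWaterMark = 0;
    }

    /** Record the current live count (decommissioned nodes excluded). */
    void observe(int liveDataNodes) {
        if (liveDataNodes > highWaterMark) {
            highWaterMark = liveDataNodes;
        }
    }

    /** True when losses exceed the threshold relative to the high-water mark. */
    boolean shouldEnterSafeMode(int liveDataNodes) {
        if (highWaterMark == 0) {
            return false; // nothing observed yet
        }
        double lost = (double) (highWaterMark - liveDataNodes) / highWaterMark;
        return lost > lossThreshold;
    }
}
```

A real implementation would age out old observations so the high-water mark only covers the last day, per point 3, so gradual long-term shrinkage never trips the check.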

> Automatic move to safe-mode when cluster size drops
> ---
>
> Key: HDFS-779
> URL: https://issues.apache.org/jira/browse/HDFS-779
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Reporter: Owen O'Malley
>
> As part of looking at using Kerberos, we want to avoid the case where both 
> the primary (and optional secondary) KDC go offline causing a replication 
> storm as the DataNodes' service tickets time out and they lose the ability to 
> connect to the NameNode. However, this is a specific case of a more general 
> problem of losing too many nodes too quickly. I think we should have an 
> option to go into safe mode if the cluster size goes down more than N% in 
> terms of DataNodes.




[jira] Commented: (HDFS-779) Automatic move to safe-mode when cluster size drops

2009-11-19 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780197#action_12780197
 ] 

Allen Wittenauer commented on HDFS-779:
---

N% actually gets tricky when total grid size is small.

> Automatic move to safe-mode when cluster size drops
> ---
>
> Key: HDFS-779
> URL: https://issues.apache.org/jira/browse/HDFS-779
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Reporter: Owen O'Malley
>
> As part of looking at using Kerberos, we want to avoid the case where both 
> the primary (and optional secondary) KDC go offline causing a replication 
> storm as the DataNodes' service tickets time out and they lose the ability to 
> connect to the NameNode. However, this is a specific case of a more general 
> problem of losing too many nodes too quickly. I think we should have an 
> option to go into safe mode if the cluster size goes down more than N% in 
> terms of DataNodes.




[jira] Created: (HDFS-779) Automatic move to safe-mode when cluster size drops

2009-11-19 Thread Owen O'Malley (JIRA)
Automatic move to safe-mode when cluster size drops
---

 Key: HDFS-779
 URL: https://issues.apache.org/jira/browse/HDFS-779
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: name-node
Reporter: Owen O'Malley


As part of looking at using Kerberos, we want to avoid the case where both the 
primary (and optional secondary) KDC go offline causing a replication storm as 
the DataNodes' service tickets time out and they lose the ability to connect to 
the NameNode. However, this is a specific case of a more general problem of 
losing too many nodes too quickly. I think we should have an option to go into 
safe mode if the cluster size goes down more than N% in terms of DataNodes.




[jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-11-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780185#action_12780185
 ] 

stack commented on HDFS-630:


I was going to commit this in a day or so unless there are objections. (The 
formatting is a little odd at times in this patch, but Cosmin seems to be doing 
his best to follow the formatting that is already in place in the files he's 
patching, at least for the few I checked.)

> In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
> datanodes when locating the next block.
> ---
>
> Key: HDFS-630
> URL: https://issues.apache.org/jira/browse/HDFS-630
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Affects Versions: 0.21.0
>Reporter: Ruyue Ma
>Assignee: Ruyue Ma
>Priority: Minor
> Attachments: 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
> 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
> 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
> HDFS-630.patch
>
>
> created from hdfs-200.
> If during a write, the dfsclient sees that a block replica location for a 
> newly allocated block is not connectable, it re-requests the NN to get a 
> fresh set of replica locations for the block. It tries this 
> dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
> each retry (see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonably sized cluster; if you have 
> few datanodes in the cluster, every retry may pick the dead datanode and 
> the above logic bails out.
> Our solution: when getting block locations from the namenode, we give the NN 
> the excluded datanodes. The list of dead datanodes applies only to that one 
> block allocation.
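The proposed exclusion logic could look roughly like this sketch; the allocator interface and node names are invented, and the real change would touch DFSClient.nextBlockOutputStream and the NameNode RPC rather than these stand-ins:

```java
// Illustrative sketch of the proposed retry loop with per-block exclusion.
// BlockAllocator and the node-name scheme are hypothetical, not the real
// DFSClient / NameNode interfaces.
import java.util.ArrayList;
import java.util.List;

interface BlockAllocator {
    /** Ask the "namenode" for a target datanode, skipping the excluded ones. */
    String allocate(List<String> excludedNodes);
}

class ExcludingRetry {
    static String locateBlock(BlockAllocator nn, int maxRetries) {
        List<String> excluded = new ArrayList<>(); // scoped to this one block allocation
        for (int i = 0; i < maxRetries; i++) {
            String target = nn.allocate(excluded);
            if (isConnectable(target)) {
                return target;
            }
            excluded.add(target); // don't let the NN offer the dead node again
        }
        return null; // bail out, as the current logic does after the retries
    }

    // Stand-in for the real connectivity check against the datanode.
    static boolean isConnectable(String node) {
        return !node.startsWith("dead");
    }
}
```

On a small cluster this converges in at most one retry per dead node, instead of potentially re-drawing the same dead node every time.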




[jira] Updated: (HDFS-756) libhdfs unit tests do not run

2009-11-19 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-756:
-

Status: Patch Available  (was: Open)

> libhdfs unit tests do not run 
> --
>
> Key: HDFS-756
> URL: https://issues.apache.org/jira/browse/HDFS-756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/libhdfs
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hdfs-756-2.patch, hdfs-756.patch
>
>
> The libhdfs unit tests (ant test-c++-libhdfs -Dislibhdfs=1) do not run yet 
> because the scripts are in the common subproject.




[jira] Updated: (HDFS-727) bug setting block size hdfsOpenFile

2009-11-19 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-727:
-

Status: Open  (was: Patch Available)

> bug setting block size hdfsOpenFile 
> 
>
> Key: HDFS-727
> URL: https://issues.apache.org/jira/browse/HDFS-727
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 0.20.2, 0.21.0
>
> Attachments: hdfs727.patch, hdfs727.patch
>
>
> In hdfsOpenFile in libhdfs invokeMethod needs to cast the block size argument 
> to a jlong so a full 8 bytes are passed (rather than 4 plus some garbage 
> which causes writes to fail due to a bogus block size). 




[jira] Updated: (HDFS-727) bug setting block size hdfsOpenFile

2009-11-19 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-727:
-

Status: Patch Available  (was: Open)

> bug setting block size hdfsOpenFile 
> 
>
> Key: HDFS-727
> URL: https://issues.apache.org/jira/browse/HDFS-727
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 0.20.2, 0.21.0
>
> Attachments: hdfs727.patch, hdfs727.patch
>
>
> In hdfsOpenFile in libhdfs invokeMethod needs to cast the block size argument 
> to a jlong so a full 8 bytes are passed (rather than 4 plus some garbage 
> which causes writes to fail due to a bogus block size). 




[jira] Commented: (HDFS-756) libhdfs unit tests do not run

2009-11-19 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780164#action_12780164
 ] 

Eli Collins commented on HDFS-756:
--

That last part should have read "the tests fails because it can not find the 
scripts to start the daemons". Sorry for the noise.

> libhdfs unit tests do not run 
> --
>
> Key: HDFS-756
> URL: https://issues.apache.org/jira/browse/HDFS-756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/libhdfs
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hdfs-756-2.patch, hdfs-756.patch
>
>
> The libhdfs unit tests (ant test-c++-libhdfs -Dislibhdfs=1) do not run yet 
> because the scripts are in the common subproject.




[jira] Updated: (HDFS-756) libhdfs unit tests do not run

2009-11-19 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-756:
-

Attachment: hdfs-756-2.patch

Here's an updated patch that creates the bin directory from the core jar. The 
config scripts in the bin directory assume the bin directory is named "bin" and 
that it lives in the hadoop home directory, so that's where the test script 
extracts it. The test script does not clobber a bin directory if it already 
exists, which I tested by creating an empty bin directory and checking that the 
tests fail because they can find the scripts to start the daemons. The tests 
pass after applying the patch from HDFS-727.

> libhdfs unit tests do not run 
> --
>
> Key: HDFS-756
> URL: https://issues.apache.org/jira/browse/HDFS-756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: contrib/libhdfs
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hdfs-756-2.patch, hdfs-756.patch
>
>
> The libhdfs unit tests (ant test-c++-libhdfs -Dislibhdfs=1) do not run yet 
> because the scripts are in the common subproject.




[jira] Created: (HDFS-778) DistributedFileSystem.getFileBlockLocations() may occasionally return numeric ips as hostnames.

2009-11-19 Thread Hong Tang (JIRA)
DistributedFileSystem.getFileBlockLocations() may occasionally return numeric 
ips as hostnames.
---

 Key: HDFS-778
 URL: https://issues.apache.org/jira/browse/HDFS-778
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Hong Tang


DistributedFileSystem.getFileBlockLocations() may occasionally return numeric 
ips as hostnames. This seems to be a breach of the 
FileSystem.getFileBlockLocations() contract:
{noformat}
  /**
   * Return an array containing hostnames, offset and size of 
   * portions of the given file.  For a nonexistent 
   * file or regions, null will be returned.
   *
   * This call is most helpful with DFS, where it returns 
   * hostnames of machines that contain the given file.
   *
   * The FileSystem will simply return an elt containing 'localhost'.
   */
  public BlockLocation[] getFileBlockLocations(FileStatus file, 
  long start, long len) throws IOException
{noformat}

One (maybe minor) consequence of this issue is: when a job includes such 
numeric ips in its splits' locations, the JobTracker would not be able to 
assign the job's map tasks local to the file blocks.

We should either fix the implementation or change the contract. In the latter 
case, JobTracker needs to be fixed to maintain both the hostnames and ips of 
the TaskTrackers.
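To make the locality consequence concrete, here is a hypothetical illustration (not JobTracker code) of the underlying problem: the scheduler matches a split's recorded locations against the TaskTracker's hostname as plain strings, so a numeric IP and the hostname of the same machine never match:

```java
// Hypothetical illustration, not JobTracker code: locality matching compares
// a split's recorded locations against the tracker's hostname as strings, so
// an IP string and the hostname of the same machine never compare equal.
import java.util.List;

class LocalityCheck {
    /** True if any of the split's recorded locations matches the tracker's host. */
    static boolean isLocal(String trackerHost, List<String> splitLocations) {
        return splitLocations.contains(trackerHost);
    }
}
```

Under the proposed alternative fix, the JobTracker would keep both the hostname and the IP of each TaskTracker and accept a match on either string.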
