[jira] Updated: (HDFS-1292) Allow artifacts to be published to the staging Apache Nexus Maven Repository

2010-09-15 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated HDFS-1292:
-

Status: Patch Available  (was: Open)

 Allow artifacts to be published to the staging Apache Nexus Maven Repository
 

 Key: HDFS-1292
 URL: https://issues.apache.org/jira/browse/HDFS-1292
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Reporter: Tom White
Assignee: Giridharan Kesavan
Priority: Blocker
 Fix For: 0.21.0

 Attachments: hdfs-1292.patch


 HDFS companion issue to HADOOP-6847.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1292) Allow artifacts to be published to the staging Apache Nexus Maven Repository

2010-09-15 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated HDFS-1292:
-

Status: Open  (was: Patch Available)

 Allow artifacts to be published to the staging Apache Nexus Maven Repository
 

 Key: HDFS-1292
 URL: https://issues.apache.org/jira/browse/HDFS-1292
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Reporter: Tom White
Assignee: Giridharan Kesavan
Priority: Blocker
 Fix For: 0.21.0

 Attachments: hdfs-1292.patch


 HDFS companion issue to HADOOP-6847.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1292) Allow artifacts to be published to the staging Apache Nexus Maven Repository

2010-09-15 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated HDFS-1292:
-

Status: Patch Available  (was: Open)

 Allow artifacts to be published to the staging Apache Nexus Maven Repository
 

 Key: HDFS-1292
 URL: https://issues.apache.org/jira/browse/HDFS-1292
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Reporter: Tom White
Assignee: Giridharan Kesavan
Priority: Blocker
 Fix For: 0.21.0

 Attachments: hdfs-1292.patch


 HDFS companion issue to HADOOP-6847.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1383) Better error messages on hftp

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1383:
-

Attachment: h1383_20100915_y20.patch

h1383_20100915_y20.patch: just checking s != null and added a few unit tests.
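
A minimal sketch of the kind of null check described, assuming the patch guards 
a possibly-null server message; the class and method names below are 
illustrative, not the committed code:

{noformat}
import java.io.IOException;
import java.net.HttpURLConnection;

// Hypothetical sketch, not the committed patch: surface the server's error
// message when present ("s != null"), falling back to the bare response code.
public class HftpErrorSketch {
    static IOException toIOException(HttpURLConnection connection) throws IOException {
        final String s = connection.getResponseMessage(); // may be null
        final int code = connection.getResponseCode();
        if (s != null) {
            return new IOException(s + " (error code=" + code + ")");
        }
        return new IOException("Server returned HTTP response code: " + code);
    }
}
{noformat}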

 Better error messages on hftp 
 --

 Key: HDFS-1383
 URL: https://issues.apache.org/jira/browse/HDFS-1383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h1383_20100913_y20.patch, h1383_20100915_y20.patch


 If the file is not accessible, HftpFileSystem returns only an HTTP response 
 code.
 {noformat}
 2010-08-27 20:57:48,091 INFO org.apache.hadoop.tools.DistCp: FAIL README.txt 
 : java.io.IOException:
  Server returned HTTP response code: 400 for URL: 
 http://namenode:50070/data/user/tsz/README.txt?ugi=tsz,users
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1290)
 at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:143)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
 ...
  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1383) Better error messages on hftp

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909843#action_12909843
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1383:
--

Below is the output after the patch.
{noformat}
[r...@host yahoo-hadoop-0.20.1xx]# ./bin/hadoop fs -cat 
hftp://host.xx.yy:50070/user/root/foo/s.txt
cat: user=root, access=EXECUTE, inode=foo:root:supergroup:-

[r...@host yahoo-hadoop-0.20.1xx]# ./bin/hadoop fs -cat 
hftp://host.xx.yy:50070/user/root/foo
cat: user=root, access=READ_EXECUTE, inode=foo:root:supergroup:-

[r...@host yahoo-hadoop-0.20.1xx]# ./bin/hadoop fs -cat 
hftp://host.xx.yy:50070/user/root/bar
cat: /user/root/bar is a directory (error code=400)
{noformat}

 Better error messages on hftp 
 --

 Key: HDFS-1383
 URL: https://issues.apache.org/jira/browse/HDFS-1383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h1383_20100913_y20.patch, h1383_20100915_y20.patch


 If the file is not accessible, HftpFileSystem returns only an HTTP response 
 code.
 {noformat}
 2010-08-27 20:57:48,091 INFO org.apache.hadoop.tools.DistCp: FAIL README.txt 
 : java.io.IOException:
  Server returned HTTP response code: 400 for URL: 
 http://namenode:50070/data/user/tsz/README.txt?ugi=tsz,users
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1290)
 at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:143)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
 ...
  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-779) Automatic move to safe-mode when cluster size drops

2010-09-15 Thread Robert Chansler (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909844#action_12909844
 ] 

Robert Chansler commented on HDFS-779:
--

I think Dhruba's comment #2 from the 13th supports my contention that counting 
missing replicas is the key idea. The loss of empty nodes has not been a 
problem. Too many missing replicas--regardless of the number of missing 
nodes--has been a problem.

And the issue is whether there is a catastrophic circumstance _right now_ 
rather than whether today is (much) _worse than yesterday_. Does Dhruba's 
suggestion protect against things becoming exponentially bad, but at a rate 
less than _m_?

But supposing a catastrophe is declared by whatever policy, how should the 
system behave? Retreat to safe mode is intuitively understandable, and 
answers a lot of questions. I'm always reluctant to withdraw service, and so 
catastrophe should mean that the users are going to lose in any case. I'm 
even more reluctant to allow HDFS to continue in an operating mode where things 
seem to work, but replication has been suspended. If a zillion replicas are 
missing, the system requires professional attention from an administrator.
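
To make the replica-counting policy concrete, here is a minimal sketch under 
stated assumptions (illustrative names and threshold, not actual NameNode 
code) of a trigger keyed to the fraction of missing replicas rather than the 
number of missing nodes:

{noformat}
// Hypothetical policy sketch, not actual NameNode code: retreat to safe
// mode when the fraction of missing block replicas crosses a threshold,
// independent of how many DataNodes have been lost.
public class MissingReplicaPolicy {
    private final double threshold; // e.g. 0.01 means 1% of replicas missing

    public MissingReplicaPolicy(double threshold) {
        this.threshold = threshold;
    }

    /** True if the system should enter safe mode right now. */
    public boolean shouldEnterSafeMode(long expectedReplicas, long liveReplicas) {
        if (expectedReplicas <= 0) {
            return false;
        }
        double missingFraction =
            (double) (expectedReplicas - liveReplicas) / expectedReplicas;
        return missingFraction > threshold;
    }
}
{noformat}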

 Automatic move to safe-mode when cluster size drops
 ---

 Key: HDFS-779
 URL: https://issues.apache.org/jira/browse/HDFS-779
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: name-node
Reporter: Owen O'Malley
Assignee: dhruba borthakur

 As part of looking at using Kerberos, we want to avoid the case where both 
 the primary (and optional secondary) KDC go offline causing a replication 
 storm as the DataNodes' service tickets time out and they lose the ability to 
 connect to the NameNode. However, this is a specific case of a more general 
 problem of losing too many nodes too quickly. I think we should have an 
 option to go into safe mode if the cluster size goes down more than N% in 
 terms of DataNodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1383) Better error messages on hftp

2010-09-15 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909854#action_12909854
 ] 

Suresh Srinivas commented on HDFS-1383:
---

+1 for the patch.

One minor comment - there is a change to UGI in this patch. Was that 
intentional?

 Better error messages on hftp 
 --

 Key: HDFS-1383
 URL: https://issues.apache.org/jira/browse/HDFS-1383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h1383_20100913_y20.patch, h1383_20100915_y20.patch


 If the file is not accessible, HftpFileSystem returns only an HTTP response 
 code.
 {noformat}
 2010-08-27 20:57:48,091 INFO org.apache.hadoop.tools.DistCp: FAIL README.txt 
 : java.io.IOException:
  Server returned HTTP response code: 400 for URL: 
 http://namenode:50070/data/user/tsz/README.txt?ugi=tsz,users
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1290)
 at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:143)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
 ...
  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1383) Better error messages on hftp

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909864#action_12909864
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1383:
--

No, I don't want to change UGI.  Thanks for reviewing it.

 Better error messages on hftp 
 --

 Key: HDFS-1383
 URL: https://issues.apache.org/jira/browse/HDFS-1383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h1383_20100913_y20.patch, h1383_20100915_y20.patch


 If the file is not accessible, HftpFileSystem returns only an HTTP response 
 code.
 {noformat}
 2010-08-27 20:57:48,091 INFO org.apache.hadoop.tools.DistCp: FAIL README.txt 
 : java.io.IOException:
  Server returned HTTP response code: 400 for URL: 
 http://namenode:50070/data/user/tsz/README.txt?ugi=tsz,users
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1290)
 at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:143)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
 ...
  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1383) Better error messages on hftp

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1383:
-

Attachment: h1383_20100915b_y20.patch

h1383_20100915b_y20.patch: reverted the change in UserGroupInformation

 Better error messages on hftp 
 --

 Key: HDFS-1383
 URL: https://issues.apache.org/jira/browse/HDFS-1383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h1383_20100913_y20.patch, h1383_20100915_y20.patch, 
 h1383_20100915b_y20.patch


 If the file is not accessible, HftpFileSystem returns only an HTTP response 
 code.
 {noformat}
 2010-08-27 20:57:48,091 INFO org.apache.hadoop.tools.DistCp: FAIL README.txt 
 : java.io.IOException:
  Server returned HTTP response code: 400 for URL: 
 http://namenode:50070/data/user/tsz/README.txt?ugi=tsz,users
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1290)
 at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:143)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
 ...
  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1399) Distinct minicluster services (e.g. NN and JT) overwrite each other's service policies

2010-09-15 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-1399:
-

Attachment: hdfs-1399.1.txt

Updated patch to address Todd's comments.

 Distinct minicluster services (e.g. NN and JT) overwrite each other's service 
 policies
 --

 Key: HDFS-1399
 URL: https://issues.apache.org/jira/browse/HDFS-1399
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.22.0

 Attachments: hdfs-1399.1.txt, hdfs-1399.txt.0


 HDFS portion of HADOOP-6951.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Moved: (HDFS-1402) Optimize input split creation

2010-09-15 Thread Paul Burkhardt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Burkhardt moved MAPREDUCE-1973 to HDFS-1402:
-

  Project: Hadoop HDFS  (was: Hadoop Map/Reduce)
  Key: HDFS-1402  (was: MAPREDUCE-1973)
Affects Version/s: 0.22.0
   (was: 0.20.1)
   (was: 0.20.2)

 Optimize input split creation
 -

 Key: HDFS-1402
 URL: https://issues.apache.org/jira/browse/HDFS-1402
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.22.0
 Environment: Intel Nehalem cluster running Red Hat.
Reporter: Paul Burkhardt
Priority: Minor
 Attachments: HADOOP-1973.patch


 The input split returns the locations that host the file blocks in the split. 
 The locations are determined by the getBlockLocations method of the 
 filesystem client which requires a remote connection to the filesystem (i.e. 
 HDFS). The remote connection is made for each file in the entire input split. 
 For jobs with many input files the network connections dominate the cost of 
 writing the input split file.
 A job requests a listing of the input files from the remote filesystem and 
 creates a FileStatus object as a handle for each file in the listing. The 
 FileStatus object can be imbued with the necessary host information on the 
 remote end and passed to the client-side in the bulk return of the listing 
 request. A getHosts method of the FileStatus would then return the locations 
 for the blocks comprising that file and eliminate the need for another trip 
 to the remote filesystem.
 The INodeFile maintains the blocks for a file and is an obvious choice to be 
 the originator for the locations of that file. It is also available to the 
 FSDirectory which first creates the listing of FileStatus objects. We propose 
 that the block locations be generated by the INodeFile to instantiate the 
 FileStatus object during the getListing request.
 Our tests demonstrated a factor of 2000 speedup for approximately 60,000 
 input files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1402) Optimize input split creation

2010-09-15 Thread Paul Burkhardt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Burkhardt updated HDFS-1402:
-

Attachment: HDFS-1402.patch
HDFS-1402.common.patch

Patched against the trunk.

 Optimize input split creation
 -

 Key: HDFS-1402
 URL: https://issues.apache.org/jira/browse/HDFS-1402
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.22.0
 Environment: Intel Nehalem cluster running Red Hat.
Reporter: Paul Burkhardt
Priority: Minor
 Attachments: HADOOP-1973.patch, HDFS-1402.common.patch, 
 HDFS-1402.patch


 The input split returns the locations that host the file blocks in the split. 
 The locations are determined by the getBlockLocations method of the 
 filesystem client which requires a remote connection to the filesystem (i.e. 
 HDFS). The remote connection is made for each file in the entire input split. 
 For jobs with many input files the network connections dominate the cost of 
 writing the input split file.
 A job requests a listing of the input files from the remote filesystem and 
 creates a FileStatus object as a handle for each file in the listing. The 
 FileStatus object can be imbued with the necessary host information on the 
 remote end and passed to the client-side in the bulk return of the listing 
 request. A getHosts method of the FileStatus would then return the locations 
 for the blocks comprising that file and eliminate the need for another trip 
 to the remote filesystem.
 The INodeFile maintains the blocks for a file and is an obvious choice to be 
 the originator for the locations of that file. It is also available to the 
 FSDirectory which first creates the listing of FileStatus objects. We propose 
 that the block locations be generated by the INodeFile to instantiate the 
 FileStatus object during the getListing request.
 Our tests demonstrated a factor of 2000 speedup for approximately 60,000 
 input files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1383) Better error messages on hftp

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1383:
-

   Status: Patch Available  (was: Open)
Fix Version/s: 0.22.0

 Better error messages on hftp 
 --

 Key: HDFS-1383
 URL: https://issues.apache.org/jira/browse/HDFS-1383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.22.0

 Attachments: h1383_20100913_y20.patch, h1383_20100915_y20.patch, 
 h1383_20100915b.patch, h1383_20100915b_y20.patch


 If the file is not accessible, HftpFileSystem returns only an HTTP response 
 code.
 {noformat}
 2010-08-27 20:57:48,091 INFO org.apache.hadoop.tools.DistCp: FAIL README.txt 
 : java.io.IOException:
  Server returned HTTP response code: 400 for URL: 
 http://namenode:50070/data/user/tsz/README.txt?ugi=tsz,users
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1290)
 at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:143)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
 ...
  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1383) Better error messages on hftp

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1383:
-

Attachment: h1383_20100915b.patch

h1383_20100915b.patch: for trunk

 Better error messages on hftp 
 --

 Key: HDFS-1383
 URL: https://issues.apache.org/jira/browse/HDFS-1383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.22.0

 Attachments: h1383_20100913_y20.patch, h1383_20100915_y20.patch, 
 h1383_20100915b.patch, h1383_20100915b_y20.patch


 If the file is not accessible, HftpFileSystem returns only an HTTP response 
 code.
 {noformat}
 2010-08-27 20:57:48,091 INFO org.apache.hadoop.tools.DistCp: FAIL README.txt 
 : java.io.IOException:
  Server returned HTTP response code: 400 for URL: 
 http://namenode:50070/data/user/tsz/README.txt?ugi=tsz,users
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1290)
 at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:143)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
 ...
  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1402) Optimize input split creation

2010-09-15 Thread Paul Burkhardt (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909924#action_12909924
 ] 

Paul Burkhardt commented on HDFS-1402:
--

I decided to patch against the trunk. The changes span both HDFS and Common, so 
I attached two separate patches to this ticket for now.

As previously noted, this patch addresses the same core issue as HDFS-202. My 
concern is that HDFS-202 adds a parallel set of interfaces to support file 
status objects with location information. My argument is that the locations of 
a file should be a first-class attribute shared by all file types. If we 
enforce an interface, getHosts or getLocations, for every file status type, we 
can simplify the client and server APIs for creating and listing file status 
objects. File status types from a distributed filesystem such as HDFS return 
the hosts for the file blocks, whereas a file status type from a 
non-distributed or local filesystem would return a single host, all through the 
same interface.
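
A minimal sketch of the proposed shape, with illustrative names (this is not 
the attached patch): a file status whose block hosts are filled in server-side 
during the bulk listing, so no per-file getBlockLocations call is needed:

{noformat}
// Hypothetical sketch of the proposal, not the attached patch: a status
// handle that carries block hosts from the bulk listing, avoiding one
// extra RPC per input file when computing splits.
public class LocatedStatusSketch {
    private final String path;
    private final String[][] blockHosts; // hosts per block, set by the server

    public LocatedStatusSketch(String path, String[][] blockHosts) {
        this.path = path;
        this.blockHosts = blockHosts;
    }

    public String getPath() {
        return path;
    }

    /** Block locations without another trip to the remote filesystem. */
    public String[][] getHosts() {
        return blockHosts;
    }
}
{noformat}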

 Optimize input split creation
 -

 Key: HDFS-1402
 URL: https://issues.apache.org/jira/browse/HDFS-1402
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.22.0
 Environment: Intel Nehalem cluster running Red Hat.
Reporter: Paul Burkhardt
Priority: Minor
 Attachments: HADOOP-1973.patch, HDFS-1402.common.patch, 
 HDFS-1402.patch


 The input split returns the locations that host the file blocks in the split. 
 The locations are determined by the getBlockLocations method of the 
 filesystem client which requires a remote connection to the filesystem (i.e. 
 HDFS). The remote connection is made for each file in the entire input split. 
 For jobs with many input files the network connections dominate the cost of 
 writing the input split file.
 A job requests a listing of the input files from the remote filesystem and 
 creates a FileStatus object as a handle for each file in the listing. The 
 FileStatus object can be imbued with the necessary host information on the 
 remote end and passed to the client-side in the bulk return of the listing 
 request. A getHosts method of the FileStatus would then return the locations 
 for the blocks comprising that file and eliminate the need for another trip 
 to the remote filesystem.
 The INodeFile maintains the blocks for a file and is an obvious choice to be 
 the originator for the locations of that file. It is also available to the 
 FSDirectory which first creates the listing of FileStatus objects. We propose 
 that the block locations be generated by the INodeFile to instantiate the 
 FileStatus object during the getListing request.
 Our tests demonstrated a factor of 2000 speedup for approximately 60,000 
 input files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-375) DFSClient cpu overhead is too high

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-375.
-

Resolution: Not A Problem

I believe this issue went stale.  Closing.

 DFSClient cpu overhead is too high
 --

 Key: HDFS-375
 URL: https://issues.apache.org/jira/browse/HDFS-375
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Runping Qi

 When we do dfs throughput tests using hadoop dfs -cat, we have observed that 
 the client-side cpu usage is very high, three to five times that of a data node 
 serving the file.
 Before 0.18, the data node cpu usage was equally high; that problem was fixed 
 in 0.18. However, the client-side problem still exists.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-198) org.apache.hadoop.dfs.LeaseExpiredException during dfs write

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-198.
-

Resolution: Not A Problem

I believe this issue went stale.  Closing.

 org.apache.hadoop.dfs.LeaseExpiredException during dfs write
 

 Key: HDFS-198
 URL: https://issues.apache.org/jira/browse/HDFS-198
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Runping Qi

 Many long running cpu intensive map tasks failed due to 
 org.apache.hadoop.dfs.LeaseExpiredException.
 Here is an excerpt from the log:
 2008-10-26 11:54:17,282 INFO org.apache.hadoop.dfs.DFSClient: 
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.dfs.LeaseExpiredException: No lease on 
 /xxx/_temporary/_task_200810232126_0001_m_33_0/part-00033 File does not 
 exist. [Lease.  Holder: 44 46 53 43 6c 69 65 6e 74 5f 74 61 73 6b 5f 32 30 30 
 38 31 30 32 33 32 31 32 36 5f 30 30 30 31 5f 6d 5f 30 30 30 30 33 33 5f 30, 
 heldlocks: 0, pendingcreates: 1]
   at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1194)
   at 
 org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1125)
   at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
   at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
   at org.apache.hadoop.ipc.Client.call(Client.java:557)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
   at org.apache.hadoop.dfs.$Proxy1.addBlock(Unknown Source)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
   at org.apache.hadoop.dfs.$Proxy1.addBlock(Unknown Source)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2335)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2220)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1700(DFSClient.java:1702)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1842)
 2008-10-26 11:54:17,282 WARN org.apache.hadoop.dfs.DFSClient: 
 NotReplicatedYetException sleeping 
 /xxx/_temporary/_task_200810232126_0001_m_33_0/part-00033 retries left 2
 2008-10-26 11:54:18,886 INFO org.apache.hadoop.dfs.DFSClient: 
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.dfs.LeaseExpiredException: No lease on 
 /xxx/_temporary/_task_200810232126_0001_m_33_0/part-00033 File does not 
 exist. [Lease.  Holder: 44 46 53 43 6c 69 65 6e 74 5f 74 61 73 6b 5f 32 30 30 
 38 31 30 32 33 32 31 32 36 5f 30 30 30 31 5f 6d 5f 30 30 30 30 33 33 5f 30, 
 heldlocks: 0, pendingcreates: 1]
   at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1194)
   at 
 org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1125)
   at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
   at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
   at org.apache.hadoop.ipc.Client.call(Client.java:557)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
   at org.apache.hadoop.dfs.$Proxy1.addBlock(Unknown Source)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
   at org.apache.hadoop.dfs.$Proxy1.addBlock(Unknown Source)
   ...

[jira] Resolved: (HDFS-106) DataNode log message includes toString of an array

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-106.
-

Resolution: Not A Problem

Thanks Rong-En for checking it.  Closing.

 DataNode log message includes toString of an array
 --

 Key: HDFS-106
 URL: https://issues.apache.org/jira/browse/HDFS-106
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Nigel Daley
Priority: Minor

 DataNode.java line 596:
 LOG.info("Starting thread to transfer block " + blocks[i] + " to " + 
 xferTargets[i]);
 xferTargets is a two dimensional array, so this line calls toString on the 
 array referenced by xferTargets[i].
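
 A possible one-line fix, shown only as an illustration (the issue was later 
 closed as Not A Problem): format the inner array explicitly.

 {noformat}
 // Illustrative fix: print the inner array's contents instead of the
 // default Object.toString() of the array reference.
 LOG.info("Starting thread to transfer block " + blocks[i] + " to "
     + java.util.Arrays.toString(xferTargets[i]));
 {noformat}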

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1292) Allow artifacts to be published to the staging Apache Nexus Maven Repository

2010-09-15 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HDFS-1292:


Attachment: HDFS-1292.patch

Thanks for working on this, Giri. I tried to upload some artifacts to the 
staging repository but got the following error when I tried to close the 
release:

{noformat}
Staging Signature Validation

-Missing Signature: 
'/org/apache/hadoop/hadoop-hdfs-instrumented/0.22.0/hadoop-hdfs-instrumented-0.22.0-sources.jar.asc'
 does not exist for 'hadoop-hdfs-instrumented-0.22.0-sources.jar'.
-Invalid Signature: 
'/org/apache/hadoop/hadoop-hdfs-instrumented/0.22.0/hadoop-hdfs-instrumented-0.22.0.jar.asc'
 is not a valid signature for 'hadoop-hdfs-instrumented-0.22.0.jar'.
-Missing Signature: 
'/org/apache/hadoop/hadoop-hdfs-test/0.22.0/hadoop-hdfs-test-0.22.0-sources.jar.asc'
 does not exist for 'hadoop-hdfs-test-0.22.0-sources.jar'.
-Invalid Signature: 
'/org/apache/hadoop/hadoop-hdfs-test/0.22.0/hadoop-hdfs-test-0.22.0.jar.asc' is 
not a valid signature for 'hadoop-hdfs-test-0.22.0.jar'.
-Invalid Signature: 
'/org/apache/hadoop/hadoop-hdfs/0.22.0/hadoop-hdfs-0.22.0.jar.asc' is not a 
valid signature for 'hadoop-hdfs-0.22.0.jar'.
-Missing Signature: 
'/org/apache/hadoop/hadoop-hdfs/0.22.0/hadoop-hdfs-0.22.0-sources.jar.asc' does 
not exist for 'hadoop-hdfs-0.22.0-sources.jar'.
-Invalid Signature: 
'/org/apache/hadoop/hadoop-hdfs-instrumented-test/0.22.0/hadoop-hdfs-instrumented-test-0.22.0.jar.asc'
 is not a valid signature for 'hadoop-hdfs-instrumented-test-0.22.0.jar'.
-Missing Signature: 
'/org/apache/hadoop/hadoop-hdfs-instrumented-test/0.22.0/hadoop-hdfs-instrumented-test-0.22.0-sources.jar.asc'
 does not exist for 'hadoop-hdfs-instrumented-test-0.22.0-sources.jar'.
{noformat}

I think that the sources.jar.asc is overwriting the jar.asc file. This can be 
fixed by adding a {{classifier=sources}} attribute to the attach element. I 
made this change in the attached patch, and the close was successful.

 Allow artifacts to be published to the staging Apache Nexus Maven Repository
 

 Key: HDFS-1292
 URL: https://issues.apache.org/jira/browse/HDFS-1292
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Reporter: Tom White
Assignee: Giridharan Kesavan
Priority: Blocker
 Fix For: 0.21.0

 Attachments: HDFS-1292.patch, hdfs-1292.patch


 HDFS companion issue to HADOOP-6847.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1403) add -truncate option to fsck

2010-09-15 Thread sam rash (JIRA)
add -truncate option to fsck


 Key: HDFS-1403
 URL: https://issues.apache.org/jira/browse/HDFS-1403
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client, name-node
Reporter: sam rash


When running fsck, it would be useful to be able to tell hdfs to truncate any 
corrupt file to the last valid position in the latest block.  Then, when 
running hadoop fsck, an admin can clean up the filesystem.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1383) Better error messages on hftp

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909986#action_12909986
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1383:
--

Ran unit tests.  TestFiHFlush failed.  See HDFS-1206.

 Better error messages on hftp 
 --

 Key: HDFS-1383
 URL: https://issues.apache.org/jira/browse/HDFS-1383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.22.0

 Attachments: h1383_20100913_y20.patch, h1383_20100915_y20.patch, 
 h1383_20100915b.patch, h1383_20100915b_y20.patch


 If the file is not accessible, HftpFileSystem returns only an HTTP response 
 code.
 {noformat}
 2010-08-27 20:57:48,091 INFO org.apache.hadoop.tools.DistCp: FAIL README.txt 
 : java.io.IOException:
  Server returned HTTP response code: 400 for URL: 
 http://namenode:50070/data/user/tsz/README.txt?ugi=tsz,users
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1290)
 at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:143)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
 ...
  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1383) Better error messages on hftp

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909987#action_12909987
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1383:
--

ant test-patch
{noformat}
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 17 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 system tests framework.  The patch passed system tests 
framework compile.
{noformat}

 Better error messages on hftp 
 --

 Key: HDFS-1383
 URL: https://issues.apache.org/jira/browse/HDFS-1383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.22.0

 Attachments: h1383_20100913_y20.patch, h1383_20100915_y20.patch, 
 h1383_20100915b.patch, h1383_20100915b_y20.patch


 If the file is not accessible, HftpFileSystem returns only an HTTP response 
 code.
 {noformat}
 2010-08-27 20:57:48,091 INFO org.apache.hadoop.tools.DistCp: FAIL README.txt 
 : java.io.IOException:
  Server returned HTTP response code: 400 for URL: 
 http://namenode:50070/data/user/tsz/README.txt?ugi=tsz,users
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1290)
 at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:143)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
 ...
  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1383) Better error messages on hftp

2010-09-15 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909989#action_12909989
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1383:
--

Tested manually again.  It works fine.
{noformat}
[r...@yahoo-hadoop-0.20.1xx]# ./bin/hadoop fs -cat /user/tsz/r.txt
cat: org.apache.hadoop.security.AccessControlException: Permission denied: 
user=root, access=READ, inode=r.txt:tsz:supergroup:-
{noformat}


 Better error messages on hftp 
 --

 Key: HDFS-1383
 URL: https://issues.apache.org/jira/browse/HDFS-1383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.22.0

 Attachments: h1383_20100913_y20.patch, h1383_20100915_y20.patch, 
 h1383_20100915b.patch, h1383_20100915b_y20.patch


 If the file is not accessible, HftpFileSystem returns only an HTTP response 
 code.
 {noformat}
 2010-08-27 20:57:48,091 INFO org.apache.hadoop.tools.DistCp: FAIL README.txt 
 : java.io.IOException:
  Server returned HTTP response code: 400 for URL: 
 http://namenode:50070/data/user/tsz/README.txt?ugi=tsz,users
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1290)
 at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:143)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
 ...
  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1403) add -truncate option to fsck

2010-09-15 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910003#action_12910003
 ] 

dhruba borthakur commented on HDFS-1403:


This is especially needed when the system supports hflush. A client could issue 
an hflush, which persists block locations in the namenode. The client could 
then fail before writing any bytes to that block, in which case the last block 
of the file will be permanently missing. It would be nice to have an option to 
fsck to delete the last block of a file if it is of size zero and does not have 
any valid replicas.
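
A minimal sketch of the check described above (hypothetical names, not a 
patch): the last block qualifies for removal when it is zero-length and has no 
live replicas.

{noformat}
// Hypothetical sketch, not actual fsck code: the condition under which an
// fsck -truncate option could safely drop a file's last block.
public class TruncateCheckSketch {
    static boolean shouldDropLastBlock(long lastBlockBytes, int liveReplicas) {
        return lastBlockBytes == 0 && liveReplicas == 0;
    }
}
{noformat}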

 add -truncate option to fsck
 

 Key: HDFS-1403
 URL: https://issues.apache.org/jira/browse/HDFS-1403
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client, name-node
Reporter: sam rash

 When running fsck, it would be useful to be able to tell hdfs to truncate any 
 corrupt file to the last valid position in the latest block.  Then, when 
 running hadoop fsck, an admin can clean up the filesystem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.