[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-04-06 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853868#action_12853868 ]

Hudson commented on HDFS-946:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #302 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/302/])


 NameNode should not return full path name when listing a directory or getting
 the status of a file
 -

 Key: HDFS-946
 URL: https://issues.apache.org/jira/browse/HDFS-946
 Project: Hadoop HDFS
 Issue Type: Improvement
 Reporter: Hairong Kuang
 Assignee: Hairong Kuang
 Fix For: 0.22.0

 Attachments: HdfsFileStatus-yahoo20.patch, HDFSFileStatus.patch,
 HDFSFileStatus1.patch, HdfsFileStatus3.patch, HdfsFileStatus4.patch,
 HdfsFileStatusProxy-Yahoo20.patch


 FSDirectory#getListing(String src) has the following code:
   int i = 0;
   for (INode cur : contents) {
     listing[i] = createFileStatus(srcs + cur.getLocalName(), cur);
     i++;
   }
 So listing a directory returns an array of FileStatus, and each FileStatus
 element carries the full path name. This increases the size of the return
 message and adds non-negligible CPU time to the operation.
 FSDirectory#getFileInfo(String) does not need to return the file name either.
 Another optimization: in the version of FileStatus that is used in the wire
 protocol, the path field does not need to be a Path; it could be a String or,
 ideally, a byte array. This would avoid unnecessary creation of Path objects
 at the NameNode and thus help reduce the GC problem observed when a large
 number of getFileInfo or getListing operations hit the NameNode.
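
 To give a rough feel for the message-size argument in the description, the
 sketch below compares the UTF-8 bytes needed to send full path names against
 local names only. The directory name, file names, and entry count are
 hypothetical, and the estimate ignores all other fields and any framing
 overhead of the real wire format.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not code from HDFS or from the attached patches.
final class ListingSizeEstimate {
    /** Sums the UTF-8 encoded length of every name in the list. */
    static long utf8Bytes(List<String> names) {
        long total = 0;
        for (String name : names) {
            total += name.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        String parent = "/user/example/warehouse/table/"; // hypothetical directory
        List<String> localNames = new ArrayList<>();
        List<String> fullPaths = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {                  // hypothetical entry count
            String local = String.format("part-%05d", i);
            localNames.add(local);
            fullPaths.add(parent + local);                // parent repeated per entry
        }
        System.out.println("full paths:  " + utf8Bytes(fullPaths) + " bytes");
        System.out.println("local names: " + utf8Bytes(localNames) + " bytes");
    }
}
```

 The difference between the two totals is the parent prefix repeated once per
 entry, which is the part the client already knows.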

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-04-06 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853958#action_12853958 ]

Hudson commented on HDFS-946:
-

Integrated in Hadoop-Hdfs-trunk #275 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/275/])





[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-04-05 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853659#action_12853659 ]

Hudson commented on HDFS-946:
-

Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #146 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/146/])





[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-22 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836984#action_12836984 ]

Hudson commented on HDFS-946:
-

Integrated in Hadoop-Hdfs-trunk-Commit #197 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/197/])
HDFS-946. NameNode should not return full path name when listing a directory or
getting the status of a file. Contributed by Hairong Kuang.





[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-19 Thread Suresh Srinivas (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835876#action_12835876 ]

Suresh Srinivas commented on HDFS-946:
--

+1 the patch looks good.




[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-16 Thread Suresh Srinivas (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834396#action_12834396 ]

Suresh Srinivas commented on HDFS-946:
--

# TestDFSShell.java: not sure why the method names start with capital letters.
Also, is the change to this file needed?
# FSDirectory.createFileStatus: consider moving the isDirectory check outside.
Also, the current code extends beyond 80 columns.
# HDFSFileStatus
#* consider naming it HdfsFileStatus
#* "final static public" should be "public static final"
#* since this is for HDFS, the code comments about different notions in the FS
are not required in the methods getPermission(), getOwner(), getGroup()
#* some of the method parameters and other variables could be declared final
# getFullName(): the code is more readable without the unnecessary else. Same
for getFullPath()
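
The style points above (modifier order, final parameters, no unnecessary else)
can be sketched as follows. This is a hypothetical class written for
illustration, not the code from the attached patches; the class name, field,
and constant are assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a wire-level status object that keeps only the local name.
final class LocalNameStatus {
    /** Conventional modifier order: public static final. */
    public static final byte[] EMPTY_NAME = new byte[0];

    private final byte[] localName; // name relative to the parent, as UTF-8 bytes

    LocalNameStatus(final byte[] localName) {
        this.localName = localName;
    }

    boolean isEmptyLocalName() {
        return localName.length == 0;
    }

    String getLocalName() {
        return new String(localName, StandardCharsets.UTF_8);
    }

    /** Early returns replace the if/else chain, so no else branch is needed. */
    String getFullName(final String parent) {
        if (isEmptyLocalName()) {
            return parent; // the status describes the parent itself
        }
        if (parent.endsWith("/")) {
            return parent + getLocalName();
        }
        return parent + "/" + getLocalName();
    }
}
```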





[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-11 Thread Hairong Kuang (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832745#action_12832745 ]

Hairong Kuang commented on HDFS-946:


 Are you proposing that the object sent over the wire is different from
 FileStatus? If so, please consider the requirement of HDFS-878 too.
This jira tries to reduce the cost of getFileInfo and listing a directory,
whereas HDFS-878 adds cost to these two operations. So I will not implement
HDFS-878 in this jira. Since we are having so many problems with getFileInfo
and listing a directory, we should be very cautious about adding anything to
FileStatus in HDFS unless it is absolutely necessary.

I have conducted some experiments with my patch. I wrote an application that
spawns 100 threads, each of which lists a directory of size 1300 200 times.
I used YourKit to profile the NameNode while the application was running.
Without the patch, the NameNode's CPU utilization is 20~26% and the time spent
on GC is 3~5%. With the patch, the NameNode's CPU utilization drops to 12~17%
and the time spent on GC is mostly 0%, occasionally rising to 1 or 2%.
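
The shape of that load test can be sketched roughly as below. This is not the
application used for the measurements above: the class and method names are
hypothetical, and the listing operation is a stand-in that lists a local
directory with java.io.File, where the real test would list an HDFS directory
through the HDFS client while the NameNode is being profiled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical load driver: N threads, each repeating one operation M times.
final class ListingLoad {
    /** Runs op `iterations` times on each of `threads` threads; returns completed calls. */
    static long run(int threads, int iterations, Runnable op) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < iterations; i++) {
                        op.run();
                        completed.incrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for the worker and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Stand-in target: a local directory instead of an HDFS directory.
        java.io.File dir = new java.io.File(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        long calls = run(100, 200, () -> dir.list());
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(calls + " listings in " + millis + " ms");
    }
}
```

The driver only generates load; CPU and GC figures like the ones quoted above
come from profiling the server side separately.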




[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-04 Thread dhruba borthakur (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829564#action_12829564 ]

dhruba borthakur commented on HDFS-946:
---

Clients should continue to get the full path name in a FileStatus object,
shouldn't they? Otherwise many existing client applications will break.

Are you proposing that the object sent over the wire is different from
FileStatus? If so, please consider the requirement of HDFS-878 too.




[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-03 Thread Doug Cutting (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829250#action_12829250 ]

Doug Cutting commented on HDFS-946:
---

This sounds reasonable.  But the client would still return fully-qualified 
paths, no?




[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-03 Thread Hairong Kuang (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829269#action_12829269 ]

Hairong Kuang commented on HDFS-946:


For this jira, the client will still return fully-qualified paths. 

But I am thinking that even at the FileContext level it is not necessary to 
return fully-qualified paths. However this is a user-facing incompatible 
change. I would prefer to discuss it in a different jira. 
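
One way the client could keep returning fully-qualified paths while the server
sends only local names is sketched below. It assumes the client knows the file
system URI and the directory it asked about; the class name, method, host, and
port are hypothetical, and plain java.net.URI is used in place of Hadoop's Path.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical client-side helper; not part of the HDFS client API.
final class QualifyOnClient {
    /** Rebuilds scheme://authority/parent/local from a local name in the reply. */
    static URI qualify(URI fsUri, String parentDir, String localName) {
        String path;
        if (localName.isEmpty()) {
            path = parentDir;                     // reply describes parentDir itself
        } else if (parentDir.endsWith("/")) {
            path = parentDir + localName;
        } else {
            path = parentDir + "/" + localName;
        }
        try {
            // This constructor quotes characters that are illegal in a URI path.
            return new URI(fsUri.getScheme(), fsUri.getAuthority(), path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI fs = URI.create("hdfs://namenode.example.com:8020"); // hypothetical NameNode
        System.out.println(qualify(fs, "/user/alice", "data.txt"));
    }
}
```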
