[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage and FSEditLog.write operations corrupts edits log

2010-02-16 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834158#action_12834158
 ] 

Cosmin Lehene commented on HDFS-909:


@Todd what's the state of this patch? This happens more often than I initially 
thought. Just hit it again.

 Race condition between rollEditLog or rollFSImage and FSEditLog.write 
 operations corrupts edits log
 -

 Key: HDFS-909
 URL: https://issues.apache.org/jira/browse/HDFS-909
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: CentOS
Reporter: Cosmin Lehene
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.21.0, 0.22.0

 Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt, 
 hdfs-909.txt, hdfs-909.txt


 Closing the edits log file can race with a write to the edits log, 
 resulting in the OP_INVALID end-of-file marker first being overwritten by 
 the concurrent threads (in setReadyToFlush) and then removed twice from 
 the buffer, losing a good byte from the edits log.
 Example:
 {code}
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
 FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
 EditLogFileOutputStream.flushAndSync()
 OR
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> 
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> 
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
 FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
 EditLogFileOutputStream.flushAndSync()
 VERSUS
 FSNameSystem.completeFile() -> FSEditLog.logSync() -> 
 EditLogOutputStream.setReadyToFlush()
 FSNameSystem.completeFile() -> FSEditLog.logSync() -> 
 EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
 OR
 Any FSEditLog.write
 {code}
 Access to the edits flush operations is synchronized only at the 
 FSEditLog.logSync() method level. At a lower level, however, access to 
 EditLogOutputStream.setReadyToFlush(), flush(), and flushAndSync() is NOT 
 synchronized; these can be called from concurrent threads as in the 
 example above.
 So if a rollEditLog or rollFSImage happens at the same time as a write 
 operation, the threads can race in EditLogFileOutputStream.setReadyToFlush, 
 which will overwrite the last byte (normally FSEditLog.OP_INVALID, the 
 end-of-file marker) and then remove it twice (once from each thread) in 
 flushAndSync()! Hence a valid byte goes missing from the edits log, which 
 leads to a silent SecondaryNameNode failure and a full HDFS failure upon 
 cluster restart. 
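 To make the race concrete, here is a minimal sketch (class and buffer 
 names are illustrative, not taken from the actual patch) of why an 
 unguarded buffer swap loses a byte, and the kind of per-stream guard that 
 would prevent it:
 {code}
 // Hypothetical sketch of a double-buffered edit log stream. In the real
 // EditLogFileOutputStream, setReadyToFlush() swaps the in-memory buffers
 // and flushAndSync() writes the staged buffer out, trimming the trailing
 // OP_INVALID end-of-file marker. If two threads interleave these calls,
 // the swap happens twice and the marker is trimmed twice, dropping a
 // valid byte.
 class SketchEditLogStream {
   private byte[] bufCurrent = new byte[4096]; // receives new edit records
   private byte[] bufReady   = new byte[0];    // staged for flushing
 
   // Unsafe if called concurrently: a second swap re-stages a buffer that
   // the other thread is already flushing.
   private void setReadyToFlush() {
     byte[] tmp = bufReady;
     bufReady = bufCurrent;
     bufCurrent = tmp;
   }
 
   // One possible guard: make the swap-and-flush sequence atomic per stream
   // so a roll and a logSync cannot interleave on the same buffers.
   synchronized void flushAndSync() {
     setReadyToFlush();
     // ... write bufReady to disk, sync, and trim the EOF marker once ...
   }
 }
 {code}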
 We got to this point after investigating a corrupted edits file that made 
 HDFS unable to start with:
 {code:title=namenode.log}
 java.io.IOException: Incorrect data format. logVersion is -20 but 
 writables.length is 768. 
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450)
 {code}
 EDIT: moved the logs to a comment to make this readable

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-02-16 Thread Jay Booth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834252#action_12834252
 ] 

Jay Booth commented on HDFS-918:


Yeah, I'll do my best to get benchmarks by the end of the weekend; it's been 
kind of a crazy week and I moved this past weekend, so I don't have a ton of 
time.  Todd, if you feel like blasting a couple of stream-of-consciousness 
comments to me via email, go right ahead; otherwise I'll run the benchmarks 
this weekend and wait for the well-written version :).  

Zlatin, I originally had a similar architecture to what you're describing, 
using a BlockingQueue to funnel the actual work to a threadpool, but I had 
some issues getting the locking quite right: either I wasn't getting things 
into the queue as fast as possible, or I was burning a lot of empty cycles in 
the selector thread.  Specifically, I can't cancel a SelectionKey and then 
re-register with the same selector afterwards (it leads to exceptions), so my 
Selector thread was spinning in a tight loop verifying that, yes, all of these 
writable SelectionKeys were currently enqueued for work, whenever anything was 
being processed.  But that was a couple of iterations ago; maybe I'll have 
better luck trying it now.  What we really need is a libevent-like framework. 
I'll spend a little time reviewing the outward-facing API for that and 
scratching my noggin.

Ultimately, only so much I/O can actually happen at a time before the disk is 
swamped, so it might be that a set of, say, 32 selector threads gets the same 
performance as 1024 threads.  In that case, we'd be taking up fewer resources 
for the same performance.  At any rate, I need to benchmark before speculating 
further.



 Use single Selector and small thread pool to replace many instances of 
 BlockSender for reads
 

 Key: HDFS-918
 URL: https://issues.apache.org/jira/browse/HDFS-918
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Jay Booth
 Fix For: 0.22.0

 Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, 
 hdfs-918-20100211.patch, hdfs-multiplex.patch


 Currently, on read requests, the DataXceiver server allocates a new thread 
 per request, which must allocate its own buffers and leads to 
 higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
 single selector and a small threadpool to multiplex request packets, we could 
 theoretically achieve higher performance while taking up fewer resources and 
 leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
 can be done without changing any wire protocols.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage and FSEditLog.write operations corrupts edits log

2010-02-16 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834264#action_12834264
 ] 

Cosmin Lehene commented on HDFS-909:


@Todd,

Thanks! We're using 0.21 :)

 Race condition between rollEditLog or rollFSImage and FSEditLog.write 
 operations corrupts edits log
 -

 Key: HDFS-909
 URL: https://issues.apache.org/jira/browse/HDFS-909
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: CentOS
Reporter: Cosmin Lehene
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.21.0, 0.22.0

 Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt, 
 hdfs-909.txt, hdfs-909.txt


 Closing the edits log file can race with a write to the edits log, 
 resulting in the OP_INVALID end-of-file marker first being overwritten by 
 the concurrent threads (in setReadyToFlush) and then removed twice from 
 the buffer, losing a good byte from the edits log.
 Example:
 {code}
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
 FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
 EditLogFileOutputStream.flushAndSync()
 OR
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> 
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
 FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
 FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> 
 FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
 FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
 EditLogFileOutputStream.flushAndSync()
 VERSUS
 FSNameSystem.completeFile() -> FSEditLog.logSync() -> 
 EditLogOutputStream.setReadyToFlush()
 FSNameSystem.completeFile() -> FSEditLog.logSync() -> 
 EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
 OR
 Any FSEditLog.write
 {code}
 Access to the edits flush operations is synchronized only at the 
 FSEditLog.logSync() method level. At a lower level, however, access to 
 EditLogOutputStream.setReadyToFlush(), flush(), and flushAndSync() is NOT 
 synchronized; these can be called from concurrent threads as in the 
 example above.
 So if a rollEditLog or rollFSImage happens at the same time as a write 
 operation, the threads can race in EditLogFileOutputStream.setReadyToFlush, 
 which will overwrite the last byte (normally FSEditLog.OP_INVALID, the 
 end-of-file marker) and then remove it twice (once from each thread) in 
 flushAndSync()! Hence a valid byte goes missing from the edits log, which 
 leads to a silent SecondaryNameNode failure and a full HDFS failure upon 
 cluster restart. 
 We got to this point after investigating a corrupted edits file that made 
 HDFS unable to start with:
 {code:title=namenode.log}
 java.io.IOException: Incorrect data format. logVersion is -20 but 
 writables.length is 768. 
 at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450)
 {code}
 EDIT: moved the logs to a comment to make this readable

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-02-16 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834274#action_12834274
 ] 

Zlatin Balevsky commented on HDFS-918:
--

Jay,

the selector thread is likely busy-looping because select() will return 
immediately if any channels are writable.  Cancelling takes a select() call, 
and you cannot re-register the channel until the key has been properly 
cancelled and removed from the selector's key sets.  It is easier to turn 
write interest off before passing the writable channel to the threadpool.  
When the threadpool is done with transferTo(), pass the channel back to the 
select()-ing thread and instruct it to turn write interest back on.  (Do not 
change the interest outside the selecting thread.)

Hope this helps.
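
A rough sketch of that pattern (all names here are illustrative, not taken 
from any attached patch):

{code}
// Sketch of the pattern described above: the selector thread turns OP_WRITE
// off before handing the channel to a worker; the worker hands the key back
// via a queue and wakes the selector, which re-enables OP_WRITE. All
// interestOps changes happen on the selecting thread.
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SelectorLoop {
  private final Selector selector;
  private final BlockingQueue<SelectionKey> reenable =
      new LinkedBlockingQueue<SelectionKey>();

  SelectorLoop(Selector selector) { this.selector = selector; }

  void loop() throws IOException {
    while (true) {
      selector.select();
      // Re-arm keys whose transferTo() work finished since the last select.
      for (SelectionKey key; (key = reenable.poll()) != null; ) {
        if (key.isValid()) {
          key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        }
      }
      for (SelectionKey key : selector.selectedKeys()) {
        if (key.isWritable()) {
          // Turn write interest off so select() stops returning this key
          // while a worker owns it; avoids the busy loop without cancel().
          key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
          dispatchToWorker(key); // worker calls requeue(key) when done
        }
      }
      selector.selectedKeys().clear();
    }
  }

  // Called from worker threads after transferTo() completes.
  void requeue(SelectionKey key) {
    reenable.add(key);
    selector.wakeup(); // ensure the selecting thread re-arms promptly
  }

  private void dispatchToWorker(SelectionKey key) { /* submit to a threadpool */ }
}
{code}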


 Use single Selector and small thread pool to replace many instances of 
 BlockSender for reads
 

 Key: HDFS-918
 URL: https://issues.apache.org/jira/browse/HDFS-918
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Jay Booth
 Fix For: 0.22.0

 Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, 
 hdfs-918-20100211.patch, hdfs-multiplex.patch


 Currently, on read requests, the DataXceiver server allocates a new thread 
 per request, which must allocate its own buffers and leads to 
 higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
 single selector and a small threadpool to multiplex request packets, we could 
 theoretically achieve higher performance while taking up fewer resources and 
 leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
 can be done without changing any wire protocols.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-939) libhdfs test is broken

2010-02-16 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-939:
-

Status: Open  (was: Patch Available)

 libhdfs test is broken
 --

 Key: HDFS-939
 URL: https://issues.apache.org/jira/browse/HDFS-939
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/libhdfs
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-939-1.patch


 The libhdfs test currently does not run because hadoop.tmp.dir is specified 
 as a relative path, and it looks like a side effect of HDFS-873 is that 
 relative paths get made absolute, so build/test/libhdfs gets turned into 
 /test/libhdfs, which the NN cannot create. Let's make the test generate conf 
 files that use the appropriate directory (build/test/libhdfs), specified by 
 fully qualified URIs. 
 Also, are relative paths in conf files supported? If not, rather than fail 
 we should detect this and print a warning.
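 As a sketch of the intended conf generation (the build path and property 
 value here are assumptions about the layout, not taken from the patch):
 {code}
 // Sketch: point hadoop.tmp.dir at a fully qualified file: URI so the
 // path absolutization introduced by HDFS-873 cannot turn
 // build/test/libhdfs into /test/libhdfs.
 import java.io.File;
 import org.apache.hadoop.conf.Configuration;
 
 public class LibhdfsTestConf {
   public static void main(String[] args) {
     Configuration conf = new Configuration();
     String uri = new File("build/test/libhdfs").getAbsoluteFile().toURI().toString();
     conf.set("hadoop.tmp.dir", uri);
     System.out.println("hadoop.tmp.dir = " + uri);
   }
 }
 {code}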

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-939) libhdfs test is broken

2010-02-16 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-939:
-

Status: Patch Available  (was: Open)

 libhdfs test is broken
 --

 Key: HDFS-939
 URL: https://issues.apache.org/jira/browse/HDFS-939
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/libhdfs
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-939-1.patch


 The libhdfs test currently does not run because hadoop.tmp.dir is specified 
 as a relative path, and it looks like a side effect of HDFS-873 is that 
 relative paths get made absolute, so build/test/libhdfs gets turned into 
 /test/libhdfs, which the NN cannot create. Let's make the test generate conf 
 files that use the appropriate directory (build/test/libhdfs), specified by 
 fully qualified URIs. 
 Also, are relative paths in conf files supported? If not, rather than fail 
 we should detect this and print a warning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-940) libhdfs test uses UnixUserGroupInformation

2010-02-16 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-940:
-

 Priority: Blocker  (was: Major)
Affects Version/s: 0.22.0
Fix Version/s: 0.22.0

Bumping priority. libhdfs is not contrib.

 libhdfs test uses UnixUserGroupInformation
 --

 Key: HDFS-940
 URL: https://issues.apache.org/jira/browse/HDFS-940
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/libhdfs
Affects Versions: 0.22.0
Reporter: Eli Collins
Priority: Blocker
 Fix For: 0.22.0


 The libhdfs test fails with the following; it needs to be updated since 
 UnixUserGroupInformation was removed.
  [exec] failed to construct hadoop user unix group info object
  [exec] Exception in thread "main" java.lang.NoSuchMethodError: 
 org.apache.hadoop.security.UserGroupInformation: method <init>()V not found
  [exec]   at 
 org.apache.hadoop.security.UnixUserGroupInformation.<init>(UnixUserGroupInformation.java:69)
  [exec] Call to org/apache/hadoop/security/UnixUserGroupInformation 
 failed!
  [exec] Oops! Failed to connect to hdfs as user nobody!
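 For reference, a minimal sketch of the replacement API after the security 
 rework (hedged; the exact login path the test should use may differ):
 {code}
 // Sketch: UnixUserGroupInformation is gone; the current user's identity
 // now comes from UserGroupInformation directly.
 import org.apache.hadoop.security.UserGroupInformation;
 
 public class UgiExample {
   public static void main(String[] args) throws Exception {
     UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
     System.out.println("user: " + ugi.getUserName());
   }
 }
 {code}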

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-939) libhdfs test is broken

2010-02-16 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-939:
-

 Priority: Blocker  (was: Major)
Affects Version/s: 0.22.0
Fix Version/s: 0.22.0

 libhdfs test is broken
 --

 Key: HDFS-939
 URL: https://issues.apache.org/jira/browse/HDFS-939
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/libhdfs
Affects Versions: 0.22.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Blocker
 Fix For: 0.22.0

 Attachments: hdfs-939-1.patch


 The libhdfs test currently does not run because hadoop.tmp.dir is specified 
 as a relative path, and it looks like a side effect of HDFS-873 is that 
 relative paths get made absolute, so build/test/libhdfs gets turned into 
 /test/libhdfs, which the NN cannot create. Let's make the test generate conf 
 files that use the appropriate directory (build/test/libhdfs), specified by 
 fully qualified URIs. 
 Also, are relative paths in conf files supported? If not, rather than fail 
 we should detect this and print a warning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-939) libhdfs test is broken

2010-02-16 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834311#action_12834311
 ] 

Eli Collins commented on HDFS-939:
--

Could a committer review this? I doubt this breaks a core or contrib test; the 
Hudson output doesn't give details.

 libhdfs test is broken
 --

 Key: HDFS-939
 URL: https://issues.apache.org/jira/browse/HDFS-939
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/libhdfs
Affects Versions: 0.22.0
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 0.22.0

 Attachments: hdfs-939-1.patch


 The libhdfs test currently does not run because hadoop.tmp.dir is specified 
 as a relative path, and it looks like a side effect of HDFS-873 is that 
 relative paths get made absolute, so build/test/libhdfs gets turned into 
 /test/libhdfs, which the NN cannot create. Let's make the test generate conf 
 files that use the appropriate directory (build/test/libhdfs), specified by 
 fully qualified URIs. 
 Also, are relative paths in conf files supported? If not, rather than fail 
 we should detect this and print a warning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-02-16 Thread Jay Booth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834321#action_12834321
 ] 

Jay Booth commented on HDFS-918:


Thanks Zlatin, I think you're right.  I'll look at finding a way to remove 
writable interest without cancelling the key, which could fix the busy-looping 
issue; then I could use a condition to ensure wakeup when something becomes 
newly writable-interested (via a completed packet or new request) and refactor 
back to a single selector thread and several executing threads.  I'll make a 
copy of the patch and try benchmarking both methods.

 Use single Selector and small thread pool to replace many instances of 
 BlockSender for reads
 

 Key: HDFS-918
 URL: https://issues.apache.org/jira/browse/HDFS-918
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Jay Booth
 Fix For: 0.22.0

 Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, 
 hdfs-918-20100211.patch, hdfs-multiplex.patch


 Currently, on read requests, the DataXceiver server allocates a new thread 
 per request, which must allocate its own buffers and leads to 
 higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
 single selector and a small threadpool to multiplex request packets, we could 
 theoretically achieve higher performance while taking up fewer resources and 
 leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
 can be done without changing any wire protocols.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-16 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834396#action_12834396
 ] 

Suresh Srinivas commented on HDFS-946:
--

# TestDFSShell.java: not sure why the methods are named starting with caps. 
Also, is the change to this file needed?
# FSDirectory.createFileStatus: consider moving the isDirectory check outside. 
Also, the current code extends beyond 80 columns.
# HDFSFileStatus 
#* consider naming it HdfsFileStatus
#* final static public should be public static final
#* since this is for HDFS, comments in the code about different notions in 
the FS are not required in methods getPermission(), getOwner(), getGroup()
#* some of the method parameters and other variables could be declared final
# getFullName(): the code is more readable without the unnecessary else. Same 
for getFullPath()


 NameNode should not return full path name when listing a directory or 
 getting the status of a file
 -

 Key: HDFS-946
 URL: https://issues.apache.org/jira/browse/HDFS-946
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.22.0

 Attachments: HDFSFileStatus.patch, HDFSFileStatus1.patch


 FSDirectory#getListing(String src) has the following code:
 {code}
 int i = 0;
 for (INode cur : contents) {
   listing[i] = createFileStatus(srcs+cur.getLocalName(), cur);
   i++;
 }
 {code}
 So listing a directory will return an array of FileStatus. Each FileStatus 
 element has the full path name. This increases the return message size and 
 adds non-negligible CPU time to the operation.
 FSDirectory#getFileInfo(String) does not need to return the file name either.
 Another optimization is that in the version of FileStatus that's used in the 
 wire protocol, the path field does not need to be a Path; it could ideally 
 be a String or a byte array. This could avoid unnecessary creation of Path 
 objects at the NameNode, thus helping reduce the GC problem observed when a 
 large number of getFileInfo or getListing operations hit the NameNode.
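 To illustrate the proposed shape (names here are illustrative; the actual 
 class in the attached patches is HdfsFileStatus):
 {code}
 // Illustrative wire-format status carrying the local name as bytes rather
 // than a full Path: the NameNode avoids building a Path per entry, and the
 // client reassembles the full path from the directory it listed.
 class WireFileStatus {
   private final byte[] name; // local name in UTF-8; empty for the dir itself
   private final long length;
   private final boolean isDir;
 
   WireFileStatus(byte[] name, long length, boolean isDir) {
     this.name = name.clone(); // defensive copy keeps the field effectively immutable
     this.length = length;
     this.isDir = isDir;
   }
 
   // Client side: rebuild the full path relative to the listed directory.
   String getFullName(String parent) {
     String local = new String(name, java.nio.charset.StandardCharsets.UTF_8);
     if (local.isEmpty()) return parent;
     return parent.endsWith("/") ? parent + local : parent + "/" + local;
   }
 }
 {code}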

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-939) libhdfs test is broken

2010-02-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834447#action_12834447
 ] 

Hadoop QA commented on HDFS-939:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434479/hdfs-939-1.patch
  against trunk revision 908628.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/232/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/232/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/232/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/232/console

This message is automatically generated.

 libhdfs test is broken
 --

 Key: HDFS-939
 URL: https://issues.apache.org/jira/browse/HDFS-939
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/libhdfs
Affects Versions: 0.22.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Blocker
 Fix For: 0.22.0

 Attachments: hdfs-939-1.patch


 The libhdfs test currently does not run because hadoop.tmp.dir is specified 
 as a relative path, and it looks like a side effect of HDFS-873 is that 
 relative paths get made absolute, so build/test/libhdfs gets turned into 
 /test/libhdfs, which the NN cannot create. Let's make the test generate conf 
 files that use the appropriate directory (build/test/libhdfs), specified by 
 fully qualified URIs. 
 Also, are relative paths in conf files supported? If not, rather than fail 
 we should detect this and print a warning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-939) libhdfs test is broken

2010-02-16 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834465#action_12834465
 ] 

Eli Collins commented on HDFS-939:
--

The core test failure is HDFS-982 and the contrib test failure is HDFS-981.

 libhdfs test is broken
 --

 Key: HDFS-939
 URL: https://issues.apache.org/jira/browse/HDFS-939
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/libhdfs
Affects Versions: 0.22.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Blocker
 Fix For: 0.22.0

 Attachments: hdfs-939-1.patch


 The libhdfs test currently does not run because hadoop.tmp.dir is specified 
 as a relative path, and it looks like a side-effect of HDFS-873 was that 
 relative paths get made absolute, so build/test/libhdfs gets turned into 
 /test/libhdfs, which the NN can not create. Let's make the test generate conf 
 files that use the appropriate directory (build/test/libhdfs) specified by 
 fully qualified URIs. 
 Also, are relative paths in conf files supported? If not rather than fail we 
 should detect this and print a warning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-520) Create new tests for block recovery

2010-02-16 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-520:
---

Attachment: blockRecoveryPositive.patch

This patch implements the block recovery tests described in section 5.
1. TestBlockRecovery covers BlockRecovery_02.8 - 13.
2. TestLeaseRecovery2#testHardLeaseRecovery is a functional test that covers 
BlockRecovery_02 (02.1-02.5) and BlockRecovery_03, 03.1, 04.
3. I do not think BlockRecovery_02.6 is in the scope of this test.
4. BlockRecovery_01 is a negative test that will be covered in the negative 
test suite.

 Create new tests for block recovery
 ---

 Key: HDFS-520
 URL: https://issues.apache.org/jira/browse/HDFS-520
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Attachments: blockRecoveryPositive.patch


 According to the test plan, a number of new features are going to be 
 implemented as part of this umbrella (HDFS-265) JIRA.
 These new features have to be tested properly. Block recovery is one piece 
 of new functionality that requires new tests to be developed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-894) DatanodeID.ipcPort is not updated when existing node re-registers

2010-02-16 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HDFS-894:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Todd!

 DatanodeID.ipcPort is not updated when existing node re-registers
 -

 Key: HDFS-894
 URL: https://issues.apache.org/jira/browse/HDFS-894
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
 Attachments: hdfs-894.txt


 In FSNamesystem.registerDatanode, it checks whether a registering node is a 
 re-registration of an old one based on storage ID. If so, it simply updates 
 the old one with the new registration info. However, the new ipcPort is lost 
 when this happens.
 I reproduced this manually by setting up a DN with the IPC port set to 0 (so 
 it picks an ephemeral port) and then restarting the DN. At this point, the 
 NN's view of the ipcPort is stale, and clients will not be able to achieve 
 pipeline recovery.
 This should be easy to fix and unit test, but I'm not sure when I'll get to 
 it, so anyone else should feel free to grab it if they get to it first.
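 A minimal sketch of the kind of fix implied (field and method names are 
 illustrative; the real update happens in FSNamesystem.registerDatanode):
 {code}
 // Sketch: when a node re-registers under an existing storage ID, copy the
 // fresh registration's ipcPort onto the retained descriptor instead of
 // keeping the stale value.
 class DatanodeDescriptorSketch {
   String storageID;
   String host;
   int ipcPort;
 
   void updateRegInfo(DatanodeDescriptorSketch fresh) {
     this.host = fresh.host;
     this.ipcPort = fresh.ipcPort; // the copy the bug report says is missing
   }
 }
 {code}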

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-245) Create symbolic links in HDFS

2010-02-16 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-245:
-

Attachment: symlink40-hdfs.patch

Patch attached. Removes testSetTimes since it was moved to common; otherwise 
the same.

 Create symbolic links in HDFS
 -

 Key: HDFS-245
 URL: https://issues.apache.org/jira/browse/HDFS-245
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: Eli Collins
 Attachments: 4044_20081030spi.java, design-doc-v4.txt, 
 designdocv1.txt, designdocv2.txt, designdocv3.txt, 
 HADOOP-4044-strawman.patch, symlink-0.20.0.patch, symlink-25-hdfs.patch, 
 symlink-26-hdfs.patch, symlink-26-hdfs.patch, symLink1.patch, symLink1.patch, 
 symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, 
 symLink15.txt, symLink15.txt, symlink16-common.patch, symlink16-hdfs.patch, 
 symlink16-mr.patch, symlink17-common.txt, symlink17-hdfs.txt, 
 symlink18-common.txt, symlink19-common-delta.patch, symlink19-common.txt, 
 symlink19-common.txt, symlink19-hdfs-delta.patch, symlink19-hdfs.txt, 
 symlink20-common.patch, symlink20-hdfs.patch, symlink21-common.patch, 
 symlink21-hdfs.patch, symlink22-common.patch, symlink22-hdfs.patch, 
 symlink23-common.patch, symlink23-hdfs.patch, symlink24-hdfs.patch, 
 symlink27-hdfs.patch, symlink28-hdfs.patch, symlink29-hdfs.patch, 
 symlink29-hdfs.patch, symlink30-hdfs.patch, symlink31-hdfs.patch, 
 symlink33-hdfs.patch, symlink35-hdfs.patch, symlink36-hdfs.patch, 
 symlink37-hdfs.patch, symlink38-hdfs.patch, symlink39-hdfs.patch, 
 symLink4.patch, symlink40-hdfs.patch, symLink5.patch, symLink6.patch, 
 symLink8.patch, symLink9.patch


 HDFS should support symbolic links. A symbolic link is a special type of file 
 that contains a reference to another file or directory in the form of an 
 absolute or relative path and that affects pathname resolution. Programs 
 which read or write to files named by a symbolic link will behave as if 
 operating directly on the target file. However, archiving utilities can 
 handle symbolic links specially and manipulate them directly.
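 For context, client usage through the FileContext API targeted by these 
 patches looks roughly like this (paths are made-up examples):
 {code}
 // Rough usage sketch: create a symlink, then read through it.
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileContext;
 import org.apache.hadoop.fs.Path;
 
 public class SymlinkExample {
   public static void main(String[] args) throws Exception {
     FileContext fc = FileContext.getFileContext(new Configuration());
     Path target = new Path("/user/alice/data.txt");
     Path link = new Path("/user/alice/latest");
     fc.createSymlink(target, link, false /* createParent */);
     // Opening the link resolves to the target file.
     fc.open(link).close();
   }
 }
 {code}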

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-245) Create symbolic links in HDFS

2010-02-16 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-245:
-

Status: Patch Available  (was: Open)

 Create symbolic links in HDFS
 -

 Key: HDFS-245
 URL: https://issues.apache.org/jira/browse/HDFS-245
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: Eli Collins
 Attachments: 4044_20081030spi.java, design-doc-v4.txt, 
 designdocv1.txt, designdocv2.txt, designdocv3.txt, 
 HADOOP-4044-strawman.patch, symlink-0.20.0.patch, symlink-25-hdfs.patch, 
 symlink-26-hdfs.patch, symlink-26-hdfs.patch, symLink1.patch, symLink1.patch, 
 symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, 
 symLink15.txt, symLink15.txt, symlink16-common.patch, symlink16-hdfs.patch, 
 symlink16-mr.patch, symlink17-common.txt, symlink17-hdfs.txt, 
 symlink18-common.txt, symlink19-common-delta.patch, symlink19-common.txt, 
 symlink19-common.txt, symlink19-hdfs-delta.patch, symlink19-hdfs.txt, 
 symlink20-common.patch, symlink20-hdfs.patch, symlink21-common.patch, 
 symlink21-hdfs.patch, symlink22-common.patch, symlink22-hdfs.patch, 
 symlink23-common.patch, symlink23-hdfs.patch, symlink24-hdfs.patch, 
 symlink27-hdfs.patch, symlink28-hdfs.patch, symlink29-hdfs.patch, 
 symlink29-hdfs.patch, symlink30-hdfs.patch, symlink31-hdfs.patch, 
 symlink33-hdfs.patch, symlink35-hdfs.patch, symlink36-hdfs.patch, 
 symlink37-hdfs.patch, symlink38-hdfs.patch, symlink39-hdfs.patch, 
 symLink4.patch, symlink40-hdfs.patch, symLink5.patch, symLink6.patch, 
 symLink8.patch, symLink9.patch


 HDFS should support symbolic links. A symbolic link is a special type of file 
 that contains a reference to another file or directory in the form of an 
 absolute or relative path and that affects pathname resolution. Programs 
 which read or write to files named by a symbolic link will behave as if 
 operating directly on the target file. However, archiving utilities can 
 handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-894) DatanodeID.ipcPort is not updated when existing node re-registers

2010-02-16 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HDFS-894:
---

Fix Version/s: 0.22.0

 DatanodeID.ipcPort is not updated when existing node re-registers
 -

 Key: HDFS-894
 URL: https://issues.apache.org/jira/browse/HDFS-894
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.22.0

 Attachments: hdfs-894.txt


 In FSNamesystem.registerDatanode, it checks whether a registering node is a 
 re-registration of an old one based on storage ID. If so, it simply updates 
 the old one with the new registration info. However, the new ipcPort is lost 
 when this happens.
 I reproduced this manually by setting up a DN with the IPC port set to 0 (so 
 it picks an ephemeral port) and then restarting the DN. At this point, the 
 NN's view of the ipcPort is stale, and clients will not be able to achieve 
 pipeline recovery.
 This should be easy to fix and unit test, but I'm not sure when I'll get to 
 it, so anyone else should feel free to grab it if they get to it first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-245) Create symbolic links in HDFS

2010-02-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834605#action_12834605
 ] 

Hadoop QA commented on HDFS-245:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12436052/symlink40-hdfs.patch
  against trunk revision 910760.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 117 release audit warnings 
(more than the trunk's current 0 warnings).

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/console

This message is automatically generated.

 Create symbolic links in HDFS
 -

 Key: HDFS-245
 URL: https://issues.apache.org/jira/browse/HDFS-245
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: Eli Collins
 Attachments: 4044_20081030spi.java, design-doc-v4.txt, 
 designdocv1.txt, designdocv2.txt, designdocv3.txt, 
 HADOOP-4044-strawman.patch, symlink-0.20.0.patch, symlink-25-hdfs.patch, 
 symlink-26-hdfs.patch, symlink-26-hdfs.patch, symLink1.patch, symLink1.patch, 
 symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, 
 symLink15.txt, symLink15.txt, symlink16-common.patch, symlink16-hdfs.patch, 
 symlink16-mr.patch, symlink17-common.txt, symlink17-hdfs.txt, 
 symlink18-common.txt, symlink19-common-delta.patch, symlink19-common.txt, 
 symlink19-common.txt, symlink19-hdfs-delta.patch, symlink19-hdfs.txt, 
 symlink20-common.patch, symlink20-hdfs.patch, symlink21-common.patch, 
 symlink21-hdfs.patch, symlink22-common.patch, symlink22-hdfs.patch, 
 symlink23-common.patch, symlink23-hdfs.patch, symlink24-hdfs.patch, 
 symlink27-hdfs.patch, symlink28-hdfs.patch, symlink29-hdfs.patch, 
 symlink29-hdfs.patch, symlink30-hdfs.patch, symlink31-hdfs.patch, 
 symlink33-hdfs.patch, symlink35-hdfs.patch, symlink36-hdfs.patch, 
 symlink37-hdfs.patch, symlink38-hdfs.patch, symlink39-hdfs.patch, 
 symLink4.patch, symlink40-hdfs.patch, symLink5.patch, symLink6.patch, 
 symLink8.patch, symLink9.patch


 HDFS should support symbolic links. A symbolic link is a special type of file 
 that contains a reference to another file or directory in the form of an 
 absolute or relative path and that affects pathname resolution. Programs 
 which read or write to files named by a symbolic link will behave as if 
 operating directly on the target file. However, archiving utilities can 
 handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-245) Create symbolic links in HDFS

2010-02-16 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834607#action_12834607
 ] 

Eli Collins commented on HDFS-245:
--

Test failures are due to HDFS-981, HDFS-982, and errors reading partially 
downloaded zip files.

 Create symbolic links in HDFS
 -

 Key: HDFS-245
 URL: https://issues.apache.org/jira/browse/HDFS-245
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: Eli Collins
 Attachments: 4044_20081030spi.java, design-doc-v4.txt, 
 designdocv1.txt, designdocv2.txt, designdocv3.txt, 
 HADOOP-4044-strawman.patch, symlink-0.20.0.patch, symlink-25-hdfs.patch, 
 symlink-26-hdfs.patch, symlink-26-hdfs.patch, symLink1.patch, symLink1.patch, 
 symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, 
 symLink15.txt, symLink15.txt, symlink16-common.patch, symlink16-hdfs.patch, 
 symlink16-mr.patch, symlink17-common.txt, symlink17-hdfs.txt, 
 symlink18-common.txt, symlink19-common-delta.patch, symlink19-common.txt, 
 symlink19-common.txt, symlink19-hdfs-delta.patch, symlink19-hdfs.txt, 
 symlink20-common.patch, symlink20-hdfs.patch, symlink21-common.patch, 
 symlink21-hdfs.patch, symlink22-common.patch, symlink22-hdfs.patch, 
 symlink23-common.patch, symlink23-hdfs.patch, symlink24-hdfs.patch, 
 symlink27-hdfs.patch, symlink28-hdfs.patch, symlink29-hdfs.patch, 
 symlink29-hdfs.patch, symlink30-hdfs.patch, symlink31-hdfs.patch, 
 symlink33-hdfs.patch, symlink35-hdfs.patch, symlink36-hdfs.patch, 
 symlink37-hdfs.patch, symlink38-hdfs.patch, symlink39-hdfs.patch, 
 symLink4.patch, symlink40-hdfs.patch, symLink5.patch, symLink6.patch, 
 symLink8.patch, symLink9.patch


 HDFS should support symbolic links. A symbolic link is a special type of file 
 that contains a reference to another file or directory in the form of an 
 absolute or relative path and that affects pathname resolution. Programs 
 which read or write to files named by a symbolic link will behave as if 
 operating directly on the target file. However, archiving utilities can 
 handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-16 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-946:
---

Attachment: HdfsFileStatus3.patch

This patch incorporates all review comments except for #2 (consider moving 
the isDirectory check outside). In addition, it
1. removes the method getPath(), which returns the byte array, from 
HdfsFileStatus. This makes the byte array immutable, which makes the change 
safer.
2. fixes a few more bugs in JSP code that misused getPath().
3. adds a comment to INode#name as a reminder that if this encoding is 
changed, the ClientProtocol changes and the decoding of HdfsFileStatus#name 
should change too.

 NameNode should not return full path name when listing a directory or 
 getting the status of a file
 -

 Key: HDFS-946
 URL: https://issues.apache.org/jira/browse/HDFS-946
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.22.0

 Attachments: HDFSFileStatus.patch, HDFSFileStatus1.patch, 
 HdfsFileStatus3.patch


 FSDirectory#getListing(String src) has the following code:
 {code}
 int i = 0;
 for (INode cur : contents) {
   listing[i] = createFileStatus(srcs+cur.getLocalName(), cur);
   i++;
 }
 {code}
 So listing a directory will return an array of FileStatus. Each FileStatus 
 element has the full path name. This increases the return message size and 
 adds non-negligible CPU time to the operation.
 FSDirectory#getFileInfo(String) does not need to return the file name either.
 Another optimization is that in the version of FileStatus that's used in the 
 wire protocol, the path field does not need to be a Path; it could ideally 
 be a String or a byte array. This could avoid unnecessary creation of Path 
 objects at the NameNode, thus helping reduce the GC problem observed when a 
 large number of getFileInfo or getListing operations hit the NameNode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file

2010-02-16 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-946:
---

Hadoop Flags: [Incompatible change]
  Status: Patch Available  (was: Open)

 NameNode should not return full path name when listing a directory or 
 getting the status of a file
 -

 Key: HDFS-946
 URL: https://issues.apache.org/jira/browse/HDFS-946
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.22.0

 Attachments: HDFSFileStatus.patch, HDFSFileStatus1.patch, 
 HdfsFileStatus3.patch


 FSDirectory#getListing(String src) has the following code:
 {code}
 int i = 0;
 for (INode cur : contents) {
   listing[i] = createFileStatus(srcs+cur.getLocalName(), cur);
   i++;
 }
 {code}
 So listing a directory will return an array of FileStatus. Each FileStatus 
 element has the full path name. This increases the return message size and 
 adds non-negligible CPU time to the operation.
 FSDirectory#getFileInfo(String) does not need to return the file name either.
 Another optimization is that in the version of FileStatus that's used in the 
 wire protocol, the path field does not need to be a Path; it could ideally 
 be a String or a byte array. This could avoid unnecessary creation of Path 
 objects at the NameNode, thus helping reduce the GC problem observed when a 
 large number of getFileInfo or getListing operations hit the NameNode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.