[jira] [Commented] (HDFS-2126) Improve Namenode startup time [umbrella task]

2011-07-06 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060310#comment-13060310
 ] 

Matt Foley commented on HDFS-2126:
--

The intent is to close this umbrella Jira after HDFS-1391 and HDFS-1732 are 
committed.

Other ideas for speeding up Namenode startup have been proposed, and in some 
cases Jiras have been opened.  We record some of them here for historical 
interest, but they are more speculative and may be pursued under other Jiras 
or other projects:
* Fully background the FSImage writes when in Safe Mode (HDFS-1798)
* Further optimization of FSImage reads (e.g. HDFS-1366)
* Concurrent FSImage read processing, by splitting the FSImage file into 
independently-processable partitions (speculative)
* Improvements for Edits log read processing, similar to the efficiency 
improvements obtained for FSImage reads (speculative)
* Concurrent Block Report processing (e.g. HDFS-1667)
* Fully background Termination Scan (taking the improvements of HDFS-1391 to 
their maximum)


 Improve Namenode startup time [umbrella task]
 -

 Key: HDFS-2126
 URL: https://issues.apache.org/jira/browse/HDFS-2126
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.20.2
Reporter: Matt Foley

 This is an umbrella task to group the improvements in Namenode startup 
 latency made over the last few months, and track remaining ideas.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2010) Clean up and test behavior under failed edit streams

2011-07-06 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060326#comment-13060326
 ] 

Matt Foley commented on HDFS-2010:
--

Nope, lgtm!

 Clean up and test behavior under failed edit streams
 

 Key: HDFS-2010
 URL: https://issues.apache.org/jira/browse/HDFS-2010
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Aaron T. Myers
 Fix For: Edit log branch (HDFS-1073)

 Attachments: hdfs-2010.0.patch, hdfs-2010.1.patch, hdfs-2010.2.patch


 Right now there is very little test coverage of situations where one or more 
 of the edits directories fails. In trunk, the behavior when all of the edits 
 directories are dead is that the NN prints a fatal level log message and 
 calls Runtime.exit(-1).
 I don't think this is really the behavior we want. Needs a bit of thought, 
 but I think something like the following would make more sense:
 - any calls currently waiting on logSync should end up throwing an exception
 - NN should probably enter safe mode
 - ops can restore edits directories and then ask the NN to restore storage, 
 at which point it could exit safemode
 - alternatively, ops could ask the NN to do saveNamespace and then shut 
 it down





[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-07-06 Thread sri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060343#comment-13060343
 ] 

sri commented on HDFS-503:
--

I have a couple of questions:

1) With RAID set up, I am not able to generate the DFSAdmin report (hadoop 
dfsadmin -report). Why is that?

2) I am not able to reduce the targetReplicationFactor to 0 (I want to run 
MapReduce jobs where the BlockFixer retrieves the data from the raided disks). 
Is there any way to do this?

Thanks in advance

 Implement erasure coding as a layer on HDFS
 ---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: contrib/raid
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.21.0

 Attachments: raid1.txt, raid2.txt


 The goal of this JIRA is to discuss how the cost of raw storage for a HDFS 
 file system can be reduced. Keeping three copies of the same data is very 
 costly, especially when the size of storage is huge. One idea is to reduce 
 the replication factor and do erasure coding of a set of blocks so that the 
 overall probability of failure of a block remains the same as before.
 Many forms of error-correcting codes are available; see 
 http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
 described DiskReduce: 
 https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
 My opinion is to discuss implementation strategies that are not part of base 
 HDFS, but are implemented as a layer on top of HDFS.





[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-07-06 Thread sri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060450#comment-13060450
 ] 

sri commented on HDFS-503:
--

I would like to know whether the stripes just act as a recovery option (when 
other datanodes have failed), or whether they can also act as input to 
MapReduce jobs (to satisfy locality).


 Implement erasure coding as a layer on HDFS
 ---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: contrib/raid
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.21.0

 Attachments: raid1.txt, raid2.txt


 The goal of this JIRA is to discuss how the cost of raw storage for a HDFS 
 file system can be reduced. Keeping three copies of the same data is very 
 costly, especially when the size of storage is huge. One idea is to reduce 
 the replication factor and do erasure coding of a set of blocks so that the 
 overall probability of failure of a block remains the same as before.
 Many forms of error-correcting codes are available; see 
 http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
 described DiskReduce: 
 https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
 My opinion is to discuss implementation strategies that are not part of base 
 HDFS, but are implemented as a layer on top of HDFS.





[jira] [Commented] (HDFS-1990) Resource leaks in HDFS

2011-07-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060540#comment-13060540
 ] 

Hudson commented on HDFS-1990:
--

Integrated in Hadoop-Hdfs-trunk #717 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/717/])
HDFS-1990. Fix resource leaks in BlockReceiver.close().  Contributed by 
Uma Maheswara Rao G

szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1143147
Files : 
* /hadoop/common/trunk/hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java


 Resource leaks in HDFS
 --

 Key: HDFS-1990
 URL: https://issues.apache.org/jira/browse/HDFS-1990
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.0
Reporter: ramkrishna.s.vasudevan
Assignee: Uma Maheswara Rao G
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-1990.patch, HDFS-1990.patch


 Possible resource leakage in HDFS.





[jira] [Commented] (HDFS-1753) Resource Leak in org.apache.hadoop.hdfs.server.namenode.StreamFile

2011-07-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060541#comment-13060541
 ] 

Hudson commented on HDFS-1753:
--

Integrated in Hadoop-Hdfs-trunk #717 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/717/])
HDFS-1753. Resource Leak in StreamFile. Contributed by Uma Maheswara Rao G

eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1143106
Files : 
* 
/hadoop/common/trunk/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestStreamFile.java
* /hadoop/common/trunk/hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/StreamFile.java


 Resource Leak in org.apache.hadoop.hdfs.server.namenode.StreamFile
 --

 Key: HDFS-1753
 URL: https://issues.apache.org/jira/browse/HDFS-1753
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1, 0.23.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
 Attachments: HDFS-1753.1.patch, HDFS-1753.2.patch, HDFS-1753.3.patch, 
 HDFS-1753.4.patch, HDFS-1753.patch


 In the doGet method, the lines
 final DFSInputStream in = dfs.open(filename);
 final long fileLen = in.getFileLength();
 OutputStream os = response.getOutputStream();
 are outside of the try block. If response.getOutputStream() throws an 
 exception, the DFSInputStream will never be closed. So it is better to move 
 response.getOutputStream() into the try block.
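The suggested fix can be sketched with a self-contained stand-in. TrackedStream and doGet below are hypothetical substitutes for DFSInputStream and the servlet's doGet; only the pattern (acquiring the second resource inside the try block) reflects the suggestion above:

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseOnFailure {
    // Hypothetical stand-in for DFSInputStream: records whether close() ran.
    static class TrackedStream implements Closeable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    static TrackedStream lastOpened;

    // Mirrors the corrected doGet() shape: the second acquisition happens
    // inside the try block, so the first stream is closed even when the
    // acquisition (like response.getOutputStream()) throws.
    static void doGet(boolean outputStreamFails) throws IOException {
        TrackedStream in = new TrackedStream();   // like dfs.open(filename)
        lastOpened = in;
        try {
            if (outputStreamFails) {
                // like response.getOutputStream() failing
                throw new IOException("getOutputStream failed");
            }
            // ... stream the file contents to the output stream ...
        } finally {
            in.close();                           // runs even on failure
        }
    }

    public static void main(String[] args) {
        try {
            doGet(true);
        } catch (IOException expected) {
            // the failure still propagates to the caller
        }
        System.out.println("closed = " + lastOpened.closed); // prints "closed = true"
    }
}
```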
  





[jira] [Updated] (HDFS-2131) Tests for HADOOP-7361

2011-07-06 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2131:
--

Attachment: HADOOP-7361-test.patch

 Tests for HADOOP-7361
 -

 Key: HDFS-2131
 URL: https://issues.apache.org/jira/browse/HDFS-2131
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HADOOP-7361-test.patch








[jira] [Created] (HDFS-2131) Tests for HADOOP-7361

2011-07-06 Thread Uma Maheswara Rao G (JIRA)
Tests for HADOOP-7361
-

 Key: HDFS-2131
 URL: https://issues.apache.org/jira/browse/HDFS-2131
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HADOOP-7361-test.patch







[jira] [Updated] (HDFS-2131) Tests for HADOOP-7361

2011-07-06 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2131:
--

Status: Patch Available  (was: Open)

 Tests for HADOOP-7361
 -

 Key: HDFS-2131
 URL: https://issues.apache.org/jira/browse/HDFS-2131
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HADOOP-7361-test.patch








[jira] [Commented] (HDFS-2126) Improve Namenode startup time [umbrella task]

2011-07-06 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060681#comment-13060681
 ] 

Koji Noguchi commented on HDFS-2126:


Is it only our (Yahoo) namenodes that hit a full GC right after the handlers 
first start up?  That wastes 1 to 3 minutes.  (We set a large heap size with 
-Xmx and -Xms.)

 Improve Namenode startup time [umbrella task]
 -

 Key: HDFS-2126
 URL: https://issues.apache.org/jira/browse/HDFS-2126
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.20.2
Reporter: Matt Foley

 This is an umbrella task to group the improvements in Namenode startup 
 latency made over the last few months, and track remaining ideas.





[jira] [Commented] (HDFS-2126) Improve Namenode startup time [umbrella task]

2011-07-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060690#comment-13060690
 ] 

Todd Lipcon commented on HDFS-2126:
---

Do you set CMSInitiatingOccupancyFraction in your GC options? I haven't noticed 
this behavior but haven't personally worked on startup time.

 Improve Namenode startup time [umbrella task]
 -

 Key: HDFS-2126
 URL: https://issues.apache.org/jira/browse/HDFS-2126
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.20.2
Reporter: Matt Foley

 This is an umbrella task to group the improvements in Namenode startup 
 latency made over the last few months, and track remaining ideas.





[jira] [Commented] (HDFS-2131) Tests for HADOOP-7361

2011-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060691#comment-13060691
 ] 

Hadoop QA commented on HDFS-2131:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12485438/HADOOP-7361-test.patch
  against trunk revision 1143147.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.server.namenode.TestBackupNode
  org.apache.hadoop.hdfs.TestDFSShell
  org.apache.hadoop.hdfs.TestSeekBug

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/879//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/879//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/879//console

This message is automatically generated.

 Tests for HADOOP-7361
 -

 Key: HDFS-2131
 URL: https://issues.apache.org/jira/browse/HDFS-2131
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HADOOP-7361-test.patch








[jira] [Commented] (HDFS-2131) Tests for HADOOP-7361

2011-07-06 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060695#comment-13060695
 ] 

Uma Maheswara Rao G commented on HDFS-2131:
---

TestDFSShell will pass after the HADOOP-7361 patch is committed. The 
remaining failures are not related to this patch.

 Tests for HADOOP-7361
 -

 Key: HDFS-2131
 URL: https://issues.apache.org/jira/browse/HDFS-2131
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HADOOP-7361-test.patch








[jira] [Created] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Aaron T. Myers (JIRA)
Potential resource leak in EditLogFileOutputStream.close


 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers


{{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
underlying resources. If any of the calls to {{close()}} throw before the last 
one, the later resources will never be closed.
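The general fix for this kind of leak is to attempt every close() independently and only rethrow afterwards. The attached patch reportedly uses Hadoop's IOUtils for this; the closeAll method below is a hypothetical stand-alone sketch of the same pattern:

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseAll {
    // Close every resource even if some close() calls throw. The first
    // IOException is remembered and rethrown only after all closes were
    // attempted, so a failing early close cannot leak the later resources.
    // (The actual HDFS-2132 patch may differ; this only shows the pattern.)
    public static void closeAll(Closeable... resources) throws IOException {
        IOException first = null;
        for (Closeable c : resources) {
            if (c == null) continue;
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) first = e;  // keep closing the rest
            }
        }
        if (first != null) throw first;
    }

    public static void main(String[] args) {
        final boolean[] secondClosed = {false};
        Closeable failing = () -> { throw new IOException("first close failed"); };
        Closeable tracked = () -> secondClosed[0] = true;
        try {
            closeAll(failing, tracked);
        } catch (IOException e) {
            // prints "second closed = true": the later resource was still closed
            System.out.println("second closed = " + secondClosed[0]);
        }
    }
}
```

Note the design trade-off raised later in this thread: this sketch rethrows the first failure rather than swallowing it, so callers still learn that a close failed.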





[jira] [Assigned] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-2132:


Assignee: Aaron T. Myers

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers

 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.





[jira] [Commented] (HDFS-2131) Tests for HADOOP-7361

2011-07-06 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060738#comment-13060738
 ] 

Daryn Sharp commented on HDFS-2131:
---

These tests are probably better suited to {{TestHDFSCLI}}.  Although I'm not 
fond of the custom framework and its weak integration with JUnit, testing for 
a -1 return is a feeble check since it can occur for any number of reasons.  
The {{TestHDFSCLI}} tests will let you verify that the exception output is 
correct.  Come to think of it, I'm surprised the commands aren't failing with 
exit 1... -1 is usually a usage error.



 Tests for HADOOP-7361
 -

 Key: HDFS-2131
 URL: https://issues.apache.org/jira/browse/HDFS-2131
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HADOOP-7361-test.patch








[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-07-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060745#comment-13060745
 ] 

Todd Lipcon commented on HDFS-2011:
---

I'm working on merging this with HDFS-1073, and had one question: when do we 
expect that an editlog stream would be closed twice? In 1073 there are some 
extra asserts, so instead of ignoring the second close, it now throws 
"java.io.IOException: Trying to use aborted output stream". I'm debating 
whether to remove this exception, as you've done in this patch, or to leave it 
in place, since it seems like closing a stream twice might be indicative of a 
bug.

 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly
 -

 Key: HDFS-2011
 URL: https://issues.apache.org/jira/browse/HDFS-2011
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 0.23.0

 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, 
 HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, 
 HDFS-2011.patch, HDFS-2011.patch


 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly. Sometimes it throws a NullPointerException and 
 sometimes it doesn't take off a failed storage directory





[jira] [Updated] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2132:
-

Attachment: hdfs-2132.0.patch

Patch which makes sure that {{EditLogFileOutputStream.close(...)}} cleans up 
after itself.

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: hdfs-2132.0.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.





[jira] [Updated] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2132:
-

Fix Version/s: 0.23.0
   Status: Patch Available  (was: Open)

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.





[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-07-06 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060775#comment-13060775
 ] 

Ravi Prakash commented on HDFS-2011:


I had noticed close being called twice while testing this functionality. This 
was causing a NullPointerException the second time. The stack trace is given in 
this comment: 
https://issues.apache.org/jira/browse/HDFS-2011?focusedCommentId=13041858&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13041858

{quote}
2011-04-05 17:36:56,187 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
87 on 8020, call getEditLogSize() from
98.137.97.99:35862: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream.close(EditLogFileOutputStream.java:109)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.processIOError(FSEditLog.java:299)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.getEditLogSize(FSEditLog.java:849)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getEditLogSize(FSNamesystem.java:4270)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.getEditLogSize(NameNode.java:1095)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:346)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1399)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1395)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1094)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1393)
{quote}

The bug itself is quite hard to reproduce. I had to run my tests in an infinite 
loop, and the NullPointerException happened only after 3-4 hours (each run of 
the test takes maybe 2 minutes). After the NullPointerException, the namenode 
would essentially be useless. Even hdfs dfs -ls would throw a 
NullPointerException.

I am not sure myself which philosophy would be better. FileOutputStream itself 
ignores a second close. I checked this with the following program:

{noformat}
import java.io.*;

public class TestJAVA {

    public static void main(String args[]) {
        System.out.println("Hello World");
        try {
            FileOutputStream fos = new FileOutputStream("/tmp/ravi.txt");
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.close();
            fos.close();
        } catch (IOException ioe) {
            System.out.println("Hello California");
            System.out.println(ioe);
        }
        System.out.println("Hello Champaign");
    }
}
{noformat}

 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly
 -

 Key: HDFS-2011
 URL: https://issues.apache.org/jira/browse/HDFS-2011
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 0.23.0

 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, 
 HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, 
 HDFS-2011.patch, HDFS-2011.patch


 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly. Sometimes it throws a NullPointerException and 
 sometimes it doesn't take off a failed storage directory





[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-07-06 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060777#comment-13060777
 ] 

Ravi Prakash commented on HDFS-2011:


The program above printed:
{noformat}
Hello World
Hello Champaign
{noformat}

 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly
 -

 Key: HDFS-2011
 URL: https://issues.apache.org/jira/browse/HDFS-2011
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 0.23.0

 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, 
 HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, 
 HDFS-2011.patch, HDFS-2011.patch


 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly. Sometimes it throws a NullPointerException and 
 sometimes it doesn't take off a failed storage directory





[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-07-06 Thread John George (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060780#comment-13060780
 ] 

John George commented on HDFS-2011:
---

If I remember right, it was a case of an incomplete create as opposed to 
close being called twice. So close() was being called on a stream that was 
not really created...

 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly
 -

 Key: HDFS-2011
 URL: https://issues.apache.org/jira/browse/HDFS-2011
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 0.23.0

 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, 
 HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, 
 HDFS-2011.patch, HDFS-2011.patch


 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly. Sometimes it throws a NullPointerException and 
 sometimes it doesn't take off a failed storage directory





[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060793#comment-13060793
 ] 

Ravi Prakash commented on HDFS-2132:


I am new to Hadoop, so please forgive me if I do not understand the philosophy 
behind this patch. If any of the close methods fails, it will throw an 
IOException which will be propagated up the stack. Isn't this the way all Java 
code works?

Comments on your patch:
1. In normal operation, all close methods within the try block will be called 
once, and then once again in the IOUtils.cleanup method. What purpose does 
this serve? I would rather the methods be called only once.
2. In the finally block, all IOExceptions which might have been thrown are 
logged and then programmatically swallowed. The upstream functions are never 
made aware of these IOExceptions, and I am not sure this is the right behavior.

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.





[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060794#comment-13060794
 ] 

Hadoop QA commented on HDFS-2132:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12485477/hdfs-2132.0.patch
  against trunk revision 1143147.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.  Please justify why no new tests are needed for this patch.  Also 
please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/880//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/880//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/880//console

This message is automatically generated.

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2132:
-

Attachment: hdfs-2132.1.patch

Whoops, uploaded the wrong patch. Here's one with tests.

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060806#comment-13060806
 ] 

Aaron T. Myers commented on HDFS-2132:
--

bq. I am new to Hadoop so please forgive me if I do not understand the 
philosophies behind this patch. If any of the close methods fail, they will 
throw an IOException which will be propagated up the stack. Isn't this the way 
all Java works?

This is indeed the way it works and is the desired behavior. The point of this 
patch is that when a close fails for any one of the {{Closeables}}, we should 
still make a last-ditch effort to close the others. If we can't close them 
then, there's nothing we can do.

bq. 1. In normal operation all close methods within the try will be called 
once, and then once again in the IOUtils.cleanup method. What purpose does this 
serve? I would rather the methods be called only once. 

In the normal case all of the {{Closeables}} will be set to {{null}}. Note that 
{{IOUtils.cleanup(...)}} expressly handles {{nulls}}, and will not attempt to 
call {{close()}} again.

bq. 2. In the finally block, all IOExceptions which might have been thrown are 
logged, and then programmatically swallowed. The upstream functions are never 
made aware of these IOExceptions and I am not sure this is the right behavior.

It's true that in the exceptional case any failures to call {{close()}} in 
{{IOUtils.cleanup(...)}} will be logged and not propagated. This is exactly the 
intended behavior. Note that the original exception caused by the call to 
{{close()}} outside of {{IOUtils.cleanup(...)}} will still be propagated up.
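
The close-then-cleanup pattern described above can be sketched as follows. This is a hypothetical illustration of the general technique (null each field on a successful close, then make a last-ditch {{IOUtils.cleanup(...)}}-style pass over whatever is left), not the actual HDFS patch; all names here are invented for the example.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch of the pattern: close each resource in turn, null the
// reference on success, then make a last-ditch cleanup pass over the rest.
class CleanupSketch {

    /** Closes every non-null resource, logging and swallowing failures,
     *  in the spirit of IOUtils.cleanup(...). */
    static void cleanup(Closeable... resources) {
        for (Closeable c : resources) {
            if (c == null) continue;          // already closed cleanly
            try {
                c.close();
            } catch (IOException e) {
                System.err.println("Suppressed exception on close: " + e);
            }
        }
    }

    static int closedCount = 0;

    static Closeable counting() { return () -> closedCount++; }
    static Closeable failing()  { return () -> { throw new IOException("boom"); }; }

    /** Returns how many of the three resources were actually closed. */
    static int demo() {
        closedCount = 0;
        Closeable a = counting(), b = failing(), c = counting();
        try {
            try {
                a.close(); a = null;  // succeeded: null it so cleanup skips it
                b.close(); b = null;  // throws, so b stays non-null
                c.close(); c = null;  // never reached on the primary path
            } finally {
                cleanup(a, b, c);     // last-ditch pass closes c, retries b
            }
        } catch (IOException primary) {
            // The original close() failure still propagates to here.
        }
        return closedCount;
    }

    public static void main(String[] args) {
        System.out.println("resources closed: " + demo());
    }
}
```

Note that the secondary failure inside the cleanup pass is logged and swallowed, while the original exception still propagates, matching the behavior described in the comment above.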

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-07-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060812#comment-13060812
 ] 

Todd Lipcon commented on HDFS-2011:
---

In the HDFS-1073 branch, EditLogOutputStream now has separate close() and 
abort() methods. abort() is used when there has been some error on the stream 
and we expect to do an unclean close (i.e. without flushing). close() is used 
for clean closes. If close() itself fails, it will then proceed to abort() when 
the IO error is handled.

So, I think the correct test case on the branch is to call abort() twice and 
make sure that's ignored, or call close() and then abort() to make sure that's 
ignored. Does that sound reasonable?
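
The close()/abort() contract described above can be sketched with a small state machine: a clean close flushes, an abort does not, and redundant calls on an already-closed or already-aborted stream are ignored. This is an illustrative model only, with invented names; it is not the actual EditLogOutputStream code.

```java
// Hypothetical sketch of an idempotent close()/abort() contract.
class AbortableStreamSketch {

    enum State { OPEN, CLOSED, ABORTED }

    static class Stream {
        State state = State.OPEN;
        int flushes = 0;

        void close() {
            if (state != State.OPEN) return;  // close()/abort() already ran
            flushes++;                        // clean close: flush first
            state = State.CLOSED;
        }

        void abort() {
            if (state != State.OPEN) return;  // ignore redundant abort()
            state = State.ABORTED;            // unclean close: no flush
        }
    }

    /** Exercises the call sequences discussed above. */
    static boolean demo() {
        Stream s = new Stream();
        s.close();
        s.abort();      // close() then abort(): ignored
        s.abort();      // double abort(): ignored
        return s.state == Stream.State.CLOSED == false
                ? false
                : s.flushes == 1;
    }

    public static void main(String[] args) {
        System.out.println("idempotent close/abort ok: " + demo());
    }
}
```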

 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly
 -

 Key: HDFS-2011
 URL: https://issues.apache.org/jira/browse/HDFS-2011
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 0.23.0

 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, 
 HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, 
 HDFS-2011.patch, HDFS-2011.patch


 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly. Sometimes it throws a NullPointerException and 
 sometimes it doesn't take off a failed storage directory

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2018) Move all journal stream management code into one place

2011-07-06 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060822#comment-13060822
 ] 

Jitendra Nath Pandey commented on HDFS-2018:


Some comments:
1. FileJournalManager.java
  getMaxLoadableTransaction can be made private.
2. JournalManager interface
   Instead of adding archiveLogsOlderThan to the interface, we could add a 
purgeTransactions method (as in the HDFS-1580 design). FileJournalManager could 
implement purgeTransactions by archiving, i.e. instead of really deleting the 
logs, it archives them. Since checkpoints are already archived, we don't need 
to force every JournalManager to archive edit logs as well.

Apart from the above, the patch looks good to me. +1


 Move all journal stream management code into one place
 --

 Key: HDFS-2018
 URL: https://issues.apache.org/jira/browse/HDFS-2018
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: Edit log branch (HDFS-1073)

 Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
 HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff


 Currently in the HDFS-1073 branch, the code for creating output streams is in 
 FileJournalManager and the code for input streams is in the inspectors. This 
 change does a number of things.
   - Input and Output streams are now created by the JournalManager.
   - FSImageStorageInspectors now deals with URIs when referring to edit logs
   - Recovery of inprogress logs is performed by counting the number of 
 transactions instead of looking at the length of the file.
 The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060823#comment-13060823
 ] 

Todd Lipcon commented on HDFS-2132:
---

Hrm, it doesn't look like bufReady should ever throw an IOE on close, right? 
It's just a memory buffer.

But, fc.truncate() might throw an IOE - that seems like the more realistic case 
to worry about. Maybe that would be a better fault to inject for the test?

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2132:
-

Attachment: hdfs-2132.2.patch

Thanks a lot for the review, Todd. Here's a patch which addresses your comment.

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060847#comment-13060847
 ] 

Hadoop QA commented on HDFS-2132:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12485480/hdfs-2132.1.patch
  against trunk revision 1143147.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/881//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/881//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/881//console

This message is automatically generated.

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-07-06 Thread John George (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060853#comment-13060853
 ] 

John George commented on HDFS-2011:
---

I think calling
1. abort() twice
2. close() twice
3. close() followed by an abort()

would test most cases.

 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly
 -

 Key: HDFS-2011
 URL: https://issues.apache.org/jira/browse/HDFS-2011
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 0.23.0

 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, 
 HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, 
 HDFS-2011.patch, HDFS-2011.patch


 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly. Sometimes it throws a NullPointerException and 
 sometimes it doesn't take off a failed storage directory

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-07-06 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1977:
-

Attachment: HDFS-1977-2.patch

Things changed since last post, reattaching with new changes.

 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch


 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 the logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1979) HDFS-1073: Fix backupnode for new edits/image layout

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1979:
--

Attachment: hdfs-1979.txt

Cleaned up the patch. I think this should be ready to go. Here's a summary of 
some of the changes to make the rather large patch easier to follow:

BackupNode itself:
- no longer uses the spool file. Instead, the state tracks whether the BN is 
in sync or journaling only. In essence, the next log segment is used as the 
spool file.
- lots of refactoring so that checkpoint code is primarily shared with the 
SecondaryNameNode. We could pull this into a new CheckpointUtils class or 
something, but didn't want to make that change here since it would make the 
patch even larger.
- moved the BN-specific RPCs into a new BackupNodeProtocol instead of sharing 
NameNodeProtocol. This makes sense since the NN as-is was just throwing 
exceptions on those calls.
- split the BN RPCs into several pieces, rather than using journal() for 
everything. This makes the API easier to follow.
- fixed bugs where the NN would send uncheckpointed txns to the BackupNode (BN 
is currently non-functional in trunk)

EditLog:
- added new BackupJournalManager to coordinate talking to BN
- added new parameter to start/end log segment about whether to include the 
special START/END transactions. This was necessary since the BN will receive 
these replicated from the NN, and thus shouldn't add its own in addition to 
what the NN wrote.

BackupImage/FSImage:
- new concept of lastAppliedTxId which tracks the latest txnid reflected by 
the namesystem. Some refactoring done so that this is properly tracked during 
image loading, etc. We used to simply use the edit log's last written txid 
for this, but in the case of the BN the edit log may be writing ahead of where 
the NS actually reflects.

Storage inspector:
- refactored out the planning of loading logs from which image. This will 
probably get changed again by the work in HDFS-1579, but this was the minimal 
change to get this working. Used when the BN is synchronizing with the NN.

Tests:
- added new test for the BN that makes sure it can stay in sync with the NN, 
replicates edits identically, etc.
- split CN test and BN tests into separate methods to be easier to run just one
- removed testBackupRegistration since we no longer have to enforce 
only-one-backupnode

 HDFS-1073: Fix backupnode for new edits/image layout
 

 Key: HDFS-1979
 URL: https://issues.apache.org/jira/browse/HDFS-1979
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-1979-prelim.txt, hdfs-1979.txt




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060872#comment-13060872
 ] 

Ravi Prakash commented on HDFS-2132:


Thanks Aaron for the explanation! :) I agree. 

I might be missing a trick (again ;-) ), but are you sure the Closeables will 
be null after close()? Won't they be references pointing to a closed stream, 
so that close() is called twice on them? I don't see an easy way to avoid 
that, though. So, cool.

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060877#comment-13060877
 ] 

Aaron T. Myers commented on HDFS-2132:
--

bq. Thanks Aaron for the explanation!

No problem.

bq. I might be missing a trick (again  ) , but are you sure the Closeables will 
be null after .close()?

Well, now that I look at it, you've effectively caught a bug. :)

The previous code was expressly setting {{bufReady}} and {{bufCurrent}} to 
{{null}}, but not {{fp}} or {{fc}}. My patch didn't touch that code, but it 
might as well fix it. I'll upload another patch in a moment.

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2132:
-

Attachment: hdfs-2132.3.patch

Patch which addresses the issue Ravi pointed out.

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch, 
 hdfs-2132.3.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2065) Fix NPE in DFSClient.getFileChecksum

2011-07-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2065:
-

Hadoop Flags: [Reviewed]
  Status: Patch Available  (was: Open)

+1 patch looks good.

 Fix NPE in DFSClient.getFileChecksum
 

 Key: HDFS-2065
 URL: https://issues.apache.org/jira/browse/HDFS-2065
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2065-1.patch


 The following code can throw an NPE if callGetBlockLocations returns null:
 {code}
 List<LocatedBlock> locatedblocks =
     callGetBlockLocations(namenode, src, 0, Long.MAX_VALUE).getLocatedBlocks();
 {code}
 The right fix is for the server to throw the proper exception instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1979) HDFS-1073: Fix backupnode for new edits/image layout

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1979:
--

Attachment: hdfs-1979.txt

Slight update to patch: I had forgotten to implement releaseBackupStreams 
properly.

 HDFS-1073: Fix backupnode for new edits/image layout
 

 Key: HDFS-1979
 URL: https://issues.apache.org/jira/browse/HDFS-1979
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-1979-prelim.txt, hdfs-1979.txt, hdfs-1979.txt




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060904#comment-13060904
 ] 

Hadoop QA commented on HDFS-1977:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12485490/HDFS-1977-2.patch
  against trunk revision 1143147.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/884//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/884//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/884//console

This message is automatically generated.

 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch


 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 the logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2128) Support for pluggable Trash policies

2011-07-06 Thread Usman Masood (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060909#comment-13060909
 ] 

Usman Masood commented on HDFS-2128:


The issue is to choose the right interface for pluggable Trash modules. 
Currently the public methods are:
- moveToTrash(..)
- getEmptier()
- checkpoint()
- expunge()

The first two methods should be part of the Trash interface, but I'm not sure 
about the last two. Not every Trash policy should be required to implement a 
checkpoint mechanism.

Currently expunge() and checkpoint() are used by FsShell for the -expunge 
arg. 
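
The split suggested above could be sketched as a core interface plus an optional extension for policies that support checkpointing. All names below are illustrative assumptions for the sake of the example, not the real HDFS-2128 API.

```java
import java.io.IOException;

// Hypothetical core surface every Trash policy would implement.
interface TrashPolicySketch {
    boolean moveToTrash(String path) throws IOException;
    Runnable getEmptier();

    /** Optional extension for checkpoint-based trash policies; FsShell's
     *  -expunge handling could use it only when a policy implements it. */
    interface Checkpointable extends TrashPolicySketch {
        void checkpoint() throws IOException;
        void expunge() throws IOException;
    }
}

/** A trivial policy that only implements the core surface. */
class ArchiveOnlyTrash implements TrashPolicySketch {
    public boolean moveToTrash(String path) { return path != null; }
    public Runnable getEmptier() { return () -> { /* nothing to empty */ }; }
}
```

A policy without a checkpoint mechanism (like {{ArchiveOnlyTrash}} here) then simply does not implement the sub-interface, and callers can test for it with {{instanceof}}.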

 Support for pluggable Trash policies
 

 Key: HDFS-2128
 URL: https://issues.apache.org/jira/browse/HDFS-2128
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: dhruba borthakur

 It would be beneficial to make the Trash policy pluggable. One primary 
 use-case for this is to archive files (in some remote store) when they get 
 removed by Trash emptier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-07-06 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060910#comment-13060910
 ] 

Bharath Mundlapudi commented on HDFS-1977:
--

This patch doesn't include unit tests, since it just adapts code to the new 
logging API. No new tests are required.

 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch


 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 the logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1780) reduce need to rewrite fsimage on statrtup

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1780:
--

Attachment: hdfs-1780.txt

Here's a patch on the 1073 branch which implements this.

 reduce need to rewrite fsimage on statrtup
 --

 Key: HDFS-1780
 URL: https://issues.apache.org/jira/browse/HDFS-1780
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Daryn Sharp
 Attachments: hdfs-1780.txt


 On startup, the namenode will read the fs image, apply edits, then rewrite 
 the fs image.  This requires a non-trivial amount of time for very large 
 directory structures.  Perhaps the namenode should employ some logic to 
 decide that the edits are simple enough that it doesn't warrant rewriting the 
 image back out to disk.
 A few ideas:
 * Use the size of the edit logs: if the size is below a threshold, assume 
 it's cheaper to reprocess the edit log instead of writing the image back out.
 * Time the processing of the edits: if the time is below a defined 
 threshold, the image isn't rewritten.
 * Time the reading of the image and the processing of the edits, and base 
 the decision on the time it would take to write the image (a multiplier 
 applied to the read time?) versus the time it would take to reprocess the 
 edits.  If a certain threshold (perhaps a percentage, or the expected time 
 to rewrite) is exceeded, rewrite the image.
 Something along the lines of the last suggestion may allow for defaults that 
 adapt to any cluster size, eliminating the need to keep tweaking a cluster's 
 settings based on its size.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
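
The timing-based idea in the HDFS-1780 description above can be sketched as a simple cost comparison. The method and its knobs (the write-cost multiplier in particular) are illustrative assumptions, not real HDFS configuration.

```java
// Hypothetical sketch: rewrite the fsimage at startup only when replaying
// the edit log again on the next startup would cost more than writing a
// fresh image now.
class RewriteDecisionSketch {

    /**
     * @param imageLoadMillis   measured time to read the fsimage
     * @param editsReplayMillis measured time to apply the edit log
     * @param writeCostFactor   estimated write time as a multiple of read time
     */
    static boolean shouldRewriteImage(long imageLoadMillis,
                                      long editsReplayMillis,
                                      double writeCostFactor) {
        double estimatedWriteMillis = imageLoadMillis * writeCostFactor;
        // Rewrite only if replaying the edits again would cost more than
        // writing a fresh image now.
        return editsReplayMillis > estimatedWriteMillis;
    }

    public static void main(String[] args) {
        // Small edit log relative to the image: skip the rewrite.
        System.out.println(shouldRewriteImage(60_000, 2_000, 1.5));
        // Edit log replay dwarfs the image write: rewrite pays off.
        System.out.println(shouldRewriteImage(60_000, 300_000, 1.5));
    }
}
```

With adaptive measurements like these, the same defaults could hold across cluster sizes, which is the appeal of the third idea in the description.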




[jira] [Updated] (HDFS-1780) reduce need to rewrite fsimage on statrtup

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1780:
--

Affects Version/s: Edit log branch (HDFS-1073)
Fix Version/s: Edit log branch (HDFS-1073)
 Assignee: Todd Lipcon

 reduce need to rewrite fsimage on statrtup
 --

 Key: HDFS-1780
 URL: https://issues.apache.org/jira/browse/HDFS-1780
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Daryn Sharp
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)

 Attachments: hdfs-1780.txt


 On startup, the namenode will read the fs image, apply edits, then rewrite 
 the fs image.  This requires a non-trivial amount of time for very large 
 directory structures.  Perhaps the namenode should employ some logic to 
 decide that the edits are simple enough that it doesn't warrant rewriting the 
 image back out to disk.
 A few ideas:
 Use the size of the edit logs: if the size is below a threshold, assume it's 
 cheaper to reprocess the edit log instead of writing the image back out.
 Time the processing of the edits, and if the time is below a defined 
 threshold, don't rewrite the image.
 Time the reading of the image and the processing of the edits. Base the 
 decision on the time it would take to write the image (a multiplier 
 applied to the read time?) versus the time it would take to reprocess the 
 edits. If a certain threshold (perhaps a percentage or expected time to 
 rewrite) is exceeded, rewrite the image.
 Something along the lines of the last suggestion may allow for defaults that 
 adapt to any cluster size, eliminating the need to keep tweaking a 
 cluster's settings based on its size.





[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close

2011-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060938#comment-13060938
 ] 

Hadoop QA commented on HDFS-2132:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12485485/hdfs-2132.2.patch
  against trunk revision 1143147.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:


-1 contrib tests.  The patch failed contrib unit tests.

-1 system test framework.  The patch failed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/882//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/882//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/882//console

This message is automatically generated.

 Potential resource leak in EditLogFileOutputStream.close
 

 Key: HDFS-2132
 URL: https://issues.apache.org/jira/browse/HDFS-2132
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: 0.23.0

 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch, 
 hdfs-2132.3.patch


 {{EditLogFileOutputStream.close(...)}} sequentially closes a series of 
 underlying resources. If any of the calls to {{close()}} throw before the 
 last one, the later resources will never be closed.
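The usual remedy for that pattern is to close each resource in its own try/catch so a failure in one close() does not skip the rest, rethrowing the first failure at the end. A minimal sketch (hypothetical helper, not the actual patch):

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch: close every resource even if earlier close() calls throw.
public class SafeClose {
    public static void closeAll(Closeable... resources) throws IOException {
        IOException first = null;
        for (Closeable c : resources) {
            if (c == null) {
                continue;
            }
            try {
                c.close();
            } catch (IOException e) {
                // Remember the first failure but keep closing the rest.
                if (first == null) {
                    first = e;
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

The caller still sees the first exception, but every underlying resource gets its close() call.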





[jira] [Commented] (HDFS-2065) Fix NPE in DFSClient.getFileChecksum

2011-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060939#comment-13060939
 ] 

Hadoop QA commented on HDFS-2065:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483562/HDFS-2065-1.patch
  against trunk revision 1143147.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:


-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/886//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/886//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/886//console


 Fix NPE in DFSClient.getFileChecksum
 

 Key: HDFS-2065
 URL: https://issues.apache.org/jira/browse/HDFS-2065
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2065-1.patch


 The following code can throw an NPE if callGetBlockLocations returns null, 
 i.e. if the server returns null:
 {code}
 List<LocatedBlock> locatedblocks
 = callGetBlockLocations(namenode, src, 0, 
 Long.MAX_VALUE).getLocatedBlocks();
 {code}
 The right fix is for the server to throw the proper exception.
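Until the server-side fix lands, a defensive null check on the client would turn the NPE into a meaningful exception. A hedged sketch with hypothetical names (the interface below merely stands in for the real return type):

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical sketch of a null guard; names loosely follow DFSClient,
// but this is not the actual patch.
public class ChecksumGuard {
    /** Stand-in for the real block-locations result type. */
    interface BlockLocations {
        Object getLocatedBlocks();
    }

    public static Object locatedBlocksOrThrow(BlockLocations result, String src)
            throws IOException {
        if (result == null) {
            // Fail with a meaningful exception instead of an NPE downstream.
            throw new FileNotFoundException("File does not exist: " + src);
        }
        return result.getLocatedBlocks();
    }
}
```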





[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch

2011-07-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060949#comment-13060949
 ] 

Aaron T. Myers commented on HDFS-1896:
--

I just took a manual pass through the commits to trunk which have occurred 
since the branch was created for HDFS-1073. Here's the list of JIRAs which I 
think may be relevant:

* HDFS-2011
* HDFS-1955
* HDFS-988
* HDFS-2041
* HDFS-2030
* HDFS-2003
* HDFS-1948
* HDFS-1149
* HDFS-1969
* HDFS-1636
* HDFS-1936

I'm going to try to manually verify that these fixes did not regress in the 
HDFS-1073 branch.

 Additional QA tasks for Edit Log branch
 ---

 Key: HDFS-1896
 URL: https://issues.apache.org/jira/browse/HDFS-1896
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)


 As we close out tasks in the HDFS-1073 branch, there are a few places where 
 I've noticed that we lack some test coverage. Creating this ticket just as a 
 place to jot down some notes on things that we ought to make sure are tested, 
 preferably by automated (unit) tests.





[jira] [Updated] (HDFS-1780) reduce need to rewrite fsimage on startup

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1780:
--

Attachment: hdfs-1780.txt

One small fix -- needed to fix 
FSImageTransactionalStorageInspector.needToSave() so that it forces a save if 
any of the image directories are missing VERSION file. This bug was causing 
TestNameEditsConfigs to fail.

 reduce need to rewrite fsimage on startup
 --

 Key: HDFS-1780
 URL: https://issues.apache.org/jira/browse/HDFS-1780
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Daryn Sharp
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)

 Attachments: hdfs-1780.txt, hdfs-1780.txt


 On startup, the namenode will read the fs image, apply edits, then rewrite 
 the fs image.  This requires a non-trivial amount of time for very large 
 directory structures.  Perhaps the namenode should employ some logic to 
 decide that the edits are simple enough that it doesn't warrant rewriting the 
 image back out to disk.
 A few ideas:
 Use the size of the edit logs: if the size is below a threshold, assume it's 
 cheaper to reprocess the edit log instead of writing the image back out.
 Time the processing of the edits, and if the time is below a defined 
 threshold, don't rewrite the image.
 Time the reading of the image and the processing of the edits. Base the 
 decision on the time it would take to write the image (a multiplier 
 applied to the read time?) versus the time it would take to reprocess the 
 edits. If a certain threshold (perhaps a percentage or expected time to 
 rewrite) is exceeded, rewrite the image.
 Something along the lines of the last suggestion may allow for defaults that 
 adapt to any cluster size, eliminating the need to keep tweaking a 
 cluster's settings based on its size.





[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2011:
--

Attachment: elfos-close-patch-on-1073.txt

Here's the patch I'm planning to commit to 1073 branch. Look good?

I will also do some stress testing similar to what Ravi described on the branch 
to see if I can reproduce the issue he saw.

 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly
 -

 Key: HDFS-2011
 URL: https://issues.apache.org/jira/browse/HDFS-2011
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 0.23.0

 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, 
 HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, 
 HDFS-2011.patch, HDFS-2011.patch, elfos-close-patch-on-1073.txt


 Removal and restoration of storage directories on checkpointing failure 
 doesn't work properly. Sometimes it throws a NullPointerException and 
 sometimes it doesn't take off a failed storage directory





[jira] [Created] (HDFS-2133) 1073: address remaining TODOs and pre-merge cleanup

2011-07-06 Thread Todd Lipcon (JIRA)
1073: address remaining TODOs and pre-merge cleanup
---

 Key: HDFS-2133
 URL: https://issues.apache.org/jira/browse/HDFS-2133
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
Assignee: Todd Lipcon








[jira] [Updated] (HDFS-2133) 1073: address remaining TODOs and pre-merge cleanup

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2133:
--

  Component/s: name-node
  Description: There are a few TODOs still in the code and a bit of 
cleanup to be done before merging HDFS-1073. This JIRA is for this misc cleanup.
Affects Version/s: Edit log branch (HDFS-1073)
Fix Version/s: Edit log branch (HDFS-1073)

 1073: address remaining TODOs and pre-merge cleanup
 ---

 Key: HDFS-2133
 URL: https://issues.apache.org/jira/browse/HDFS-2133
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)


 There are a few TODOs still in the code and a bit of cleanup to be done 
 before merging HDFS-1073. This JIRA is for this misc cleanup.





[jira] [Updated] (HDFS-2133) 1073: address remaining TODOs and pre-merge cleanup

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2133:
--

Attachment: hdfs-2133.txt

Patch addresses the following:



- removes an extra setReadyToFlush/flush call in 
EditLogFileOutputStream.close. This snuck in when we did some refactoring, but 
seems unnecessary, since we always flush before closing a stream anyway. Tests 
seem to be passing even when I remove this (and the next bit of that same 
function already verifies that there isn't any unflushed data in the buffer)

{code:title=FSImage.java}
-// TODO need to discuss what the correct logic is for determing which
-// storage directory to read properties from
 sdForProperties.read();
{code}
This TODO is invalid -- when inspecting the dirs at startup, we already call 
{{read()}} for each directory. That means that we've verified that they all 
contain the same data. Since VERSION files are now just namespace info, and 
nothing related to checkpoint times or versions, it doesn't matter which one we 
read() from here.

{code:title=FSImage.java}
-  storage.writeAll(); // TODO is this a good spot for this?
-  
+  storage.writeAll();
{code}
Yes, I think it's a good spot :) Eli had commented that he agreed in an earlier 
code review, but I hadn't removed it at that point. This {{writeAll}} call is 
necessary when adding new directories to a NN, for example.

- various changes to remove checkpointTxId from NameNodeRegistration and 
CheckpointCommand. A checkpoint txid is no longer relevant when deciding 
whether to allow a checkpoint to take place, since we can distinguish between 
different checkpoints at different txids.

- various javadoc additions where things were incorrect or incomplete


 1073: address remaining TODOs and pre-merge cleanup
 ---

 Key: HDFS-2133
 URL: https://issues.apache.org/jira/browse/HDFS-2133
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)

 Attachments: hdfs-2133.txt


 There are a few TODOs still in the code and a bit of cleanup to be done 
 before merging HDFS-1073. This JIRA is for this misc cleanup.





[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch

2011-07-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060964#comment-13060964
 ] 

Aaron T. Myers commented on HDFS-1896:
--

Oh, also, though it's not committed yet, HDFS-2132 likely will be soon. This 
will also need to be ported to the HDFS-1073 branch.

 Additional QA tasks for Edit Log branch
 ---

 Key: HDFS-1896
 URL: https://issues.apache.org/jira/browse/HDFS-1896
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)


 As we close out tasks in the HDFS-1073 branch, there are a few places where 
 I've noticed that we lack some test coverage. Creating this ticket just as a 
 place to jot down some notes on things that we ought to make sure are tested, 
 preferably by automated (unit) tests.





[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS

2011-07-06 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060975#comment-13060975
 ] 

dhruba borthakur commented on HDFS-503:
---

1. Raid has no impact on dfsadmin -report command.

2. You won't be able to set a replication factor to 0. You would have to 
manually pull the plug (kill it) on a datanode to see how raid works.

3. stripe locations do not contribute to split locations of a block, thus they 
are not used for map-reduce locality.

 Implement erasure coding as a layer on HDFS
 ---

 Key: HDFS-503
 URL: https://issues.apache.org/jira/browse/HDFS-503
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: contrib/raid
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.21.0

 Attachments: raid1.txt, raid2.txt


 The goal of this JIRA is to discuss how the cost of raw storage for an HDFS 
 file system can be reduced. Keeping three copies of the same data is very 
 costly, especially when the size of storage is huge. One idea is to reduce 
 the replication factor and do erasure coding of a set of blocks so that the 
 overall probability of failure of a block remains the same as before.
 Many forms of error-correcting codes are available; see 
 http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
 described DiskReduce: 
 https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
 My opinion is to discuss implementation strategies that are not part of base 
 HDFS, but are a layer on top of HDFS.





[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch

2011-07-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060981#comment-13060981
 ] 

Aaron T. Myers commented on HDFS-1896:
--

I've gone through the list. Most things appear to be fine. The only issues I 
found were the following:

# HDFS-1955 - This appears to have entirely regressed in 1073.
# HDFS-1149 - I bet the change to {{NNStorage.setFields(...)}} will cause the 
upgrade tests to break. At the very least, there are now some unused imports in 
{{NNStorage}} on the 1073 branch.

 Additional QA tasks for Edit Log branch
 ---

 Key: HDFS-1896
 URL: https://issues.apache.org/jira/browse/HDFS-1896
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)


 As we close out tasks in the HDFS-1073 branch, there are a few places where 
 I've noticed that we lack some test coverage. Creating this ticket just as a 
 place to jot down some notes on things that we ought to make sure are tested, 
 preferably by automated (unit) tests.





[jira] [Updated] (HDFS-2134) Move DecommissionManager to block management

2011-07-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2134:
-

Affects Version/s: 0.23.0
   Status: Patch Available  (was: Open)

 Move DecommissionManager to block management
 

 Key: HDFS-2134
 URL: https://issues.apache.org/jira/browse/HDFS-2134
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: 0.23.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h2134_20110706.patch


 Datanode management including {{DecommissionManager}} should belong to block 
 management.





[jira] [Updated] (HDFS-2134) Move DecommissionManager to block management

2011-07-06 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2134:
-

Attachment: h2134_20110706.patch

h2134_20110706.patch: moving the code.

 Move DecommissionManager to block management
 

 Key: HDFS-2134
 URL: https://issues.apache.org/jira/browse/HDFS-2134
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: 0.23.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h2134_20110706.patch


 Datanode management including {{DecommissionManager}} should belong to block 
 management.





[jira] [Created] (HDFS-2134) Move DecommissionManager to block management

2011-07-06 Thread Tsz Wo (Nicholas), SZE (JIRA)
Move DecommissionManager to block management


 Key: HDFS-2134
 URL: https://issues.apache.org/jira/browse/HDFS-2134
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


Datanode management including {{DecommissionManager}} should belong to block 
management.





[jira] [Updated] (HDFS-2104) 1073: Add a flag to 2NN to format its checkpoint dirs on startup

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2104:
--

Attachment: hdfs-2104.txt

Added a -format flag. While I was at it, I replaced the ad-hoc parsing code 
with Commons-CLI.

I elected _not_ to have it ask for confirmation, since this is only formatting 
the secondary and not a primary. Our old behavior was basically to overwrite 
the local image anyway, so this isn't a regression in safety.

 1073: Add a flag to 2NN to format its checkpoint dirs on startup
 

 Key: HDFS-2104
 URL: https://issues.apache.org/jira/browse/HDFS-2104
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
 Attachments: hdfs-2104.txt








[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch

2011-07-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061008#comment-13061008
 ] 

Todd Lipcon commented on HDFS-1896:
---

bq.  I bet the change to NNStorage.setFields(...) will cause the upgrade tests 
to break.

hmm, you sure you mean 1149 (the lease reassignment fix)? I'm not sure I see 
how that relates to NNStorage.

 Additional QA tasks for Edit Log branch
 ---

 Key: HDFS-1896
 URL: https://issues.apache.org/jira/browse/HDFS-1896
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)


 As we close out tasks in the HDFS-1073 branch, there are a few places where 
 I've noticed that we lack some test coverage. Creating this ticket just as a 
 place to jot down some notes on things that we ought to make sure are tested, 
 preferably by automated (unit) tests.





[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch

2011-07-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061009#comment-13061009
 ] 

Todd Lipcon commented on HDFS-1896:
---

btw, thanks for looking into these details. I'll file a JIRA about fixing 1955.

 Additional QA tasks for Edit Log branch
 ---

 Key: HDFS-1896
 URL: https://issues.apache.org/jira/browse/HDFS-1896
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)


 As we close out tasks in the HDFS-1073 branch, there are a few places where 
 I've noticed that we lack some test coverage. Creating this ticket just as a 
 place to jot down some notes on things that we ought to make sure are tested, 
 preferably by automated (unit) tests.





[jira] [Created] (HDFS-2135) 1073: fix regression of HDFS-1955 in branch

2011-07-06 Thread Todd Lipcon (JIRA)
1073: fix regression of HDFS-1955 in branch
---

 Key: HDFS-2135
 URL: https://issues.apache.org/jira/browse/HDFS-2135
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
Assignee: Todd Lipcon








[jira] [Updated] (HDFS-2135) 1073: fix regression of HDFS-1955 in branch

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2135:
--

  Description: atm went through the NN storage-related JIRAs committed 
in trunk since HDFS-1073 was branched, and noted that it looked like HDFS-1955 
is regressed on the branch. This JIRA is to investigate and fix as necessary.
Affects Version/s: Edit log branch (HDFS-1073)
Fix Version/s: Edit log branch (HDFS-1073)

 1073: fix regression of HDFS-1955 in branch
 ---

 Key: HDFS-2135
 URL: https://issues.apache.org/jira/browse/HDFS-2135
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)


 atm went through the NN storage-related JIRAs committed in trunk since 
 HDFS-1073 was branched, and noted that it looked like HDFS-1955 is regressed 
 on the branch. This JIRA is to investigate and fix as necessary.





[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch

2011-07-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061011#comment-13061011
 ] 

Aaron T. Myers commented on HDFS-1896:
--

Sorry, I meant HDFS-1969 not HDFS-1149.

 Additional QA tasks for Edit Log branch
 ---

 Key: HDFS-1896
 URL: https://issues.apache.org/jira/browse/HDFS-1896
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)


 As we close out tasks in the HDFS-1073 branch, there are a few places where 
 I've noticed that we lack some test coverage. Creating this ticket just as a 
 place to jot down some notes on things that we ought to make sure are tested, 
 preferably by automated (unit) tests.





[jira] [Updated] (HDFS-2135) 1073: fix regression of HDFS-1955 in branch

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2135:
--

Attachment: hdfs-2135.txt

Was a fairly trivial error: the code was using the errorSDs list, which was 
getting cleared halfway through the function. I switched it to look at how many 
storage dirs got removed by asking the storage object instead.

Unfortunately it's difficult to write a unit test, as was observed in the 
original JIRA. So, I tested by hand by adding a fault in the saving code for 
one of the dirs.

 1073: fix regression of HDFS-1955 in branch
 ---

 Key: HDFS-2135
 URL: https://issues.apache.org/jira/browse/HDFS-2135
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)

 Attachments: hdfs-2135.txt


 atm went through the NN storage-related JIRAs committed in trunk since 
 HDFS-1073 was branched, and noted that it looked like HDFS-1955 is regressed 
 on the branch. This JIRA is to investigate and fix as necessary.





[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch

2011-07-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061014#comment-13061014
 ] 

Todd Lipcon commented on HDFS-1896:
---

I just manually tested HDFS-1969's new functionality and it seems to be working:
- formatted NN using 0.20
- ran NN from 1073 branch - it complained about wrong layout
- ran 1073 NN with -upgrade flag, it started with upgrade
- ran 1073 NN with -rollback flag, it correctly complained
- ran 0.20 NN with -rollback flag, it rolled back to old namespace

Once all of the currently outstanding patches are applied, the upgrade tests 
also seem to be passing, so I think we're OK on that one.

I agree there are lots of unused imports. I'll do a pass to clean them up right 
before we merge.

 Additional QA tasks for Edit Log branch
 ---

 Key: HDFS-1896
 URL: https://issues.apache.org/jira/browse/HDFS-1896
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)


 As we close out tasks in the HDFS-1073 branch, there are a few places where 
 I've noticed that we lack some test coverage. Creating this ticket just as a 
 place to jot down some notes on things that we ought to make sure are tested, 
 preferably by automated (unit) tests.





[jira] [Updated] (HDFS-1794) Add code to list which edit logs are available on a remote NN

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1794:
--

Hadoop Flags: [Reviewed]

 Add code to list which edit logs are available on a remote NN
 -

 Key: HDFS-1794
 URL: https://issues.apache.org/jira/browse/HDFS-1794
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)

 Attachments: hdfs-1794.txt, hdfs-1794.txt


 When the 2NN or BN needs to sync up with the primary NN, it may need to 
 download several different edits files since the NN may roll whenever it 
 likes. This JIRA adds a new type called RemoteEditLogManifest to list the 
 available edit log files since a given transaction ID. This may also be 
 useful for monitoring or backup tools down the road.





[jira] [Updated] (HDFS-1993) TestCheckpoint needs to clean up between cases

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1993:
--

Hadoop Flags: [Reviewed]

 TestCheckpoint needs to clean up between cases
 --

 Key: HDFS-1993
 URL: https://issues.apache.org/jira/browse/HDFS-1993
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node, test
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)

 Attachments: hdfs-1993.txt


 TestCheckpoint currently relies on test ordering to pass. Instead, each case 
 should clean up after itself in a setUp() method.
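
 The cleanup pattern is straightforward: before each case runs, remove 
 whatever directories a previous case left behind. A minimal sketch of the 
 recursive deletion (in real JUnit code this would be invoked from a setUp() 
 method; deleteRecursively is a hypothetical helper, not a Hadoop API):

```java
import java.io.File;

public class TestDirCleanup {
    // Recursively delete any on-disk state left by a previous test case
    // so that each case starts from a clean slate.
    public static void deleteRecursively(File dir) {
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        dir.delete();
    }
}
```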

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1792) Add code to detect valid length of an edits file

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1792:
--

Hadoop Flags: [Reviewed]

 Add code to detect valid length of an edits file
 

 Key: HDFS-1792
 URL: https://issues.apache.org/jira/browse/HDFS-1792
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)

 Attachments: hdfs-1792.txt


 In some edit log corruption situations, it's useful to be able to determine 
 the valid length of an edit log. For this JIRA we define "valid length" as 
 the length of the file excluding any trailing 0x00 bytes, usually left there 
 by the preallocation done while writing. In the future this API can be 
 extended to also examine edit checksums, etc.
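
 The rule above can be sketched as a pure function: trim trailing zero bytes 
 and report what remains. This illustrative helper operates on an in-memory 
 array for simplicity; the actual implementation works against the edits 
 file on disk:

```java
public class EditLogUtil {
    // Return the length of the buffer excluding trailing 0x00 bytes,
    // mirroring the "valid length" definition from this JIRA. Trailing
    // zeros correspond to preallocated-but-unwritten space in the file.
    public static int validLength(byte[] data) {
        int len = data.length;
        while (len > 0 && data[len - 1] == 0) {
            len--;
        }
        return len;
    }
}
```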

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1894) Add constants for LAYOUT_VERSIONs in edits log branch

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1894:
--

Hadoop Flags: [Reviewed]

 Add constants for LAYOUT_VERSIONs in edits log branch
 -

 Key: HDFS-1894
 URL: https://issues.apache.org/jira/browse/HDFS-1894
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)

 Attachments: hdfs-1894.txt, hdfs-1894.txt


 When merging from trunk into branch, it's pretty difficult to resolve 
 conflicts around the layout versions, since trunk keeps swallowing whatever 
 layout version I've picked in the branch. Adding a couple of constants will 
 make the merges much easier.
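
 The idea is to replace bare literals with named constants, so a merge 
 conflict resolves to a name rather than a number that trunk keeps changing. 
 A hedged sketch; the constant names and values below are hypothetical, not 
 the ones actually chosen in the branch:

```java
public class LayoutConstants {
    // Hypothetical named layout versions. HDFS layout versions are
    // negative integers that decrease as the format evolves, so "newer
    // than or equal to" means numerically less than or equal.
    public static final int LAST_PRE_TXID_LAYOUT_VERSION = -37;
    public static final int FIRST_TXID_BASED_LAYOUT_VERSION = -38;

    // True if an image/edits directory at this layout version uses
    // transaction-ID-named files (the HDFS-1073 format).
    public static boolean supportsTxIds(int layoutVersion) {
        return layoutVersion <= FIRST_TXID_BASED_LAYOUT_VERSION;
    }
}
```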

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira