[jira] Commented: (HDFS-453) XML-based metrics as JSP servlet for NameNode

2010-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883053#action_12883053
 ] 

Hadoop QA commented on HDFS-453:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12446660/HDFS-453.7.patch
  against trunk revision 957669.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 24 javac compiler warnings (more 
than the trunk's current 23 warnings).

-1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/405/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/405/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/405/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/405/console

This message is automatically generated.

 XML-based metrics as JSP servlet for NameNode
 -

 Key: HDFS-453
 URL: https://issues.apache.org/jira/browse/HDFS-453
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: name-node
Affects Versions: 0.21.0, 0.22.0
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Fix For: 0.21.0, 0.22.0

 Attachments: dfshealth.xml.jspx, example-dfshealth.xml, 
 HDFS-453.2.patch, HDFS-453.3.patch, HDFS-453.4.patch, HDFS-453.5.patch, 
 HDFS-453.6.patch, HDFS-453.7.patch, HDFS-453.patch


 In HADOOP-4559, a general REST API for reporting metrics was proposed but 
 work seems to have stalled. In the interim, we have a simple XML translation 
 of the existing NameNode status page which provides the same metrics as the 
 human-readable page. This is a relatively lightweight addition to provide 
 some machine-understandable metrics reporting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1205) FSDatasetAsyncDiskService should name its threads

2010-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883056#action_12883056
 ] 

Hadoop QA commented on HDFS-1205:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12447006/hdfs-1205-0.20.txt
  against trunk revision 957669.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/406/console

This message is automatically generated.

 FSDatasetAsyncDiskService should name its threads
 -

 Key: HDFS-1205
 URL: https://issues.apache.org/jira/browse/HDFS-1205
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-1205-0.20.txt, hdfs-1205.txt


 FSDatasetAsyncDiskService creates threads but doesn't name them. The 
 ThreadFactory should name them with the volume they work on as well as a 
 thread index.
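
A minimal sketch of what such a naming ThreadFactory could look like (the class, thread-name format, and volume path are illustrative, not the actual patch):

{noformat}
import java.io.File;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative factory that names worker threads after the volume they serve.
class VolumeThreadFactory implements ThreadFactory {
  private final File volume;                       // volume this executor works on
  private final AtomicInteger index = new AtomicInteger(0);

  VolumeThreadFactory(File volume) {
    this.volume = volume;
  }

  public Thread newThread(Runnable r) {
    Thread t = new Thread(r);
    // e.g. "Async disk worker #2 for volume /data/1/dfs"
    t.setName("Async disk worker #" + index.getAndIncrement()
        + " for volume " + volume);
    return t;
  }
}
{noformat}

Such a factory can then be handed to the per-volume executor, so stack traces and jstack output show which disk a worker belongs to.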

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1203) DataNode should sleep before reentering service loop after an exception

2010-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883059#action_12883059
 ] 

Hadoop QA commented on HDFS-1203:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12446925/hdfs-1203.txt
  against trunk revision 957669.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/197/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/197/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/197/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/197/console

This message is automatically generated.

 DataNode should sleep before reentering service loop after an exception
 ---

 Key: HDFS-1203
 URL: https://issues.apache.org/jira/browse/HDFS-1203
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-1203.txt


 When the DN gets an exception in response to a heartbeat, it logs it and 
 continues, but there is no sleep. I've occasionally seen bugs produce a case 
 where heartbeats continuously produce exceptions, and thus the DN floods the 
 NN with bad heartbeats. Adding a 1 second sleep at least throttles the error 
 messages for easier debugging and error isolation.
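
A rough sketch of the change being described, assuming a simplified service loop (shouldRun, sendHeartbeat and LOG stand in for the DataNode's real members):

{noformat}
// Illustrative service loop: on a failed heartbeat, sleep briefly instead of
// retrying immediately, so a persistent error does not flood the NN or the logs.
void offerService() throws InterruptedException {
  while (shouldRun) {
    try {
      sendHeartbeat();            // hypothetical helper for the heartbeat RPC
    } catch (IOException e) {
      LOG.warn("Heartbeat to the namenode failed", e);
      Thread.sleep(1000);         // the 1 second throttle described above
    }
  }
}
{noformat}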

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1071) savenamespace should write the fsimage to all configured fs.name.dir in parallel

2010-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883100#action_12883100
 ] 

Hadoop QA commented on HDFS-1071:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12447486/HDFS-1071.5.patch
  against trunk revision 957669.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/410/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/410/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/410/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/410/console

This message is automatically generated.

 savenamespace should write the fsimage to all configured fs.name.dir in 
 parallel
 

 Key: HDFS-1071
 URL: https://issues.apache.org/jira/browse/HDFS-1071
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Dmytro Molkov
 Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch, 
 HDFS-1071.5.patch, HDFS-1071.patch


 If you have a large number of files in HDFS, the fsimage file is very big. 
 When the namenode restarts, it writes a copy of the fsimage to all 
 directories configured in fs.name.dir. This takes a long time, especially if 
 there are many directories in fs.name.dir. Make the NN write the fsimage to 
 all these directories in parallel.
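
A minimal sketch of the idea (saveFSImage is a hypothetical helper; per the comments further down, the attached patches implement this with an FSImageSaver):

{noformat}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: save the image to every configured directory from its
// own thread, then wait for all of the copies to finish.
void saveNamespaceInParallel(List<File> nameDirs) throws InterruptedException {
  List<Thread> savers = new ArrayList<Thread>();
  for (final File dir : nameDirs) {
    Thread t = new Thread(new Runnable() {
      public void run() {
        saveFSImage(new File(dir, "current/fsimage"));   // hypothetical helper
      }
    }, "FSImageSaver for " + dir);
    savers.add(t);
    t.start();
  }
  for (Thread t : savers) {
    t.join();
  }
}
{noformat}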

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1202) DataBlockScanner throws NPE when updated before initialized

2010-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883147#action_12883147
 ] 

Hadoop QA commented on HDFS-1202:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12447746/hdfs-1202.txt
  against trunk revision 957669.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/411/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/411/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/411/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/411/console

This message is automatically generated.

 DataBlockScanner throws NPE when updated before initialized
 ---

 Key: HDFS-1202
 URL: https://issues.apache.org/jira/browse/HDFS-1202
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20-append, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.20-append, 0.22.0

 Attachments: hdfs-1202-0.20-append.txt, hdfs-1202.txt


 Missing an isInitialized() check in updateScanStatusInternal
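
A hedged sketch of the missing guard (the signature is an assumption based on the summary, not the actual code):

{noformat}
// Illustrative: ignore scan-status updates that arrive before the scanner has
// been initialized, instead of dereferencing not-yet-initialized state.
void updateScanStatusInternal(Block block) {
  if (!isInitialized()) {
    return;                      // the check the summary says is missing
  }
  // ... existing update logic ...
}
{noformat}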

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1268) Extract blockInvalidateLimit as a seperated configuration

2010-06-28 Thread jinglong.liujl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883176#action_12883176
 ] 

jinglong.liujl commented on HDFS-1268:
--

In my case, if I want to delete 600 blocks, I have to wait 6 heartbeat 
periods. During that time the disk may reach its capacity, and the slow pace of 
block invalidation then causes write failures.
In the general case the default value (100) works well, but in this extreme 
case it is not enough. Currently the limit is derived from heartbeatInterval, 
but in the scenario above, a slower heartbeat that carries more blocks per 
heartbeat still does not remove more blocks in the same period of time.
Why not make this parameter configurable?





 Extract blockInvalidateLimit as a seperated configuration
 -

 Key: HDFS-1268
 URL: https://issues.apache.org/jira/browse/HDFS-1268
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.22.0
Reporter: jinglong.liujl
 Attachments: patch.diff


 If many files pile up in recentInvalidateSets, only 
 Math.max(blockInvalidateLimit, 20*(int)(heartbeatInterval/1000)) invalid 
 blocks can be carried in a heartbeat (by default, 100). Under high write 
 stress, the removal of invalidated blocks cannot keep up with the speed of 
 writing. 
 We extract blockInvalidateLimit into a separate config parameter so that 
 users can choose the right value for their cluster. 
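
As a worked example of the formula above (the 3-second heartbeat interval used here is the usual default and is an assumption):

{noformat}
// Default per-heartbeat invalidation limit with a 3s heartbeat:
long heartbeatIntervalMs = 3 * 1000L;
int blockInvalidateLimit = 100;                          // current default
int perHeartbeat = Math.max(blockInvalidateLimit,
    20 * (int) (heartbeatIntervalMs / 1000));            // max(100, 60) = 100

// The 600-block case from the comment above:
// ceil(600 / 100) = 6 heartbeats, i.e. roughly 18 seconds before all of the
// invalidated blocks have even been handed to the datanode for deletion.
int heartbeatsNeeded = (600 + perHeartbeat - 1) / perHeartbeat;   // = 6
{noformat}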

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1268) Extract blockInvalidateLimit as a seperated configuration

2010-06-28 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883277#action_12883277
 ] 

Konstantin Shvachko commented on HDFS-1268:
---

I was actually in favor of introducing the parameter, see 
[here|https://issues.apache.org/jira/browse/HADOOP-774?focusedCommentId=12455413page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12455413]
So it is mostly about clear motivation, and making sure that the solution will 
actually work for you.
So you are talking about a corner case, when a DN is almost full and needs to 
remove blocks faster in order to free space for subsequent writes, right?
How does this parameter help on a running cluster? A configuration change takes 
effect only when you restart the name-node. Do you plan to restart the cluster 
when you see data-nodes getting close to full? 

 Extract blockInvalidateLimit as a seperated configuration
 -

 Key: HDFS-1268
 URL: https://issues.apache.org/jira/browse/HDFS-1268
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.22.0
Reporter: jinglong.liujl
 Attachments: patch.diff


 If many files pile up in recentInvalidateSets, only 
 Math.max(blockInvalidateLimit, 20*(int)(heartbeatInterval/1000)) invalid 
 blocks can be carried in a heartbeat (by default, 100). Under high write 
 stress, the removal of invalidated blocks cannot keep up with the speed of 
 writing. 
 We extract blockInvalidateLimit into a separate config parameter so that 
 users can choose the right value for their cluster. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1262) Failed pipeline creation during append leaves lease hanging on NN

2010-06-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883296#action_12883296
 ] 

Todd Lipcon commented on HDFS-1262:
---

bq. so it really is a glorified 'cleanup and close' which has the same behavior 
as if the lease expired--nice and tidy imo. It does have the slight delay of 
lease recovery, though.

I think that makes sense - best to do recovery since we might have gotten 
halfway through creating the pipeline, for example, and this will move the 
blocks back to finalized state on the DNs. Performance shouldn't be a concern, 
since this is such a rare case.

bq. While in theory it could happen on the NN side, right now, the namenode RPC 
for create happens and then all we do is start the streamer (hence i don't have 
a test case for it yet).

What happens if we have a transient network error? For example, let's say the 
client is on the same machine as the NN, but it got partitioned from the 
network for a bit. When we call create(), it succeeds, but then when we 
actually try to write the blocks, it fails temporarily. This currently leaves a 
0-length file, but does it also orphan the lease for that file?

 Failed pipeline creation during append leaves lease hanging on NN
 -

 Key: HDFS-1262
 URL: https://issues.apache.org/jira/browse/HDFS-1262
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, name-node
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: sam rash
Priority: Critical
 Fix For: 0.20-append


 Ryan Rawson came upon this nasty bug in HBase cluster testing. What happened 
 was the following:
 1) File's original writer died
 2) Recovery client tried to open file for append - looped for a minute or so 
 until soft lease expired, then append call initiated recovery
 3) Recovery completed successfully
 4) Recovery client calls append again, which succeeds on the NN
 5) For some reason, the block recovery that happens at the start of append 
 pipeline creation failed on all datanodes 6 times, causing the append() call 
 to throw an exception back to HBase master. HBase assumed the file wasn't 
 open and put it back on a queue to try later
 6) Some time later, it tried append again, but the lease was still assigned 
 to the same DFS client, so it wasn't able to recover.
 The recovery failure in step 5 is a separate issue, but the problem for this 
 JIRA is that the NN can think it failed to open a file for append when the NN 
 thinks the writer holds a lease. Since the writer keeps renewing its lease, 
 recovery never happens, and no one can open or recover the file until the DFS 
 client shuts down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1202) DataBlockScanner throws NPE when updated before initialized

2010-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883311#action_12883311
 ] 

Hadoop QA commented on HDFS-1202:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12447746/hdfs-1202.txt
  against trunk revision 957669.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/412/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/412/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/412/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/412/console

This message is automatically generated.

 DataBlockScanner throws NPE when updated before initialized
 ---

 Key: HDFS-1202
 URL: https://issues.apache.org/jira/browse/HDFS-1202
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20-append, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.20-append, 0.22.0

 Attachments: hdfs-1202-0.20-append.txt, hdfs-1202.txt


 Missing an isInitialized() check in updateScanStatusInternal

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1108) ability to create a file whose newly allocated blocks are automatically persisted immediately

2010-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883320#action_12883320
 ] 

Hadoop QA commented on HDFS-1108:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12447758/HDFS-1108.patch
  against trunk revision 957669.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/203/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/203/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/203/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/203/console

This message is automatically generated.

 ability to create a file whose newly allocated blocks are automatically 
 persisted immediately
 -

 Key: HDFS-1108
 URL: https://issues.apache.org/jira/browse/HDFS-1108
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Dmytro Molkov
 Attachments: HDFS-1108.patch


 The current HDFS design says that newly allocated blocks for a file are not 
 persisted in the NN transaction log when the block is allocated. Instead, a 
 hflush() or a close() on the file persists the blocks into the transaction 
 log. It would be nice if we can immediately persist newly allocated blocks 
 (as soon as they are allocated) for specific files.
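
For context, a small sketch of how a writer forces that persistence today, using the 0.21 client API (in 0.20 the equivalent call is sync(); the path is illustrative):

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/example"));
    out.write("some data".getBytes());  // blocks may be allocated, not yet logged
    out.hflush();                       // persists the allocated block list
    out.close();                        // close also persists it
  }
}
{noformat}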

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1262) Failed pipeline creation during append leaves lease hanging on NN

2010-06-28 Thread sam rash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883319#action_12883319
 ] 

sam rash commented on HDFS-1262:


in the 2nd case, can't the client still call close?  or will it hang forever 
waiting for blocks?

either way, i've got test cases for create() + append() and the fix.  took a 
little longer to clean up today, but will post the patch by end of day



 Failed pipeline creation during append leaves lease hanging on NN
 -

 Key: HDFS-1262
 URL: https://issues.apache.org/jira/browse/HDFS-1262
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, name-node
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: sam rash
Priority: Critical
 Fix For: 0.20-append


 Ryan Rawson came upon this nasty bug in HBase cluster testing. What happened 
 was the following:
 1) File's original writer died
 2) Recovery client tried to open file for append - looped for a minute or so 
 until soft lease expired, then append call initiated recovery
 3) Recovery completed successfully
 4) Recovery client calls append again, which succeeds on the NN
 5) For some reason, the block recovery that happens at the start of append 
 pipeline creation failed on all datanodes 6 times, causing the append() call 
 to throw an exception back to HBase master. HBase assumed the file wasn't 
 open and put it back on a queue to try later
 6) Some time later, it tried append again, but the lease was still assigned 
 to the same DFS client, so it wasn't able to recover.
 The recovery failure in step 5 is a separate issue, but the problem for this 
 JIRA is that the NN can think it failed to open a file for append when the NN 
 thinks the writer holds a lease. Since the writer keeps renewing its lease, 
 recovery never happens, and no one can open or recover the file until the DFS 
 client shuts down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1272) HDFS changes corresponding to rename of TokenStorage to Credentials

2010-06-28 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-1272:
---

Attachment: HDFS-1272.1.patch

Patch for trunk.

 HDFS changes corresponding to rename of TokenStorage to Credentials
 ---

 Key: HDFS-1272
 URL: https://issues.apache.org/jira/browse/HDFS-1272
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HDFS-1272.1.patch


 TokenStorage is renamed to Credentials as part of MAPREDUCE-1528 and 
 HADOOP-6845. This jira tracks hdfs changes corresponding to that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1272) HDFS changes corresponding to rename of TokenStorage to Credentials

2010-06-28 Thread Jitendra Nath Pandey (JIRA)
HDFS changes corresponding to rename of TokenStorage to Credentials
---

 Key: HDFS-1272
 URL: https://issues.apache.org/jira/browse/HDFS-1272
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey


TokenStorage is renamed to Credentials as part of MAPREDUCE-1528 and 
HADOOP-6845. This jira tracks hdfs changes corresponding to that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1258) Clearing namespace quota on / corrupts FS image

2010-06-28 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-1258:
-

Attachment: clear-quota.patch

This patch doesn't actually solve the root problem of clearing the root 
directory quota causing a corrupt FS image, but it will prevent people from 
accidentally borking their file system in the meantime, until that gets fixed.
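
A hedged sketch of the kind of check such a patch adds (the method name and exact location are assumptions; see the attached clear-quota.patch for the real change):

{noformat}
// Illustrative guard: refuse to clear the namespace quota on the root
// directory instead of silently corrupting the image.
void setQuota(String src, long nsQuota, long dsQuota) throws IOException {
  if ("/".equals(src) && nsQuota == FSConstants.QUOTA_RESET) {
    throw new IllegalArgumentException(
        "Cannot clear namespace quota on root directory " + src);
  }
  // ... existing quota handling ...
}
{noformat}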

 Clearing namespace quota on / corrupts FS image
 -

 Key: HDFS-1258
 URL: https://issues.apache.org/jira/browse/HDFS-1258
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Aaron T. Myers
Priority: Blocker
 Fix For: 0.20.3, 0.21.0, 0.22.0

 Attachments: clear-quota.patch


 The HDFS root directory starts out with a default namespace quota of 
 Integer.MAX_VALUE. If you clear this quota (using hadoop dfsadmin -clrQuota 
 /), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, 
 and the NN will not come back up from a restart.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1108) ability to create a file whose newly allocated blocks are automatically persisted immediately

2010-06-28 Thread Dmytro Molkov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883347#action_12883347
 ] 

Dmytro Molkov commented on HDFS-1108:
-

Suresh:
1. Yes, the block information will essentially be persisted twice, on each 
block allocation and on file close. Do you think that can be a problem for us? 
Since it is a configurable change and this will only happen for specifically 
configured clusters I do not feel like this is bad.
2. This part is tricky. I guess what can happen is: New block is allocated and 
then the client immediately dies without writing data + The namenode crashes 
and needs a restart. When the namenode is restarted it will have this last 
block as UnderConstruction and when NN tries to release the lease on this file 
it will try to recover the block and will never succeed because the block is 
not present on the datanodes.
However, it seems we have the same situation today: even when the namenode is 
not restarted, the existence of this block in memory and its absence on the 
datanodes lead to the same problem.
A similar case is when the client calls hflush and then the namenode, the 
client, and all datanodes that are receiving the new block crash.

Please correct me if I am wrong on this one.
All in all, it seems that a namenode crash may also take the client down with 
it, so the probability of this happening might be higher than in my first 
example of what can already happen today?

3. I am not really sure what you mean by primary flagging this to standby, but 
in our case the only channel of communication between primary and standby is in 
fact the edits log, so this seemed like a reasonable way to go.

 ability to create a file whose newly allocated blocks are automatically 
 persisted immediately
 -

 Key: HDFS-1108
 URL: https://issues.apache.org/jira/browse/HDFS-1108
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Dmytro Molkov
 Attachments: HDFS-1108.patch


 The current HDFS design says that newly allocated blocks for a file are not 
 persisted in the NN transaction log when the block is allocated. Instead, a 
 hflush() or a close() on the file persists the blocks into the transaction 
 log. It would be nice if we can immediately persist newly allocated blocks 
 (as soon as they are allocated) for specific files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1140) Speedup INode.getPathComponents

2010-06-28 Thread Dmytro Molkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Molkov updated HDFS-1140:


Status: Open  (was: Patch Available)

 Speedup INode.getPathComponents
 ---

 Key: HDFS-1140
 URL: https://issues.apache.org/jira/browse/HDFS-1140
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Dmytro Molkov
Assignee: Dmytro Molkov
Priority: Minor
 Attachments: HDFS-1140.2.patch, HDFS-1140.3.patch, HDFS-1140.4.patch, 
 HDFS-1140.patch


 When the namenode is loading the image, a significant amount of time is spent 
 in DFSUtil.string2Bytes. We have a very specific workload here: the path that 
 the namenode calls getPathComponents for shares N - 1 components with the 
 previous path this method was called for (assuming the current path has N 
 components).
 Hence we can improve the image load time by caching the result of the previous 
 conversion.
 We thought of using a simple LRU cache for components, but in practice 
 String.getBytes gets optimized at runtime and an LRU cache doesn't perform as 
 well; just keeping the latest path's components and their byte translations in 
 two arrays gives quite a performance boost.
 I could get another 20% off of the time to load the image on our cluster (30 
 seconds vs 24), and I wrote a simple benchmark that tests performance with and 
 without caching.
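
A rough sketch of the "latest path's components in two arrays" idea described above (the class and structure are illustrative, not the attached patch):

{noformat}
import org.apache.hadoop.hdfs.DFSUtil;

// Illustrative single-entry cache: reuse the byte[] conversion for every
// component that matches the previously converted path at the same position.
class PathComponentCache {
  private String[] lastComponents = new String[0];
  private byte[][] lastBytes = new byte[0][];

  byte[][] getPathComponents(String[] components) {
    byte[][] result = new byte[components.length][];
    for (int i = 0; i < components.length; i++) {
      if (i < lastComponents.length && components[i].equals(lastComponents[i])) {
        result[i] = lastBytes[i];                         // shared prefix, reuse
      } else {
        result[i] = DFSUtil.string2Bytes(components[i]);  // convert and remember
      }
    }
    lastComponents = components;
    lastBytes = result;
    return result;
  }
}
{noformat}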

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1140) Speedup INode.getPathComponents

2010-06-28 Thread Dmytro Molkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Molkov updated HDFS-1140:


Attachment: HDFS-1140.4.patch

Thanks for your comments, Konstantin.
I addressed all of them in a new version of the patch.

 Speedup INode.getPathComponents
 ---

 Key: HDFS-1140
 URL: https://issues.apache.org/jira/browse/HDFS-1140
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Dmytro Molkov
Assignee: Dmytro Molkov
Priority: Minor
 Attachments: HDFS-1140.2.patch, HDFS-1140.3.patch, HDFS-1140.4.patch, 
 HDFS-1140.patch


 When the namenode is loading the image, a significant amount of time is spent 
 in DFSUtil.string2Bytes. We have a very specific workload here: the path that 
 the namenode calls getPathComponents for shares N - 1 components with the 
 previous path this method was called for (assuming the current path has N 
 components).
 Hence we can improve the image load time by caching the result of the previous 
 conversion.
 We thought of using a simple LRU cache for components, but in practice 
 String.getBytes gets optimized at runtime and an LRU cache doesn't perform as 
 well; just keeping the latest path's components and their byte translations in 
 two arrays gives quite a performance boost.
 I could get another 20% off of the time to load the image on our cluster (30 
 seconds vs 24), and I wrote a simple benchmark that tests performance with and 
 without caching.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1071) savenamespace should write the fsimage to all configured fs.name.dir in parallel

2010-06-28 Thread Dmytro Molkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Molkov updated HDFS-1071:


Attachment: HDFS-1071.6.patch

I added documentation for FSImageSaver that describes the assumptions behind 
how the parallel writes are done.

As for writing the image to multiple directories on a single disk: doing that 
in parallel would likely only hurt performance, since the disk would be seeking 
all the time.

 savenamespace should write the fsimage to all configured fs.name.dir in 
 parallel
 

 Key: HDFS-1071
 URL: https://issues.apache.org/jira/browse/HDFS-1071
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur
Assignee: Dmytro Molkov
 Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch, 
 HDFS-1071.5.patch, HDFS-1071.6.patch, HDFS-1071.patch


 If you have a large number of files in HDFS, the fsimage file is very big. 
 When the namenode restarts, it writes a copy of the fsimage to all 
 directories configured in fs.name.dir. This takes a long time, especially if 
 there are many directories in fs.name.dir. Make the NN write the fsimage to 
 all these directories in parallel.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1267) fuse-dfs does not compile

2010-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883373#action_12883373
 ] 

Hadoop QA commented on HDFS-1267:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448022/1267-1.patch
  against trunk revision 957669.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/206/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/206/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/206/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/206/console

This message is automatically generated.

 fuse-dfs does not compile
 -

 Key: HDFS-1267
 URL: https://issues.apache.org/jira/browse/HDFS-1267
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: contrib/fuse-dfs
Reporter: Tom White
Priority: Critical
 Fix For: 0.21.0

 Attachments: 1267-1.patch


 Looks like since libhdfs was updated to use the new UGI (HDFS-1000) fuse-dfs 
 no longer compiles:
 {noformat}
  [exec] fuse_connect.c: In function 'doConnectAsUser':
  [exec] fuse_connect.c:40: error: too many arguments to function 
 'hdfsConnectAsUser'
 {noformat}
 Any takers to fix this please?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1212) Harmonize HDFS JAR library versions with Common

2010-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883374#action_12883374
 ] 

Hadoop QA commented on HDFS-1212:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448147/HDFS-1212.patch
  against trunk revision 957669.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/413/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/413/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/413/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/413/console

This message is automatically generated.

 Harmonize HDFS JAR library versions with Common
 ---

 Key: HDFS-1212
 URL: https://issues.apache.org/jira/browse/HDFS-1212
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Reporter: Tom White
Assignee: Tom White
Priority: Blocker
 Fix For: 0.21.0

 Attachments: HDFS-1212.patch, HDFS-1212.patch, HDFS-1212.patch, 
 HDFS-1212.patch


 HDFS part of HADOOP-6800.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1262) Failed pipeline creation during append leaves lease hanging on NN

2010-06-28 Thread sam rash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sam rash updated HDFS-1262:
---

Attachment: hdfs-1262-1.txt

-test case for append and create failures.
-tried to get it so both cases fail fast, but create will hit the test timeout 
(default for create that gets AlreadyBeingCreatedException is 5 retries with 
60s sleep)
-append case fails in 30s w/o the fix in the worst case


 Failed pipeline creation during append leaves lease hanging on NN
 -

 Key: HDFS-1262
 URL: https://issues.apache.org/jira/browse/HDFS-1262
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, name-node
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: sam rash
Priority: Critical
 Fix For: 0.20-append

 Attachments: hdfs-1262-1.txt


 Ryan Rawson came upon this nasty bug in HBase cluster testing. What happened 
 was the following:
 1) File's original writer died
 2) Recovery client tried to open file for append - looped for a minute or so 
 until soft lease expired, then append call initiated recovery
 3) Recovery completed successfully
 4) Recovery client calls append again, which succeeds on the NN
 5) For some reason, the block recovery that happens at the start of append 
 pipeline creation failed on all datanodes 6 times, causing the append() call 
 to throw an exception back to HBase master. HBase assumed the file wasn't 
 open and put it back on a queue to try later
 6) Some time later, it tried append again, but the lease was still assigned 
 to the same DFS client, so it wasn't able to recover.
 The recovery failure in step 5 is a separate issue, but the problem for this 
 JIRA is that the NN can think it failed to open a file for append when the NN 
 thinks the writer holds a lease. Since the writer keeps renewing its lease, 
 recovery never happens, and no one can open or recover the file until the DFS 
 client shuts down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1258) Clearing namespace quota on / corrupts FS image

2010-06-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883388#action_12883388
 ] 

Todd Lipcon commented on HDFS-1258:
---

Patch looks good. Can you reupload it with the --no-prefix option to git diff, 
and then change to Patch Available status so the Hudson QA bot runs?

 Clearing namespace quota on / corrupts FS image
 -

 Key: HDFS-1258
 URL: https://issues.apache.org/jira/browse/HDFS-1258
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Aaron T. Myers
Priority: Blocker
 Fix For: 0.20.3, 0.21.0, 0.22.0

 Attachments: clear-quota.patch


 The HDFS root directory starts out with a default namespace quota of 
 Integer.MAX_VALUE. If you clear this quota (using hadoop dfsadmin -clrQuota 
 /), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, 
 and the NN will not come back up from a restart.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1258) Clearing namespace quota on / corrupts FS image

2010-06-28 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-1258:
-

Status: Patch Available  (was: Open)

Patch prevents users from clearing the namespace quota on /.

 Clearing namespace quota on / corrupts FS image
 -

 Key: HDFS-1258
 URL: https://issues.apache.org/jira/browse/HDFS-1258
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Aaron T. Myers
Priority: Blocker
 Fix For: 0.20.3, 0.21.0, 0.22.0

 Attachments: clear-quota.patch, clear-quota.patch


 The HDFS root directory starts out with a default namespace quota of 
 Integer.MAX_VALUE. If you clear this quota (using hadoop dfsadmin -clrQuota 
 /), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, 
 and the NN will not come back up from a restart.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1258) Clearing namespace quota on / corrupts FS image

2010-06-28 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-1258:
-

Attachment: clear-quota.patch

Same patch, but with the --no-prefix option to git diff.

 Clearing namespace quota on / corrupts FS image
 -

 Key: HDFS-1258
 URL: https://issues.apache.org/jira/browse/HDFS-1258
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Aaron T. Myers
Priority: Blocker
 Fix For: 0.20.3, 0.21.0, 0.22.0

 Attachments: clear-quota.patch, clear-quota.patch


 The HDFS root directory starts out with a default namespace quota of 
 Integer.MAX_VALUE. If you clear this quota (using hadoop dfsadmin -clrQuota 
 /), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, 
 and the NN will not come back up from a restart.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1250) Namenode accepts block report from dead datanodes

2010-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883407#action_12883407
 ] 

Hadoop QA commented on HDFS-1250:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448107/HDFS-1250.1.patch
  against trunk revision 957669.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/414/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/414/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/414/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/414/console

This message is automatically generated.

 Namenode accepts block report from dead datanodes
 -

 Key: HDFS-1250
 URL: https://issues.apache.org/jira/browse/HDFS-1250
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2, 0.22.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: HDFS-1250.1.patch, HDFS-1250.patch


 When a datanode's heartbeat times out, the namenode marks it dead. The 
 subsequent heartbeat from the datanode is rejected with a command telling the 
 datanode to re-register. However, the namenode accepts a block report from the 
 datanode even though it is marked dead.
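
A minimal sketch of the kind of check being proposed (the method signature and message are assumptions, not the attached patch):

{noformat}
// Illustrative: treat a block report from a node the NN already considers dead
// the same way a heartbeat is treated, i.e. make the node re-register first.
void processReport(DatanodeDescriptor node, BlockListAsLongs report)
    throws IOException {
  if (!node.isAlive) {                       // node is marked dead
    throw new IOException("Block report from dead or unregistered node: "
        + node.getName());
  }
  // ... existing block report processing ...
}
{noformat}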

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file

2010-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883406#action_12883406
 ] 

Hadoop QA commented on HDFS-1057:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448081/hdfs-1057-trunk-5.txt
  against trunk revision 957669.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 30 javac compiler warnings (more 
than the trunk's current 23 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/207/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/207/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/207/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/207/console

This message is automatically generated.

 Concurrent readers hit ChecksumExceptions if following a writer to very end 
 of file
 ---

 Key: HDFS-1057
 URL: https://issues.apache.org/jira/browse/HDFS-1057
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Affects Versions: 0.20-append, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: sam rash
Priority: Blocker
 Fix For: 0.20-append

 Attachments: conurrent-reader-patch-1.txt, 
 conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, 
 HDFS-1057-0.20-append.patch, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, 
 hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt


 In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before 
 calling flush(). Therefore, if there is a concurrent reader, it's possible to 
 race here - the reader will see the new length while those bytes are still in 
 the buffers of BlockReceiver. Thus the client will potentially see checksum 
 errors or EOFs. Additionally, the last checksum chunk of the file is made 
 accessible to readers even though it is not stable.
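
A simplified fragment illustrating the race described above and the reordering that avoids it (names follow the description; this is not the actual receivePacket code):

{noformat}
// Current (racy) order: readers can observe the new length while the bytes and
// their checksums are still sitting in BlockReceiver's buffers.
replicaInfo.setBytesOnDisk(offsetInBlock);   // 1) length published to readers
flush();                                     // 2) data/checksums flushed later

// Safer order: flush the data and checksum streams first, then publish the
// new on-disk length to concurrent readers.
flush();
replicaInfo.setBytesOnDisk(offsetInBlock);
{noformat}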

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file

2010-06-28 Thread sam rash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883413#action_12883413
 ] 

sam rash commented on HDFS-1057:


the one test that failed from my new tests had an fd leak.  i've corrected 
that.  the other failed tests I cannot reproduce:

1. 
org.apache.hadoop.hdfs.TestFileConcurrentReader.testUnfinishedBlockCRCErrorNormalTransferVerySmallWrite
 
-had fd leak, fixed

2. org.apache.hadoop.hdfs.security.token.block.TestBlockToken.testBlockTokenRpc

[junit] Running org.apache.hadoop.hdfs.security.token.block.TestBlockToken
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.305 sec

3. org.apache.hadoop.hdfs.server.common.TestJspHelper.testGetUgi 

[junit] Running org.apache.hadoop.hdfs.server.common.TestJspHelper
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.309 sec


I can submit the patch with the fix for #1 plus warning fixes


 Concurrent readers hit ChecksumExceptions if following a writer to very end 
 of file
 ---

 Key: HDFS-1057
 URL: https://issues.apache.org/jira/browse/HDFS-1057
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Affects Versions: 0.20-append, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: sam rash
Priority: Blocker
 Fix For: 0.20-append

 Attachments: conurrent-reader-patch-1.txt, 
 conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, 
 HDFS-1057-0.20-append.patch, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, 
 hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt


 In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before 
 calling flush(). Therefore, if there is a concurrent reader, it's possible to 
 race here - the reader will see the new length while those bytes are still in 
 the buffers of BlockReceiver. Thus the client will potentially see checksum 
 errors or EOFs. Additionally, the last checksum chunk of the file is made 
 accessible to readers even though it is not stable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.