[jira] Commented: (HDFS-1204) 0.20: Lease expiration should recover single files, not entire lease holder

2010-06-13 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878358#action_12878358
 ] 

dhruba borthakur commented on HDFS-1204:


+1

> 0.20: Lease expiration should recover single files, not entire lease holder
> ---
>
> Key: HDFS-1204
> URL: https://issues.apache.org/jira/browse/HDFS-1204
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: hdfs-1204.txt
>
>
> This was brought up in HDFS-200 but didn't make it into the branch on Apache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1202) DataBlockScanner throws NPE when updated before initialized

2010-06-13 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878359#action_12878359
 ] 

dhruba borthakur commented on HDFS-1202:


There seem to be many lines that have whitespace-only changes.

> DataBlockScanner throws NPE when updated before initialized
> ---
>
> Key: HDFS-1202
> URL: https://issues.apache.org/jira/browse/HDFS-1202
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20-append, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: hdfs-1202-0.20-append.txt
>
>
> Missing an isInitialized() check in updateScanStatusInternal

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1202) DataBlockScanner throws NPE when updated before initialized

2010-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878415#action_12878415
 ] 

Todd Lipcon commented on HDFS-1202:
---

If you look carefully, those lines fix a spelling error. It's a private method 
so I figured it was worth fixing the typo in the method name.

> DataBlockScanner throws NPE when updated before initialized
> ---
>
> Key: HDFS-1202
> URL: https://issues.apache.org/jira/browse/HDFS-1202
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20-append, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments: hdfs-1202-0.20-append.txt
>
>
> Missing an isInitialized() check in updateScanStatusInternal

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HDFS-1204) 0.20: Lease expiration should recover single files, not entire lease holder

2010-06-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HDFS-1204:
-

Assignee: sam rash

Assigning to Sam Rash, since I just made the patch based on his comment.

> 0.20: Lease expiration should recover single files, not entire lease holder
> ---
>
> Key: HDFS-1204
> URL: https://issues.apache.org/jira/browse/HDFS-1204
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: sam rash
> Fix For: 0.20-append
>
> Attachments: hdfs-1204.txt
>
>
> This was brought up in HDFS-200 but didn't make it into the branch on Apache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete

2010-06-13 Thread Rodrigo Schmidt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rodrigo Schmidt updated HDFS-1111:
--

Status: Patch Available  (was: Open)

> getCorruptFiles() should give some hint that the list is not complete
> -
>
> Key: HDFS-1111
> URL: https://issues.apache.org/jira/browse/HDFS-1111
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Rodrigo Schmidt
>Assignee: Rodrigo Schmidt
> Attachments: HADFS-1111.0.patch
>
>
> The list of corrupt files returned by the namenode doesn't say anything if 
> the number of corrupted files is larger than the call output limit (which 
> means the list is not complete). There should be a way to hint incompleteness 
> to clients.
> A simple hack would be to add an extra entry to the array returned, with the 
> value null. Clients could interpret this as a sign that there are other 
> corrupt files in the system.
> We should also do some rephrasing of the fsck output to make it more 
> confident when the list is complete and less confident when the list is 
> known to be incomplete.
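
The null-sentinel idea above could be handled on the client side with something like the following Java sketch. The helper name is hypothetical, not part of the HDFS API; it only illustrates how a client would interpret a trailing null entry:

```java
// Hypothetical client-side helper for the null-sentinel proposal: a trailing
// null entry in the returned array would signal that the namenode truncated
// the list at its output limit.
public class CorruptFilesCheck {
    static boolean isTruncated(String[] corruptFiles) {
        return corruptFiles.length > 0
            && corruptFiles[corruptFiles.length - 1] == null;
    }

    public static void main(String[] args) {
        System.out.println(isTruncated(new String[] {"/a/b", null}));
        System.out.println(isTruncated(new String[] {"/a/b"}));
    }
}
```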

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete

2010-06-13 Thread Rodrigo Schmidt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rodrigo Schmidt updated HDFS-1111:
--

Status: Open  (was: Patch Available)

> getCorruptFiles() should give some hint that the list is not complete
> -
>
> Key: HDFS-1111
> URL: https://issues.apache.org/jira/browse/HDFS-1111
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Rodrigo Schmidt
>Assignee: Rodrigo Schmidt
> Attachments: HADFS-1111.0.patch
>
>
> The list of corrupt files returned by the namenode doesn't say anything if 
> the number of corrupted files is larger than the call output limit (which 
> means the list is not complete). There should be a way to hint incompleteness 
> to clients.
> A simple hack would be to add an extra entry to the array returned, with the 
> value null. Clients could interpret this as a sign that there are other 
> corrupt files in the system.
> We should also do some rephrasing of the fsck output to make it more 
> confident when the list is complete and less confident when the list is 
> known to be incomplete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly

2010-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878447#action_12878447
 ] 

Todd Lipcon commented on HDFS-1172:
---

I think reusing PendingReplicationBlocks is probably the best idea so far - we 
already have confidence in that code, and it should only be a very small patch.

> Blocks in newly completed files are considered under-replicated too quickly
> ---
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>
> I've seen this for a long time, and imagine it's a known issue, but couldn't 
> find an existing JIRA. It often happens that we see the NN schedule 
> replication on the last block of files very quickly after they're completed, 
> before the other DNs in the pipeline have a chance to report the new block. 
> This results in a lot of extra replication work on the cluster, as we 
> replicate the block and then end up with multiple excess replicas which are 
> very quickly deleted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1204) 0.20: Lease expiration should recover single files, not entire lease holder

2010-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878449#action_12878449
 ] 

Todd Lipcon commented on HDFS-1204:
---

Ah, I didn't see it, but apparently Sam also posted a unit test here: 
https://issues.apache.org/jira/secure/attachment/12444747/checkLeases-fix-unit-test-1.txt

(We should commit the unit test too.)

> 0.20: Lease expiration should recover single files, not entire lease holder
> ---
>
> Key: HDFS-1204
> URL: https://issues.apache.org/jira/browse/HDFS-1204
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: sam rash
> Fix For: 0.20-append
>
> Attachments: hdfs-1204.txt
>
>
> This was brought up in HDFS-200 but didn't make it into the branch on Apache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete

2010-06-13 Thread Rodrigo Schmidt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rodrigo Schmidt updated HDFS-1111:
--

Status: Open  (was: Patch Available)

> getCorruptFiles() should give some hint that the list is not complete
> -
>
> Key: HDFS-1111
> URL: https://issues.apache.org/jira/browse/HDFS-1111
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Rodrigo Schmidt
>Assignee: Rodrigo Schmidt
> Attachments: HADFS-1111.0.patch
>
>
> The list of corrupt files returned by the namenode doesn't say anything if 
> the number of corrupted files is larger than the call output limit (which 
> means the list is not complete). There should be a way to hint incompleteness 
> to clients.
> A simple hack would be to add an extra entry to the array returned, with the 
> value null. Clients could interpret this as a sign that there are other 
> corrupt files in the system.
> We should also do some rephrasing of the fsck output to make it more 
> confident when the list is complete and less confident when the list is 
> known to be incomplete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete

2010-06-13 Thread Rodrigo Schmidt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rodrigo Schmidt updated HDFS-1111:
--

Status: Patch Available  (was: Open)

I don't know why Hudson is not picking up this patch.

> getCorruptFiles() should give some hint that the list is not complete
> -
>
> Key: HDFS-1111
> URL: https://issues.apache.org/jira/browse/HDFS-1111
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Rodrigo Schmidt
>Assignee: Rodrigo Schmidt
> Attachments: HADFS-1111.0.patch
>
>
> The list of corrupt files returned by the namenode doesn't say anything if 
> the number of corrupted files is larger than the call output limit (which 
> means the list is not complete). There should be a way to hint incompleteness 
> to clients.
> A simple hack would be to add an extra entry to the array returned, with the 
> value null. Clients could interpret this as a sign that there are other 
> corrupt files in the system.
> We should also do some rephrasing of the fsck output to make it more 
> confident when the list is complete and less confident when the list is 
> known to be incomplete.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table

2010-06-13 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878462#action_12878462
 ] 

Scott Carey commented on HDFS-1114:
---

If you are using a power-of-two hash table, you can avoid problems caused by 
hash value clustering by using a Fibonacci hash. Essentially, use the 
multiplicative hash with a special value g:

(h * g) & mask

where h is the hash value and g is the 'golden ratio' number for the size of 
the table used.  Since multiplication on today's processors is far faster than 
division or remainders, this can be used to 'uncluster' hash values.  A single 
consecutive run of values gets maximally distributed into the space, and high 
and low bits are redistributed evenly so that the mask does not increase 
collisions.  Whether this is a desired property or not will depend on the 
properties of the hash values and whether or not an open addressing solution is 
used.

Open addressing can further reduce the memory footprint by allowing the raw 
object to be placed in the map instead of a container object or list.

some links found from a few searches:
http://www.brpreiss.com/books/opus4/html/page214.html
http://staff.newport.ac.uk/ctubb01/ct/advp/hashtables.pdf
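
A minimal Java sketch of the scheme described above, assuming a power-of-two table; 0x9E3779B9 is the usual 32-bit golden-ratio multiplier (floor of 2^32 / phi), and the class and method names here are illustrative, not from the HDFS-1114 patch:

```java
// Sketch of Fibonacci hashing for a power-of-two table: multiply the hash
// value by the 32-bit golden-ratio constant, then mask to the table size.
public class FibonacciHash {
    private static final int GOLDEN = 0x9E3779B9;  // floor(2^32 / phi)

    static int index(int h, int tableSize) {
        // tableSize must be a power of two; overflow in the multiply is
        // harmless because the mask keeps only the low bits.
        return (h * GOLDEN) & (tableSize - 1);
    }

    public static void main(String[] args) {
        // Consecutive hash values get spread across the table instead of
        // landing in adjacent slots.
        for (int h = 0; h < 4; h++) {
            System.out.println(index(h, 1024));
        }
    }
}
```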

> Reducing NameNode memory usage by an alternate hash table
> -
>
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: GSet20100525.pdf, gset20100608.pdf, h1114_20100607.patch
>
>
> NameNode uses a java.util.HashMap to store BlockInfo objects.  When there are 
> many blocks in HDFS, this map uses a lot of memory in the NameNode.  We may 
> optimize the memory usage by a light weight hash table implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1197) 0.20: TestFileAppend3.testTC2 failure

2010-06-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878468#action_12878468
 ] 

Todd Lipcon commented on HDFS-1197:
---

I think this was introduced by the following part of the HDFS-200 patch:
{code}
-}
-  }
-  if (closeFile) {
-// the file is getting closed. Insert block locations into blocksMap.
-// Otherwise fsck will report these blocks as MISSING, especially if the
-// blocksReceived from Datanodes take a long time to arrive.
-for (int i = 0; i < descriptors.length; i++) {
-  descriptors[i].addBlock(newblockinfo);
-}
-pendingFile.setLastBlock(newblockinfo, null);
-  } else {
-// add locations into the INodeUnderConstruction
-pendingFile.setLastBlock(newblockinfo, descriptors);
-  }
+  // add locations into the INodeUnderConstruction
+  pendingFile.setLastBlock(newblockinfo, descriptors);
 }
{code}

I have some unit tests that show the issue, and I'm working on a fix. I think 
this should be considered a blocker for 0.20-append.

> 0.20: TestFileAppend3.testTC2 failure
> -
>
> Key: HDFS-1197
> URL: https://issues.apache.org/jira/browse/HDFS-1197
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, hdfs client, name-node
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
> Attachments: testTC2-failure.txt
>
>
> I saw this failure once on my internal Hudson job that runs the append tests 
> 48 times a day:
> junit.framework.AssertionFailedError: expected:<114688> but was:<98304>
>   at org.apache.hadoop.hdfs.AppendTestUtil.check(AppendTestUtil.java:112)
>   at 
> org.apache.hadoop.hdfs.TestFileAppend3.testTC2(TestFileAppend3.java:116)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-27) RPC on Datanode blocked forever.

2010-06-13 Thread Uma Mahesh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878490#action_12878490
 ] 

Uma Mahesh commented on HDFS-27:


We are using Hadoop version 0.20.1. Is this issue applicable to that version 
as well?
Can you give more details on these two points?

> RPC on Datanode blocked forever.
> 
>
> Key: HDFS-27
> URL: https://issues.apache.org/jira/browse/HDFS-27
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Java SE 1.6.0-b105 on Linux 2.6.x
>Reporter: Raghu Angadi
> Attachments: datanode-jstack
>
>
> We recently noticed a number of datanodes got stuck. The main thread that 
> sends heartbeats and block reports is blocked in select() in side 
> blockReport() RPC.  I will add a stack trace in the next comment.
> I am not sure why select was blocked forever since there is no connection 
> open to NameNode. In fact, NN was restarted in between. It could be some JDK 
> bug or a Hadoop bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (HDFS-1141) completeFile does not check lease ownership

2010-06-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reopened HDFS-1141:
---


Oh wait, not unnecessary in trunk. But *also* necessary in branch-0.20-append :) 
I was thinking of that other similar JIRA about leases. Reopening for commit to 
0.20-append.

> completeFile does not check lease ownership
> ---
>
> Key: HDFS-1141
> URL: https://issues.apache.org/jira/browse/HDFS-1141
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1141-branch20.txt, hdfs-1141.txt, hdfs-1141.txt
>
>
> completeFile should check that the caller still owns the lease of the file 
> that it's completing. This is for the 'testCompleteOtherLeaseHoldersFile' 
> case in HDFS-1139.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1141) completeFile does not check lease ownership

2010-06-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1141:
--

Affects Version/s: (was: 0.21.0)
   0.20-append

This was determined to be invalid in trunk, but is necessary in 
branch-0.20-append

> completeFile does not check lease ownership
> ---
>
> Key: HDFS-1141
> URL: https://issues.apache.org/jira/browse/HDFS-1141
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: hdfs-1141-branch20.txt, hdfs-1141.txt, hdfs-1141.txt
>
>
> completeFile should check that the caller still owns the lease of the file 
> that it's completing. This is for the 'testCompleteOtherLeaseHoldersFile' 
> case in HDFS-1139.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1205) FSDatasetAsyncDiskService should name its threads

2010-06-13 Thread Todd Lipcon (JIRA)
FSDatasetAsyncDiskService should name its threads
-

 Key: HDFS-1205
 URL: https://issues.apache.org/jira/browse/HDFS-1205
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


FSDatasetAsyncDiskService creates threads but doesn't name them. The 
ThreadFactory should name them with the volume they work on as well as a 
thread index.
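
The naming scheme could look something like this sketch (the class and the exact name format are illustrative, not the actual patch):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative ThreadFactory that tags each worker thread with the volume
// it serves plus a per-volume index, so thread dumps identify the volume.
public class VolumeThreadFactory implements ThreadFactory {
    private final String volume;
    private final AtomicInteger count = new AtomicInteger(0);

    public VolumeThreadFactory(String volume) {
        this.volume = volume;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r,
            "Async disk worker #" + count.getAndIncrement()
            + " for volume " + volume);
        t.setDaemon(true);
        return t;
    }
}
```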

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1205) FSDatasetAsyncDiskService should name its threads

2010-06-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1205:
--

Status: Patch Available  (was: Open)

> FSDatasetAsyncDiskService should name its threads
> -
>
> Key: HDFS-1205
> URL: https://issues.apache.org/jira/browse/HDFS-1205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-1205-0.20.txt, hdfs-1205.txt
>
>
> FSDatasetAsyncDiskService creates threads but doesn't name them. The 
> ThreadFactory should name them with the volume they work on as well as a 
> thread index.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1205) FSDatasetAsyncDiskService should name its threads

2010-06-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1205:
--

Attachment: hdfs-1205.txt
hdfs-1205-0.20.txt

Attaching patches for trunk and 0.20

> FSDatasetAsyncDiskService should name its threads
> -
>
> Key: HDFS-1205
> URL: https://issues.apache.org/jira/browse/HDFS-1205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-1205-0.20.txt, hdfs-1205.txt
>
>
> FSDatasetAsyncDiskService creates threads but doesn't name them. The 
> ThreadFactory should name them with the volume they work on as well as a 
> thread index.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.