[jira] [Commented] (HDFS-1780) reduce need to rewrite fsimage on statrtup

2011-03-24 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010568#comment-13010568
 ] 

Konstantin Shvachko commented on HDFS-1780:
---

Writing the image is a (small) fraction of the other startup components. You can 
find the startup timeline numbers in other jiras. What is the high-level 
problem you are solving?

 reduce need to rewrite fsimage on statrtup
 --

 Key: HDFS-1780
 URL: https://issues.apache.org/jira/browse/HDFS-1780
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Daryn Sharp

 On startup, the namenode will read the fs image, apply edits, then rewrite 
 the fs image.  This requires a non-trivial amount of time for very large 
 directory structures.  Perhaps the namenode should employ some logic to 
 decide that the edits are simple enough that it doesn't warrant rewriting the 
 image back out to disk.
 A few ideas:
 - Use the size of the edit logs: if the size is below a threshold, assume it's 
   cheaper to reprocess the edit log instead of writing the image back out.
 - Time the processing of the edits; if the time is below a defined threshold, 
   the image isn't rewritten.
 - Time the reading of the image and the processing of the edits.  Base the 
   decision on the time it would take to write the image (a multiplier applied 
   to the read time?) versus the time it would take to reprocess the edits.  If 
   a certain threshold (perhaps a percentage or the expected time to rewrite) is 
   exceeded, rewrite the image.
 Something along the lines of the last suggestion may allow for defaults that 
 adapt to any size cluster, thus eliminating the need to keep tweaking a 
 cluster's settings based on its size.
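
A rough sketch of the last idea, with hypothetical names and a made-up multiplier 
(illustration only, not actual NameNode code):

{code}
// Hypothetical sketch of the timing-based heuristic described above.
// imageLoadMillis  - measured time to read the existing fsimage
// editReplayMillis - measured time to re-apply the edit log
static boolean shouldRewriteImage(long imageLoadMillis, long editReplayMillis) {
  // Assumed ratio of image write time to image read time; would be configurable.
  final double WRITE_COST_MULTIPLIER = 1.5;
  double estimatedWriteMillis = imageLoadMillis * WRITE_COST_MULTIPLIER;
  // Rewrite only if replaying these edits again at the next startup would cost
  // more than writing a compacted image now.
  return editReplayMillis > estimatedWriteMillis;
}
{code}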

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1734) 'Chunk size to view' option is not working in Name Node UI.

2011-03-24 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010620#comment-13010620
 ] 

Uma Maheswara Rao G commented on HDFS-1734:
---

Thanks Jitendra for the review.
 This fix will solve the problem because, when we click on the file name 
in the NN UI, generateFileDetails writes all the parameters into the page as 
hidden fields.

out.print("<input type=\"hidden\" name=\"genstamp\" value=\"" + genStamp + "\">");

 When we click refresh, the same parameters will be resubmitted. 


The initial problem is that, when we click on a file to open it from the UI, we 
missed writing genstamp as a hidden parameter. So, when the refresh happens, that 
parameter is missing.

With this patch the refresh button is working.
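
Roughly, the page carries its request parameters as hidden form fields so that a 
refresh resubmits them. A simplified sketch (class, method, and parameter names 
here are illustrative, not the exact patch):

{code}
import java.io.IOException;
import javax.servlet.jsp.JspWriter;

// Illustrative only: write the request parameters back out as hidden fields so
// the refresh form resubmits them along with the new chunk size to view.
class HiddenParamWriter {
  static void writeHiddenParams(JspWriter out, String genStamp, String chunkSizeToView)
      throws IOException {
    out.print("<input type=\"hidden\" name=\"genstamp\" value=\"" + genStamp + "\">");
    out.print("<input type=\"hidden\" name=\"chunkSizeToView\" value=\""
        + chunkSizeToView + "\">");
  }
}
{code}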

 'Chunk size to view' option is not working in Name Node UI.
 ---

 Key: HDFS-1734
 URL: https://issues.apache.org/jira/browse/HDFS-1734
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: ChunkSizeToView.jpg, HDFS-1734.patch


   1. Write a file to DFS
   2. Browse the file using Name Node UI.
   3. Give the chunk size to view as 100 and click refresh.
   It will say: Invalid input (getnstamp absent)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1739) When DataNode throws DiskOutOfSpaceException, it will be helpfull to the user if we log the available volume size and configured block size.

2011-03-24 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-1739:
--

Attachment: HDFS-1739.1.patch

 When DataNode throws DiskOutOfSpaceException, it will be helpfull to the user 
 if we log the available volume size and configured block size.
 

 Key: HDFS-1739
 URL: https://issues.apache.org/jira/browse/HDFS-1739
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
 Attachments: HDFS-1739.1.patch, HDFS-1739.patch


 DataNode will throw DiskOutOfSpaceException for a new block write if the available 
 volume size is less than the configured block size.
  So, it will be helpful to the user if we log these details.
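
A minimal sketch of the kind of message this is after (the helper and where the 
values come from are assumptions, not the exact patch):

{code}
import org.apache.hadoop.util.DiskChecker.DiskOutOfSpaceException;

// Sketch only: include the volume's free space and the configured block size
// in the DiskOutOfSpaceException message. 'available' would come from the
// volume being checked and 'blockSize' from the DataNode configuration.
static void checkAvailable(long available, long blockSize)
    throws DiskOutOfSpaceException {
  if (available < blockSize) {
    throw new DiskOutOfSpaceException(
        "Insufficient space for an additional block. Volume has " + available
        + " bytes free, but the configured block size is " + blockSize + " bytes.");
  }
}
{code}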

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1774) Optimization in org.apache.hadoop.hdfs.server.datanode.FSDataset class.

2011-03-24 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-1774:
--

Attachment: HDFS-1774.patch

 Optimization in org.apache.hadoop.hdfs.server.datanode.FSDataset class.
 ---

 Key: HDFS-1774
 URL: https://issues.apache.org/jira/browse/HDFS-1774
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-1774.patch


  The inner class FSDir constructor is doing duplicate iterations over the listed 
 files in the passed directory. We can optimize this to a single loop and also 
 avoid the isDirectory check, which performs some native invocations. 
   Consider a case: one directory has only one child directory and 10000 
 files. 
 1) The first loop will get the number of child directories.
 2) if (numChildren > 0): this condition will be satisfied, and again it will 
 iterate 10001 times and also check isDirectory.
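
As a rough illustration of the single-pass idea (not the actual patch; the 
name-prefix test and the FSDir(File) constructor are assumptions about how the 
isDirectory() call could be avoided):

{code}
// Sketch only: collect child directories in one pass over the listing instead
// of one pass to count them and a second pass with isDirectory() per entry.
private static List<FSDir> listChildDirs(File dir) {
  List<FSDir> childDirs = new ArrayList<FSDir>();
  File[] files = dir.listFiles();
  if (files != null) {
    for (File f : files) {
      if (f.getName().startsWith("subdir")) {  // assumed block sub-directory prefix
        childDirs.add(new FSDir(f));           // FSDir(File) constructor assumed
      }
    }
  }
  return childDirs;
}
{code}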

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1778) Log Improvements in org.apache.hadoop.hdfs.server.datanode.BlockReceiver.

2011-03-24 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-1778:
--

Attachment: HDFS-1778.patch

 Log Improvements in org.apache.hadoop.hdfs.server.datanode.BlockReceiver.
 -

 Key: HDFS-1778
 URL: https://issues.apache.org/jira/browse/HDFS-1778
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-1778.patch


 Here we used many '+' operators to construct the log messages. To avoid 
 unnecessary string object creation, we can use StringBuilder and append.
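
For example (illustration only; LOG and the message fields are placeholder names), 
a message built with repeated '+' could instead be appended:

{code}
// Illustration: the same message built with StringBuilder.append() instead of
// a chain of '+' concatenations. LOG, block, srcAddr and dstAddr are placeholders.
StringBuilder msg = new StringBuilder();
msg.append("Receiving block ").append(block)
   .append(" src: ").append(srcAddr)
   .append(" dest: ").append(dstAddr);
LOG.info(msg.toString());
{code}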

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1480) All replicas for a block with repl=2 end up in same rack

2011-03-24 Thread T Meyarivan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010727#comment-13010727
 ] 

T Meyarivan commented on HDFS-1480:
---

The sequence seems to be:

chooseTarget() -> chooseLocalRack() -> chooseRandom() -> isGoodTarget()
(IIUC numOfResults=2 if both replicas for a block with repl=2 are available)

Multiple paths:

[1] In chooseTarget(), the writer may not be the decommissioning node
[2] The whole rack (including the writer) is decommissioning => chooseLocalRack() 
picks a node from the same rack as the other replica

--



 All replicas for a block with repl=2 end up in same rack
 

 Key: HDFS-1480
 URL: https://issues.apache.org/jira/browse/HDFS-1480
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2
Reporter: T Meyarivan

 It appears that all replicas of a block can end up in the same rack. The 
 likelihood of such replicas seems to be directly related to decommissioning 
 of nodes. 
 Post rolling OS upgrade (decommission 3-10% of nodes, re-install etc, add 
 them back) of a running cluster, all replicas of about 0.16% of blocks ended 
 up in the same rack.
 The Hadoop Namenode UI etc. doesn't seem to know about such incorrectly replicated 
 blocks. hadoop fsck .. does report that the blocks must be replicated on 
 additional racks.
 Looking at ReplicationTargetChooser.java, the following seem suspect:
 snippet-01:
 {code}
 int maxNodesPerRack =
   (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2;
 {code}
 snippet-02:
 {code}
   case 2:
 if (clusterMap.isOnSameRack(results.get(0), results.get(1))) {
   chooseRemoteRack(1, results.get(0), excludedNodes,
blocksize, maxNodesPerRack, results);
 } else if (newBlock){
   chooseLocalRack(results.get(1), excludedNodes, blocksize,
   maxNodesPerRack, results);
 } else {
   chooseLocalRack(writer, excludedNodes, blocksize,
   maxNodesPerRack, results);
 }
 if (--numOfReplicas == 0) {
   break;
 }
 {code}
 snippet-03:
 {code}
 do {
   DatanodeDescriptor[] selectedNodes =
       chooseRandom(1, nodes, excludedNodes);
   if (selectedNodes.length == 0) {
     throw new NotEnoughReplicasException(
         "Not able to place enough replicas");
   }
   result = (DatanodeDescriptor)(selectedNodes[0]);
 } while(!isGoodTarget(result, blocksize, maxNodesPerRack, results));
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1780) reduce need to rewrite fsimage on statrtup

2011-03-24 Thread Rajiv Chittajallu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010733#comment-13010733
 ] 

Rajiv Chittajallu commented on HDFS-1780:
-

For those who have been running hadoop for 4+ years, the Namenode being able to 
write back an updated fsimage saved us during upgrades. Please don't remove 
this completely; make it optional. 

 reduce need to rewrite fsimage on statrtup
 --

 Key: HDFS-1780
 URL: https://issues.apache.org/jira/browse/HDFS-1780
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Daryn Sharp

 On startup, the namenode will read the fs image, apply edits, then rewrite 
 the fs image.  This requires a non-trivial amount of time for very large 
 directory structures.  Perhaps the namenode should employ some logic to 
 decide that the edits are simple enough that it doesn't warrant rewriting the 
 image back out to disk.
 A few ideas:
 - Use the size of the edit logs: if the size is below a threshold, assume it's 
   cheaper to reprocess the edit log instead of writing the image back out.
 - Time the processing of the edits; if the time is below a defined threshold, 
   the image isn't rewritten.
 - Time the reading of the image and the processing of the edits.  Base the 
   decision on the time it would take to write the image (a multiplier applied 
   to the read time?) versus the time it would take to reprocess the edits.  If 
   a certain threshold (perhaps a percentage or the expected time to rewrite) is 
   exceeded, rewrite the image.
 Something along the lines of the last suggestion may allow for defaults that 
 adapt to any size cluster, thus eliminating the need to keep tweaking a 
 cluster's settings based on its size.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1725) Cleanup FSImage construction

2011-03-24 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1725:
-

Attachment: HDFS-1725.diff

 Cleanup FSImage construction
 

 Key: HDFS-1725
 URL: https://issues.apache.org/jira/browse/HDFS-1725
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, 
 HDFS-1725.diff


 FSImage construction is messy. Sometimes the storage directories in use are 
 set straight away, sometimes they are not. This makes it hard for anything 
 under FSImage (i.e. FSEditLog) to make assumptions about what it can use. 
 Therefore, this patch makes FSImage set the storage directories in use during 
 construction, and never allows them to change. If you want to change 
 storage directories, you create a new image.
 Also, all the construction code should be the same, with the only difference 
 being the parameters passed. When not passed, these should get sensible 
 defaults.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1725) Cleanup FSImage construction

2011-03-24 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1725:
-

Status: Patch Available  (was: Open)

 Cleanup FSImage construction
 

 Key: HDFS-1725
 URL: https://issues.apache.org/jira/browse/HDFS-1725
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, 
 HDFS-1725.diff


 FSImage construction is messy. Sometimes the storage directories in use are 
 set straight away, sometimes they are not. This makes it hard for anything 
 under FSImage (i.e. FSEditLog) to make assumptions about what it can use. 
 Therefore, this patch makes FSImage set the storage directories in use during 
 construction, and never allows them to change. If you want to change 
 storage directories, you create a new image.
 Also, all the construction code should be the same, with the only difference 
 being the parameters passed. When not passed, these should get sensible 
 defaults.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1725) Cleanup FSImage construction

2011-03-24 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010748#comment-13010748
 ] 

Ivan Kelly commented on HDFS-1725:
--

I've addressed some of your comments in the newest patch. The others I have 
responded to below.

{quote}1. FSImage.java: Make members storage, conf final{quote}

storage cannot be final due to TestSaveNamespace needing to spy.

{quote}1. The log {{NameNode.LOG.info("set FSImage.restoreFailedStorage");}} is 
removed. Is this not necessary?{quote}
NNStorage.restoreFailedStorage already logs this, so the log statement in 
FSImage is redundant. 

{quote}1. The old set of constructors and the new ones are not equivalent. For example: 
the FSImage(Configuration) constructor in the old code never set editLog. Similarly, 
only one constructor called setCheckpointDirectories(); now all constructors do. 
Is this fine?

Where is the functionality of {{FSImage(URI imageDir)}} implemented? 
The corresponding change in CreateEditsLog.java could have been avoided. I think 
this is a valid use case and a constructor, and it should not be removed.
{quote}

FSImage(Configuration) calls FSImage() which calls FSImage(FSNameSystem) which 
creates FSEditLog. It's not very clear, but all constructors eventually call 
FSImage(FSNameSystem). 

Setting the checkpoint directories in all constructors is harmless. It is only 
used for import and SecondaryNameNode (which is where FSImage(Configuration) is 
used). Setting it doesn't cause problems and removes unnecessary branching from 
the code. 

FSImage(Configuration) is also called from FSDirectory(FSNamesystem ns, 
Configuration conf) which is what is called in the default case for primary 
NameNode. 

FSImage(URI) and FSImage(Collection<URI>, Collection<URI>) are only ever used 
in test code. I don't think it's good to keep divergent code in production 
solely for the purpose of testing, if it can be removed with minimal hassle.

{quote}2. Why are you calling attemptRestoreRemovedStorage() in #reset() 
method? {quote}

reset() should bring the image back to its initial state. If a storage has 
been removed, then it is not in its initial state. The actual effect of this 
call is the same as it was before. Before, recoverCreateRead called 
setStorageDirectory, which unlocked the storage, which did the same thing. I moved it 
into the call to reset (instead of putting it in setStorageDirectory) because it is 
only needed on reset. The other call to recoverCreateRead is during 
initialisation, so no storage should have been removed by then.

{quote} 4. NameNode.java - why did you remove, in the #format method, adding 
editDirsToFormat to dirsToFormat? The list of directories passed into FSImage 
in this method is now changed to the list of directories in conf? {quote}

editDirsToFormat is added to dirsToFormat so that it confirms with the user if 
they want to format the directory. Before, you could format an edits dir 
without confirming with the user, which is risky.

The list of directories is the same in both cases. getNamespaceDirs(conf) is 
used in FSImage if no directories are specified. That said, it would be 
clearer, and safer, if the names were explicitly passed to FSImage. Changed. 

{quote}6. SecondaryNameNode.java - in #startCheckpoint() - why did you remove 
unlockAll() call? Also recoverCreate() has attemptRestoreRemovedStorage() and 
unlockAll() call? {quote}

This is similar to the case in BackupNode. setStorageDirectories was only being 
called to reset the state of the FSImage. attemptRestoreRemovedStorage is used 
to try to bring back any removed directories. unlockAll() is only there to 
allow analyseStorage to run (it tries to lock).

{quote}8. TestSaveNamespace.java - why is close() method call to unlock storage 
not required any more? {quote}

I assume this is referring to testSaveWhileEditsRolled(). The whole spy() is 
unnecessary in this test as nothing is spied on. I've now removed it.

The calls to close() and then setStorageDirectories() in these tests are only 
there because Mockito doesn't like inner classes, so when you spy on FSImage, some 
things go missing. It's because StorageDirectory is an inner class of Storage, 
so when you initially create the StorageDirectories, they refer to this 
original instance of Storage (NNStorage in this case). Then you set a spy on 
the storage, creating a new instance and replacing the original. The 
storageDirectories still refer to the original, so when you call write and 
things like that, they'll write invalid values for checksums and such. It's a 
really annoying issue, but I'm not sure how to resolve it, so for now we just 
call close() and setStorageDirectories() on NNStorage if we want to spy.
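
A condensed sketch of that workaround (the accessors and the setStorageDirectories() 
signature are assumptions, not the exact test code):

{code}
// Sketch only: replace the image's NNStorage with a Mockito spy, then close and
// re-create the storage directories so they refer to the spy rather than the
// original instance.
FSImage image = cluster.getNameNode().getFSImage();      // assumed accessor
NNStorage spyStorage = Mockito.spy(image.storage);        // assumed field
image.storage = spyStorage;
spyStorage.close();
spyStorage.setStorageDirectories(nameDirs, editsDirs);    // assumed signature
{code}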




 Cleanup FSImage construction
 

 Key: HDFS-1725
 URL: https://issues.apache.org/jira/browse/HDFS-1725
 Project: Hadoop HDFS
  Issue 

Re: [jira] [Commented] (HDFS-1780) reduce need to rewrite fsimage on statrtup

2011-03-24 Thread Matthew Foley
 Writing image is a (small) fraction of the other start up components. You 
 can find the startup timeline 
 numbers in other jiras. What is the high level problem you are solving?

With other improvements underway, large cluster startup time is down to about 
30 minutes.  Of this, 5 minutes is writing the new FSImage files, even after 
the improvements of HDFS-1071.  So this has become a significant, if not huge, 
part of the startup time.

 For those who have been running hadoop for 4+ years, Namenode being able to 
 write back updated 
 fsimage back saved us during upgrades. Please don't remove this completely, 
 make it optional.

Completely agree that backup copies of this info are vital.  However:

(1) Since the Edits files are also replicated, it is reasonable to think that 
having a matched set of FSImage & Edits is sufficient for this protection; it 
is not vital to have them compacted into an updated FSImage.  I believe the 
proposal is not to eliminate redundant backups; the proposal is simply not to 
view the compacting operation as something vital to do at startup time.

(2) Since most of us running production clusters use Checkpoint Namenodes to do 
the compacting operation (combining the FSImage + Edits = new FSImage, and 
writing out redundant copies of the new FSImage) in background, in order to 
keep the size of the Edits logs under control, it is even less important to do 
a compaction operation during startup.  In fact, it seems to me that only sites 
that do NOT use any sort of Checkpoint Namenode actually have any need to do 
compaction from the Primary Namenode, at startup or otherwise.

So I think Daryn's suggestion is worthwhile.  Doing the compaction at startup 
should still remain an option, for sites not using Checkpoint Namenodes.

[jira] [Commented] (HDFS-1780) reduce need to rewrite fsimage on statrtup

2011-03-24 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010757#comment-13010757
 ] 

Daryn Sharp commented on HDFS-1780:
---

The high-level problem is brainstorming how we might short out costly & 
unnecessary work.  After Matt Foley's NN startup improvements are complete, it 
appears the image rewrite will become non-trivial.  There's a suggestion to 
background the rewrite, but it's worth considering if/when the rewrite might be 
avoided entirely.

I really like Todd's suggestion, although I think the NN would have to know 
whether it has a reliable and functional 2NN?  I'm still coming up to speed on 
this project, so please forgive any (seemingly obvious) misunderstandings on my 
part.

 reduce need to rewrite fsimage on statrtup
 --

 Key: HDFS-1780
 URL: https://issues.apache.org/jira/browse/HDFS-1780
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Daryn Sharp

 On startup, the namenode will read the fs image, apply edits, then rewrite 
 the fs image.  This requires a non-trivial amount of time for very large 
 directory structures.  Perhaps the namenode should employ some logic to 
 decide that the edits are simple enough that it doesn't warrant rewriting the 
 image back out to disk.
 A few ideas:
 - Use the size of the edit logs: if the size is below a threshold, assume it's 
   cheaper to reprocess the edit log instead of writing the image back out.
 - Time the processing of the edits; if the time is below a defined threshold, 
   the image isn't rewritten.
 - Time the reading of the image and the processing of the edits.  Base the 
   decision on the time it would take to write the image (a multiplier applied 
   to the read time?) versus the time it would take to reprocess the edits.  If 
   a certain threshold (perhaps a percentage or the expected time to rewrite) is 
   exceeded, rewrite the image.
 Something along the lines of the last suggestion may allow for defaults that 
 adapt to any size cluster, thus eliminating the need to keep tweaking a 
 cluster's settings based on its size.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1774) Optimization in org.apache.hadoop.hdfs.server.datanode.FSDataset class.

2011-03-24 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-1774:
--

Status: Patch Available  (was: Open)

 Optimization in org.apache.hadoop.hdfs.server.datanode.FSDataset class.
 ---

 Key: HDFS-1774
 URL: https://issues.apache.org/jira/browse/HDFS-1774
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-1774.patch


  The inner class FSDir constructor is doing duplicate iterations over the listed 
 files in the passed directory. We can optimize this to a single loop and also 
 avoid the isDirectory check, which performs some native invocations. 
   Consider a case: one directory has only one child directory and 10000 
 files. 
 1) The first loop will get the number of child directories.
 2) if (numChildren > 0): this condition will be satisfied, and again it will 
 iterate 10001 times and also check isDirectory.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1580) Add interface for generic Write Ahead Logging mechanisms

2011-03-24 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010767#comment-13010767
 ] 

Ivan Kelly commented on HDFS-1580:
--

@Todd
Ah ok, since rolling has to be exposed, we should expose it fully. Your segment 
suggestion sounds good. It will require a separate array in FSEditLog though, 
one for JournalManagers, one for EditLogOutputStreams. I'll have to think about 
this. Perhaps we could get rid of the journal manager abstraction completely 
and extend EditLogOutputStream to handle rolling, as rolling is the only 
operation that changes for output with transactions.

Then we could have a separate class that handles stuff like format and getting 
a list of the logs. I'll think about this more and put up a new design.
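
To make that concrete, one purely hypothetical form such an interface could take 
(illustrative only; not the design in the attached documents):

{code}
import java.io.IOException;

// Hypothetical shape for a pluggable write-ahead-log abstraction, purely to
// illustrate the discussion above.
interface JournalManager {
  EditLogOutputStream startLogSegment(long firstTxId) throws IOException;
  void finalizeLogSegment(long firstTxId, long lastTxId) throws IOException;
  // Housekeeping such as formatting storage or listing existing log segments
  // could live here or in a separate class, as discussed above.
}

abstract class EditLogOutputStream {
  abstract void write(byte[] serializedOp) throws IOException; // append one edit
  abstract void flushAndSync() throws IOException;              // durably flush
  abstract void close() throws IOException;
}
{code}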

@Jitendra
1. I think I mentioned the second alternative earlier in the thread, but yes, 
#transfer will go away.
2. A single BookKeeper setup can be shared by many writers. The more writers 
you have, the slower the reads, as ledgers are interleaved when written to disk.
3. There are two layouts to think about: the storage layout and the data layout. 
The first will be entirely internal to the JournalManager. The second will need 
to be saved though. 
4. Very good point.
5. Ok
6. Will do

 Add interface for generic Write Ahead Logging mechanisms
 

 Key: HDFS-1580
 URL: https://issues.apache.org/jira/browse/HDFS-1580
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ivan Kelly
 Attachments: HDFS-1580+1521.diff, HDFS-1580.diff, 
 generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.txt




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1725) Cleanup FSImage construction

2011-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010791#comment-13010791
 ] 

Hadoop QA commented on HDFS-1725:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12474524/HDFS-1725.diff
  against trunk revision 1083958.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.TestDFSShell
  org.apache.hadoop.hdfs.TestFileConcurrentReader

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/281//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/281//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/281//console

This message is automatically generated.

 Cleanup FSImage construction
 

 Key: HDFS-1725
 URL: https://issues.apache.org/jira/browse/HDFS-1725
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, 
 HDFS-1725.diff


 FSImage construction is messy. Sometimes the storage directories in use are 
 set straight away, sometimes they are not. This makes it hard for anything 
 under FSImage (i.e. FSEditLog) to make assumptions about what it can use. 
 Therefore, this patch makes FSImage set the storage directories in use during 
 construction, and never allows them to change. If you want to change 
 storage directories, you create a new image.
 Also, all the construction code should be the same, with the only difference 
 being the parameters passed. When not passed, these should get sensible 
 defaults.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1780) reduce need to rewrite fsimage on statrtup

2011-03-24 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010794#comment-13010794
 ] 

dhruba borthakur commented on HDFS-1780:


There is another piece of work being done in another JIRA that compresses the 
fsimage file. So, the time taken to write it out to disk has been reduced a lot 
(compared to earlier numbers).

But it still makes sense to make saving the fsimage at namenode startup 
time optional via a config.

 reduce need to rewrite fsimage on statrtup
 --

 Key: HDFS-1780
 URL: https://issues.apache.org/jira/browse/HDFS-1780
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Daryn Sharp

 On startup, the namenode will read the fs image, apply edits, then rewrite 
 the fs image.  This requires a non-trivial amount of time for very large 
 directory structures.  Perhaps the namenode should employ some logic to 
 decide that the edits are simple enough that it doesn't warrant rewriting the 
 image back out to disk.
 A few ideas:
 - Use the size of the edit logs: if the size is below a threshold, assume it's 
   cheaper to reprocess the edit log instead of writing the image back out.
 - Time the processing of the edits; if the time is below a defined threshold, 
   the image isn't rewritten.
 - Time the reading of the image and the processing of the edits.  Base the 
   decision on the time it would take to write the image (a multiplier applied 
   to the read time?) versus the time it would take to reprocess the edits.  If 
   a certain threshold (perhaps a percentage or the expected time to rewrite) is 
   exceeded, rewrite the image.
 Something along the lines of the last suggestion may allow for defaults that 
 adapt to any size cluster, thus eliminating the need to keep tweaking a 
 cluster's settings based on its size.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1739) When DataNode throws DiskOutOfSpaceException, it will be helpfull to the user if we log the available volume size and configured block size.

2011-03-24 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010802#comment-13010802
 ] 

Uma Maheswara Rao G commented on HDFS-1739:
---

Hi Nicholas,
 Thanks for your comments.
  I fixed them and submitted the patch.

 When DataNode throws DiskOutOfSpaceException, it will be helpfull to the user 
 if we log the available volume size and configured block size.
 

 Key: HDFS-1739
 URL: https://issues.apache.org/jira/browse/HDFS-1739
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
 Attachments: HDFS-1739.1.patch, HDFS-1739.patch


 DataNode will throw DiskOutOfSpaceException for a new block write if the available 
 volume size is less than the configured block size.
  So, it will be helpful to the user if we log these details.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1774) Optimization in org.apache.hadoop.hdfs.server.datanode.FSDataset class.

2011-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010805#comment-13010805
 ] 

Hadoop QA commented on HDFS-1774:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12474509/HDFS-1774.patch
  against trunk revision 1083958.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.server.datanode.TestBlockReport
  org.apache.hadoop.hdfs.server.datanode.TestTransferRbw
  org.apache.hadoop.hdfs.TestDFSShell

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/282//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/282//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/282//console

This message is automatically generated.

 Optimization in org.apache.hadoop.hdfs.server.datanode.FSDataset class.
 ---

 Key: HDFS-1774
 URL: https://issues.apache.org/jira/browse/HDFS-1774
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-1774.patch


  The inner class FSDir constructor is doing duplicate iterations over the listed 
 files in the passed directory. We can optimize this to a single loop and also 
 avoid the isDirectory check, which performs some native invocations. 
   Consider a case: one directory has only one child directory and 10000 
 files. 
 1) The first loop will get the number of child directories.
 2) if (numChildren > 0): this condition will be satisfied, and again it will 
 iterate 10001 times and also check isDirectory.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-03-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010814#comment-13010814
 ] 

Kihwal Lee commented on HDFS-941:
-

+1 The patch looks good. I was unsure about the new dependency on Guava, but 
apparently people have already agreed on adding it to hadoop-common, so I guess 
it's not an issue.

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.
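
For a feel of what connection reuse involves on the client side, a toy cache keyed 
by datanode address (illustrative only; the SocketCache in the actual patch may 
differ):

{code}
import java.net.Socket;
import java.net.SocketAddress;
import java.util.HashMap;
import java.util.Map;

// Toy illustration: keep one open socket per datanode address so a later read
// can reuse it instead of opening a new connection.
class SimpleSocketCache {
  private final Map<SocketAddress, Socket> cache = new HashMap<SocketAddress, Socket>();

  synchronized Socket get(SocketAddress addr) {
    return cache.remove(addr);     // hand the socket back to exactly one caller
  }

  synchronized void put(Socket sock) {
    SocketAddress remote = sock.getRemoteSocketAddress();
    if (remote == null) {          // unconnected or closed socket: nothing to reuse
      return;
    }
    cache.put(remote, sock);
  }
}
{code}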

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1725) Cleanup FSImage construction

2011-03-24 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1725:
-

Status: Open  (was: Patch Available)

 Cleanup FSImage construction
 

 Key: HDFS-1725
 URL: https://issues.apache.org/jira/browse/HDFS-1725
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, 
 HDFS-1725.diff


 FSImage construction is messy. Sometimes the storage directories in use are 
 set straight away, sometimes they are not. This makes it hard for anything 
 under FSImage (i.e. FSEditLog) to make assumptions about what it can use. 
 Therefore, this patch makes FSImage set the storage directories in use during 
 construction, and never allows them to change. If you want to change 
 storage directories, you create a new image.
 Also, all the construction code should be the same, with the only difference 
 being the parameters passed. When not passed, these should get sensible 
 defaults.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1725) Cleanup FSImage construction

2011-03-24 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1725:
-

Attachment: HDFS-1725.diff

 Cleanup FSImage construction
 

 Key: HDFS-1725
 URL: https://issues.apache.org/jira/browse/HDFS-1725
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, 
 HDFS-1725.diff, HDFS-1725.diff


 FSImage construction is messy. Sometimes the storage directories in use are 
 set straight away, sometimes they are not. This makes it hard for anything 
 under FSImage (i.e. FSEditLog) to make assumptions about what it can use. 
 Therefore, this patch makes FSImage set the storage directories in use during 
 construction, and never allows them to change. If you want to change 
 storage directories, you create a new image.
 Also, all the construction code should be the same, with the only difference 
 being the parameters passed. When not passed, these should get sensible 
 defaults.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-03-24 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-941:


Attachment: hdfs941-1.png

This is from my own pread test. Local/Remote preads benefit from this patch if 
the content is cached in the page cache. 

 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1773) Remove a datanode from cluster if include list is not empty and this datanode is removed from both include and exclude lists

2011-03-24 Thread Tanping Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010848#comment-13010848
 ] 

Tanping Wang commented on HDFS-1773:


I ran test-patch with my patch:
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no tests are needed for 
this patch.
 [exec] 
 [exec] -1 javadoc.  The javadoc tool appears to have generated 1 
warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to 
differ from the contents of the lib directories.
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

So I went ahead and ran another test-patch with a fake patch whose content is 
empty, and I got the same result.  I have also run $ ant javadoc with and 
without my patch, and I got a total of 31 javadoc warnings with or without my 
patch.

I also ran ant test.  With and without my patch, I have seen the following tests 
fail on branch-20-security:

[junit] Test org.apache.hadoop.util.TestQueueProcessingStatistics FAILED
[junit] Test org.apache.hadoop.hdfsproxy.TestHdfsProxy FAILED

These two tests are not related to my patch.

 Remove a datanode from cluster if include list is not empty and this datanode 
 is removed from both include and exclude lists
 

 Key: HDFS-1773
 URL: https://issues.apache.org/jira/browse/HDFS-1773
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.20.203.1
 Environment: branch-20-security
Reporter: Tanping Wang
Assignee: Tanping Wang
Priority: Minor
 Fix For: 0.20.4

 Attachments: HDFS-1773-2.patch, HDFS-1773-3.patch, HDFS-1773.patch


 Our service engineering team, who operate the clusters on a daily basis, 
 find it confusing that after a data node is decommissioned, there is no 
 way to make the cluster forget about this data node; it always remains in 
 the dead node list.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1725) Cleanup FSImage construction

2011-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010868#comment-13010868
 ] 

Hadoop QA commented on HDFS-1725:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12474536/HDFS-1725.diff
  against trunk revision 1083958.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.server.datanode.TestBlockReport
  org.apache.hadoop.hdfs.TestDFSShell
  org.apache.hadoop.hdfs.TestFileConcurrentReader

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/283//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/283//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/283//console

This message is automatically generated.

 Cleanup FSImage construction
 

 Key: HDFS-1725
 URL: https://issues.apache.org/jira/browse/HDFS-1725
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, 
 HDFS-1725.diff, HDFS-1725.diff


 FSImage construction is messy. Sometimes the storage directories in use are 
 set straight away, sometimes they are not. This makes it hard for anything 
 under FSImage (i.e. FSEditLog) to make assumptions about what it can use. 
 Therefore, this patch makes FSImage set the storage directories in use during 
 construction, and never allows them to change. If you want to change 
 storage directories, you create a new image.
 Also, all the construction code should be the same, with the only difference 
 being the parameters passed. When not passed, these should get sensible 
 defaults.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-03-24 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010879#comment-13010879
 ] 

stack commented on HDFS-941:


+1 on commit. The patch looks great, though it's a bit hard to read because it's mostly 
white-space changes.  I like the tests.  I'm good w/ adding Guava.

If there's a v6, here are a few minor comments:

The Javadoc on BlockReader is not properly formatted (it will show as a mess after 
HTML'ing) -- same for the class comment on DN.

gotEOS is an odd name for a boolean; wouldn't eos be better?

Hard-codings like this, +final int MAX_RETRIES = 3;, should instead be 
gotten from config, even if not declared in hdfs-default.xml?  Same for 
DN_KEEPALIVE_TIMEOUT.

Why would we retry a socket that is throwing an IOE?  Why not close it and move on 
with a new socket?

Is SocketCache missing a copyright notice?

Is this the right thing to do?

{code}
+SocketAddress remoteAddr = sock.getRemoteSocketAddress();
+if (remoteAddr == null) {
+  return;
+}
{code}

The socket is not cached because it does not have a remote address.  Why does 
it not have a remote address?  Is there something wrong w/ the socket?  Should 
we throw an exception, or close and throw away the socket?

There is a tab at #1242 in patch:

{code}+ // restore normal timeout{code}






 Datanode xceiver protocol should allow reuse of a connection
 

 Key: HDFS-941
 URL: https://issues.apache.org/jira/browse/HDFS-941
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: bc Wong
 Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
 HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, hdfs941-1.png


 Right now each connection into the datanode xceiver only processes one 
 operation.
 In the case that an operation leaves the stream in a well-defined state (eg a 
 client reads to the end of a block successfully) the same connection could be 
 reused for a second operation. This should improve random read performance 
 significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1750) fs -ls hftp://file not working

2011-03-24 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1750:
-

Fix Version/s: 0.20.204

 fs -ls hftp://file not working
 --

 Key: HDFS-1750
 URL: https://issues.apache.org/jira/browse/HDFS-1750
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.1
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.20.204, 0.21.1, 0.22.0, 0.23.0

 Attachments: h1750_20110314.patch, 
 h1750_20110314_0.20-security.patch, h1750_20110314_0.21.patch


 {noformat}
 hadoop dfs -touchz /tmp/file1 # create file. OK
 hadoop dfs -ls /tmp/file1  # OK
 hadoop dfs -ls hftp://namenode:50070/tmp/file1 # FAILED: not seeing the file
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1734) 'Chunk size to view' option is not working in Name Node UI.

2011-03-24 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010889#comment-13010889
 ] 

Jitendra Nath Pandey commented on HDFS-1734:


Correct, I missed that genstamp was being added as a hidden input.

A few more comments:
 1. Please use JUnit 4 for the test. You don't have to extend from TestCase.
 2. testGenerateFileDetailsShouldPrintTheGetstampInJSPWriter is too long a 
name. A shorter name like testGenStamp might suffice.
 3. The patch has ^M at the end of every line, probably because the file was 
generated on a Windows box. Can you clean that up?
 4.  FileSystem fs = FileSystem.get(CONF);
 I would recommend using fs = cluster.getFileSystem().


 'Chunk size to view' option is not working in Name Node UI.
 ---

 Key: HDFS-1734
 URL: https://issues.apache.org/jira/browse/HDFS-1734
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: ChunkSizeToView.jpg, HDFS-1734.patch


   1. Write a file to DFS
   2. Browse the file using Name Node UI.
   3. Give the chunk size to view as 100 and click refresh.
   It will say: Invalid input (getnstamp absent)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-1120) Make DataNode's block-to-device placement policy pluggable

2011-03-24 Thread Harsh J Chouraria (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J Chouraria reassigned HDFS-1120:
---

Assignee: Harsh J Chouraria

 Make DataNode's block-to-device placement policy pluggable
 --

 Key: HDFS-1120
 URL: https://issues.apache.org/jira/browse/HDFS-1120
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Jeff Hammerbacher
Assignee: Harsh J Chouraria

 As discussed on the mailing list, as the number of disk drives per server 
 increases, it would be useful to allow the DataNode's policy for new block 
 placement to grow in sophistication from the current round-robin strategy.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-1773) Remove a datanode from cluster if include list is not empty and this datanode is removed from both include and exclude lists

2011-03-24 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-1773.
--

   Resolution: Fixed
Fix Version/s: (was: 0.20.4)
   0.20.204
 Hadoop Flags: [Reviewed]

I have committed this to 0.20-security.  Thanks, Tanping!

 Remove a datanode from cluster if include list is not empty and this datanode 
 is removed from both include and exclude lists
 

 Key: HDFS-1773
 URL: https://issues.apache.org/jira/browse/HDFS-1773
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.20.203.1
 Environment: branch-20-security
Reporter: Tanping Wang
Assignee: Tanping Wang
Priority: Minor
 Fix For: 0.20.204

 Attachments: HDFS-1773-2.patch, HDFS-1773-3.patch, HDFS-1773.patch


 Our service engineering team, who operate the clusters on a daily basis, 
 find it confusing that after a data node is decommissioned, there is no 
 way to make the cluster forget about this data node; it always remains in 
 the dead node list.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1758) Web UI JSP pages thread safety issue

2011-03-24 Thread Tanping Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010902#comment-13010902
 ] 

Tanping Wang commented on HDFS-1758:


It should be committed to branch-0.20-security only.  HDFS does not have a 
version that exactly matches branch-20-security.  The closest is 0.20.203.1.  

 Web UI JSP pages thread safety issue
 

 Key: HDFS-1758
 URL: https://issues.apache.org/jira/browse/HDFS-1758
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 0.20.203.1
 Environment: branch-20-security
Reporter: Tanping Wang
Assignee: Tanping Wang
Priority: Minor
 Fix For: 0.20.4

 Attachments: HDFS-1758.patch


 The set of JSP pages that the web UI uses is not thread safe.  We have observed 
 some problems when requesting the Live/Dead/Decommissioning pages from the web 
 UI: an incorrect page is displayed.  To be more specific, when requesting the 
 Dead node list page, sometimes the Live node page is returned; when requesting 
 the decommissioning page, sometimes the dead page is returned.
 The root cause of this problem is that a JSP page is not thread safe by 
 default.  When multiple requests come in, each request is assigned to a 
 different thread, and multiple threads access the same instance of the servlet 
 class generated from a JSP page.  A class variable is shared by multiple 
 threads.  The JSP code in the 20 branch, for example dfsnodelist.jsp, has
 {code}
 <%!
   int rowNum = 0;
   int colNum = 0;
   String sorterField = null;
   String sorterOrder = null;
   String whatNodes = "LIVE";
   ...
 %>
 {code}
 declared as class variables.  (These variables are declared within 
 <%! ... %> directives, which made them class members.)  Multiple threads 
 share the same set of class member variables, so one request would step on 
 another's toes. 
 However, due to the JSP code refactor, HADOOP-5857, all of these class member 
 variables were moved to become function-local variables, so this bug does not 
 appear in Apache trunk.  Hence, we have proposed a simple fix for 
 this bug on the 20 branch alone, to be more specific, branch-0.20-security.
 The simple fix is to add the JSP isThreadSafe="false" page directive to the related 
 JSP pages, dfshealth.jsp and dfsnodelist.jsp, to make them thread safe, i.e. 
 only one request is processed at a time. 
 We did evaluate the thread safety issue for other JSP pages on trunk.  We 
 noticed a potential problem: when we retrieve some statistics from the 
 namenode, for example, we make the call to 
 {code}
 NamenodeJspHelper.getInodeLimitText(fsn);
 {code}
 in dfshealth.jsp, which eventually is 
 {code}
   static String getInodeLimitText(FSNamesystem fsn) {
     long inodes = fsn.dir.totalInodes();
     long blocks = fsn.getBlocksTotal();
     long maxobjects = fsn.getMaxObjects();
 
 {code}
 Some of the function calls are already guarded by a read/write lock, e.g. 
 dir.totalInodes, but others are not.  As a result, the web UI results 
 are not 100% thread safe.  But after evaluating the pros and cons of adding 
 a giant lock into the JSP pages, we decided not to introduce FSNamesystem 
 read/write locks into the JSPs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-1781) jsvc executable delivered into wrong package...

2011-03-24 Thread John George (JIRA)
jsvc executable delivered into wrong package...
---

 Key: HDFS-1781
 URL: https://issues.apache.org/jira/browse/HDFS-1781
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: John George
Assignee: John George


The jsvc executable is delivered in the 0.22 hdfs package, but the script that 
uses it (bin/hdfs) refers to
$HADOOP_HOME/bin/jsvc to find it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-1782) FSNamesystem.startFileInternal(..) throws NullPointerException

2011-03-24 Thread John George (JIRA)
FSNamesystem.startFileInternal(..) throws NullPointerException
--

 Key: HDFS-1782
 URL: https://issues.apache.org/jira/browse/HDFS-1782
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: John George
Assignee: John George
 Fix For: 0.22.0


I'm observing that when one balancer is already running, trying to run another 
one results in a
java.lang.NullPointerException. I was hoping to see the message "Another 
balancer is running. 
Exiting ..." instead.  This is a reproducible issue.

Details


1) Cluster -elrond

[hdfs@gsbl90568 smilli]$ hadoop version
Hadoop 0.22.0.1102280202
Subversion 
git://hadoopre5.corp.sk1.yahoo.com/home/y/var/builds/thread2/workspace/Cloud-HadoopCOMMON-0.22-Secondary
 -r
c7c9a21d7289e29f0133452acf8b761e455a84b5
Compiled by hadoopqa on Mon Feb 28 02:12:38 PST 2011
From source with checksum 9ecbc6f17e8847a1cddca2282dbd9b31
[hdfs@gsbl90568 smilli]$


2) Run first balancer
[hdfs@gsbl90565 smilli]$ hdfs balancer
11/03/09 16:33:56 INFO balancer.Balancer: namenodes = 
[gsbl90565.blue.ygrid.yahoo.com/98.137.97.57:8020,
gsbl90569.blue.ygrid.yahoo.com/98.137.97.53:8020]
11/03/09 16:33:56 INFO balancer.Balancer: p = 
Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  
Bytes Being Moved
11/03/09 16:33:57 WARN conf.Configuration: mapred.task.id is deprecated. 
Instead, use mapreduce.task.attempt.id
11/03/09 16:33:57 INFO balancer.Balancer: Block token params received from NN: 
keyUpdateInterval=600 min(s),
tokenLifetime=600 min(s)
11/03/09 16:33:57 INFO block.BlockTokenSecretManager: Setting block keys
11/03/09 16:33:57 INFO balancer.Balancer: Balancer will update its block keys 
every 150 minute(s)
11/03/09 16:33:57 INFO block.BlockTokenSecretManager: Setting block keys
11/03/09 16:33:57 INFO balancer.Balancer: Block token params received from NN: 
keyUpdateInterval=600 min(s),
tokenLifetime=600 min(s)
11/03/09 16:33:57 INFO block.BlockTokenSecretManager: Setting block keys
11/03/09 16:33:57 INFO balancer.Balancer: Balancer will update its block keys 
every 150 minute(s)
11/03/09 16:33:57 INFO block.BlockTokenSecretManager: Setting block keys
11/03/09 16:33:57 INFO net.NetworkTopology: Adding a new node: 
/98.137.97.0/98.137.97.62:1004
11/03/09 16:33:57 INFO net.NetworkTopology: Adding a new node: 
/98.137.97.0/98.137.97.58:1004
11/03/09 16:33:57 INFO net.NetworkTopology: Adding a new node: 
/98.137.97.0/98.137.97.60:1004
11/03/09 16:33:57 INFO net.NetworkTopology: Adding a new node: 
/98.137.97.0/98.137.97.59:1004
11/03/09 16:33:57 INFO balancer.Balancer: 1 over-utilized: 
[Source[98.137.97.62:1004, utilization=24.152507825759344]]
11/03/09 16:33:57 INFO balancer.Balancer: 0 underutilized: []
11/03/09 16:33:57 INFO balancer.Balancer: Need to move 207.98 GB to make the 
cluster balanced.
11/03/09 16:33:57 INFO balancer.Balancer: Decided to move 10 GB bytes from 
98.137.97.62:1004 to 98.137.97.58:1004
11/03/09 16:33:57 INFO balancer.Balancer: Will move 10 GB in this iteration
Mar 9, 2011 4:33:57 PM       0            0 KB           207.98 GB           10 GB



.
.
.
11/03/09 16:34:36 INFO balancer.Balancer: Moving block -63570336576981940 from 
98.137.97.62:1004 to 98.137.97.59:1004
through 98.137.97.62:1004 is succeeded.
11/03/09 16:34:39 INFO balancer.Balancer: Moving block 2379736326585824737 from 
98.137.97.62:1004 to 98.137.97.59:1004
through 98.137.97.62:1004 is succeeded.
11/03/09 16:35:21 INFO balancer.Balancer: Moving block 8884583953927078028 from 
98.137.97.62:1004 to 98.137.97.59:1004
through 98.137.97.62:1004 is succeeded.
11/03/09 16:35:24 INFO balancer.Balancer: Moving block -135758138424743964 from 
98.137.97.62:1004 to 98.137.97.59:1004
through 98.137.97.62:1004 is succeeded.
11/03/09 16:35:27 INFO balancer.Balancer: Moving block -4598153351946352185 
from 98.137.97.62:1004 to 98.137.97.59:1004
through 98.137.97.62:1004 is succeeded.
11/03/09 16:35:33 INFO balancer.Balancer: Moving block 2966087210491094643 from 
98.137.97.62:1004 to 98.137.97.59:1004
through 98.137.97.62:1004 is succeeded.
11/03/09 16:35:42 INFO balancer.Balancer: Moving block -5573983508500804184 
from 98.137.97.62:1004 to 98.137.97.59:1004
through 98.137.97.62:1004 is succeeded.
11/03/09 16:35:58 INFO balancer.Balancer: Moving block -6222779741597113957 
from 98.137.97.62:1004 to 98.137.97.59:1004
through 98.137.97.62:1004 is succeeded.







3) Run another balancer observe
[hdfs@gsbl90568 smilli]$ hdfs balancer
11/03/09 16:34:32 INFO balancer.Balancer: namenodes = 
[gsbl90565.blue.ygrid.yahoo.com/98.137.97.57:8020,
gsbl90569.blue.ygrid.yahoo.com/98.137.97.53:8020]
11/03/09 16:34:32 INFO balancer.Balancer: p = 
Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
Time Stamp   Iteration#  Bytes Already 

[jira] [Created] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2011-03-24 Thread dhruba borthakur (JIRA)
Ability for HDFS client to write replicas in parallel
-

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: dhruba borthakur


The current implementation of HDFS pipelines the writes to the three replicas. 
This introduces some latency for realtime, latency-sensitive applications. An 
alternate implementation that allows the client to write all replicas in 
parallel gives much better response times to these applications. 
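
A rough sketch of the fan-out idea, under the assumption that the client holds 
one open connection per replica; the ReplicaWriter type below is hypothetical 
and is not part of the DFSClient API:
{code}
// Hypothetical sketch of writing one packet to all replicas concurrently;
// ReplicaWriter stands in for a per-datanode connection and is not a real HDFS type.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

interface ReplicaWriter {
  void write(byte[] packet) throws Exception;
}

class ParallelReplicaWriteSketch {
  private final ExecutorService pool = Executors.newCachedThreadPool();

  /** Send the same packet to every replica in parallel and wait for all acks. */
  void writePacket(final List<ReplicaWriter> replicas, final byte[] packet) throws Exception {
    List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
    for (final ReplicaWriter r : replicas) {
      tasks.add(new Callable<Void>() {
        public Void call() throws Exception {
          r.write(packet);                 // each replica receives the packet directly
          return null;
        }
      });
    }
    for (Future<Void> ack : pool.invokeAll(tasks)) {
      ack.get();  // latency tracks the slowest replica, not the sum of pipeline hops
    }
  }
}
{code}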

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2011-03-24 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010955#comment-13010955
 ] 

dhruba borthakur commented on HDFS-1783:


I have an HBase workload that has the hot dataset in cache, and the bottleneck 
for this load is that writes to the HDFS transaction log have long latency. 

 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: dhruba borthakur

 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime, latency-sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1758) Web UI JSP pages thread safety issue

2011-03-24 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-1758:
---

Affects Version/s: (was: 0.20.203.1)
   0.20.204
Fix Version/s: (was: 0.20.4)
   0.20.204

 Web UI JSP pages thread safety issue
 

 Key: HDFS-1758
 URL: https://issues.apache.org/jira/browse/HDFS-1758
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 0.20.204
 Environment: branch-20-security
Reporter: Tanping Wang
Assignee: Tanping Wang
Priority: Minor
 Fix For: 0.20.204

 Attachments: HDFS-1758.patch


 The set of JSP pages that the web UI uses are not thread safe.  We have 
 observed problems when requesting the Live/Dead/Decommissioning pages from the 
 web UI: an incorrect page is sometimes displayed.  To be more specific, 
 requesting the Dead node list page sometimes returns the Live node page, and 
 requesting the Decommissioning page sometimes returns the Dead page.
 The root cause of this problem is that JSP pages are not thread safe by 
 default.  When multiple requests come in, each request is assigned to a 
 different thread, yet all of those threads access the same instance of the 
 servlet class generated from the JSP page, so a class variable is shared by 
 multiple threads.  The JSP code in the 0.20 branch, for example 
 dfsnodelist.jsp, has
 {code}
 <%!
   int rowNum = 0;
   int colNum = 0;
   String sorterField = null;
   String sorterOrder = null;
   String whatNodes = "LIVE";
   ...
 %>
 {code}
 declared as class variables.  (These variables are declared within 
 {{<%! ... %>}} blocks, which makes them class members.)  Multiple threads 
 share the same set of class member variables, so one request can step on 
 another's toes. 
 However, due to the JSP code refactoring in HADOOP-5857, all of these class 
 member variables were moved to become function-local variables, so this bug 
 does not appear in Apache trunk.  Hence, we propose a simple fix for this bug 
 on the 0.20 branch alone, to be more specific, branch-0.20-security.
 The simple fix is to add the JSP page directive {{isThreadSafe="false"}} to the 
 related JSP pages, dfshealth.jsp and dfsnodelist.jsp, so that only one 
 request is processed at a time. 
 We did evaluate the thread safety issue for the other JSP pages on trunk, and 
 noticed a potential problem when retrieving some statistics from the 
 namenode: for example, we make the call to 
 {code}
 NamenodeJspHelper.getInodeLimitText(fsn);
 {code}
 in dfshealth.jsp, which eventually calls 
 {code}
   static String getInodeLimitText(FSNamesystem fsn) {
     long inodes = fsn.dir.totalInodes();
     long blocks = fsn.getBlocksTotal();
     long maxobjects = fsn.getMaxObjects();
     ...
 {code}
 Some of the function calls are already guarded by the read/write lock, e.g. 
 dir.totalInodes, but others are not.  As a result, the web UI results 
 are not 100% thread safe.  But after evaluating the pros and cons of adding 
 a giant lock into the JSP pages, we decided not to introduce FSNamesystem 
 read/write locks into the JSPs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1758) Web UI JSP pages thread safety issue

2011-03-24 Thread Tanping Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011012#comment-13011012
 ] 

Tanping Wang commented on HDFS-1758:


Correct Fix Version/s: 
0.20.204 

 Web UI JSP pages thread safety issue
 

 Key: HDFS-1758
 URL: https://issues.apache.org/jira/browse/HDFS-1758
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 0.20.204
 Environment: branch-20-security
Reporter: Tanping Wang
Assignee: Tanping Wang
Priority: Minor
 Fix For: 0.20.204

 Attachments: HDFS-1758.patch


 The set of JSP pages that the web UI uses are not thread safe.  We have 
 observed problems when requesting the Live/Dead/Decommissioning pages from the 
 web UI: an incorrect page is sometimes displayed.  To be more specific, 
 requesting the Dead node list page sometimes returns the Live node page, and 
 requesting the Decommissioning page sometimes returns the Dead page.
 The root cause of this problem is that JSP pages are not thread safe by 
 default.  When multiple requests come in, each request is assigned to a 
 different thread, yet all of those threads access the same instance of the 
 servlet class generated from the JSP page, so a class variable is shared by 
 multiple threads.  The JSP code in the 0.20 branch, for example 
 dfsnodelist.jsp, has
 {code}
 <%!
   int rowNum = 0;
   int colNum = 0;
   String sorterField = null;
   String sorterOrder = null;
   String whatNodes = "LIVE";
   ...
 %>
 {code}
 declared as class variables.  (These variables are declared within 
 {{<%! ... %>}} blocks, which makes them class members.)  Multiple threads 
 share the same set of class member variables, so one request can step on 
 another's toes. 
 However, due to the JSP code refactoring in HADOOP-5857, all of these class 
 member variables were moved to become function-local variables, so this bug 
 does not appear in Apache trunk.  Hence, we propose a simple fix for this bug 
 on the 0.20 branch alone, to be more specific, branch-0.20-security.
 The simple fix is to add the JSP page directive {{isThreadSafe="false"}} to the 
 related JSP pages, dfshealth.jsp and dfsnodelist.jsp, so that only one 
 request is processed at a time. 
 We did evaluate the thread safety issue for the other JSP pages on trunk, and 
 noticed a potential problem when retrieving some statistics from the 
 namenode: for example, we make the call to 
 {code}
 NamenodeJspHelper.getInodeLimitText(fsn);
 {code}
 in dfshealth.jsp, which eventually calls 
 {code}
   static String getInodeLimitText(FSNamesystem fsn) {
     long inodes = fsn.dir.totalInodes();
     long blocks = fsn.getBlocksTotal();
     long maxobjects = fsn.getMaxObjects();
     ...
 {code}
 Some of the function calls are already guarded by the read/write lock, e.g. 
 dir.totalInodes, but others are not.  As a result, the web UI results 
 are not 100% thread safe.  But after evaluating the pros and cons of adding 
 a giant lock into the JSP pages, we decided not to introduce FSNamesystem 
 read/write locks into the JSPs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1767) Delay second Block Reports until after cluster finishes startup, to improve startup times

2011-03-24 Thread Matt Foley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011036#comment-13011036
 ] 

Matt Foley commented on HDFS-1767:
--

We could, but I think this is just as effective and the code complexity 
increment is much lower.  Less chance for unforeseen consequences and hidden 
bugs.

 Delay second Block Reports until after cluster finishes startup, to improve 
 startup times
 -

 Key: HDFS-1767
 URL: https://issues.apache.org/jira/browse/HDFS-1767
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.22.0
Reporter: Matt Foley
Assignee: Matt Foley
 Fix For: 0.23.0

 Attachments: DelaySecondBR_v1.patch, table.csv, table_tab.csv


 Consider a large cluster that takes 40 minutes to start up.  The datanodes 
 compete to register and send their Initial Block Reports (IBRs) as fast as 
 they can after startup (subject to a small sub-two-minute random delay, which 
 isn't relevant to this discussion).  
 As each datanode succeeds in sending its IBR, it schedules the starting time 
 for its regular cycle of reports, every hour (or other configured value of 
 dfs.blockreport.intervalMsec). In order to spread the reports evenly across 
 the block report interval, each datanode picks a random fraction of that 
 interval, for the starting point of its regular report cycle.  For example, 
 if a particular datanode ends up randomly selecting 18 minutes after the 
 hour, then that datanode will send a Block Report at 18 minutes after the 
 hour every hour as long as it remains up.  Other datanodes will start their 
 cycles at other randomly selected times.  This code is in 
 DataNode.blockReport() and DataNode.scheduleBlockReport().
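 A minimal sketch of the scheduling just described (not the actual DataNode 
 code) shows why a randomly chosen offset frequently lands inside a long 
 startup window:
 {code}
 // Sketch of the random-offset scheduling described above; not the DataNode source.
 import java.util.Random;

 class BlockReportScheduleSketch {
   /** Pick a random point within the report interval as the start of the regular cycle. */
   static long firstRegularReportDelay(long intervalMs, Random rand) {
     return (long) (rand.nextDouble() * intervalMs);
   }

   public static void main(String[] args) {
     long interval = 60L * 60 * 1000;       // dfs.blockreport.intervalMsec, one hour
     long startupWindow = 40L * 60 * 1000;  // a cluster that takes ~40 minutes to start
     int inside = 0, trials = 100000;
     Random rand = new Random();
     for (int i = 0; i < trials; i++) {
       if (firstRegularReportDelay(interval, rand) < startupWindow) {
         inside++;                          // this 2BR would compete with remaining IBRs
       }
     }
     // Roughly 2/3 of datanodes schedule their second report inside the 40-minute window.
     System.out.println(inside + " of " + trials + " second reports fall inside startup");
   }
 }
 {code}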
 The second Block Report (2BR) is the start of these hourly reports.  The 
 problem is that some of these 2BRs get scheduled sooner rather than later, 
 and actually occur within the startup period.  For example, if the cluster 
 takes 40 minutes (2/3 of an hour) to start up, then out of the datanodes that 
 succeed in sending their IBRs during the first 10 minutes, between 1/2 and 
 2/3 of them will send their 2BR before the 40-minute startup time has 
 completed!
 2BRs sent within the startup time actually compete with the remaining IBRs, 
 and thereby slow down the overall startup process.  This can be seen in the 
 following data, which shows the startup process for a 3700-node cluster that 
 took about 17 minutes to finish startup:
 {noformat}
         time     starts   sum   regs   sum   IBR    sum  2nd_BR   sum  total_BRs/min
   0  1299799498   3042   3042   1969  1969   151    151     0      0    151
   1  1299799558    665   3707   1470  3439   248    399     0      0    248
   2  1299799618          3707    224  3663   270    669     0      0    270
   3  1299799678          3707     14  3677   261    930     3      3    264
   4  1299799738          3707     23  3700   288   1218     1      4    289
   5  1299799798          3707      7  3707   258   1476     3      7    261
   6  1299799858          3707         3707   317   1793     4     11    321
   7  1299799918          3707         3707   292   2085     6     17    298
   8  1299799978          3707         3707   292   2377     8     25    300
   9  1299800038          3707         3707   272   2649     0     25    272
  10  1299800098          3707         3707   280   2929    15     40    295
  11  1299800158          3707         3707   223   3152    14     54    237
  12  1299800218          3707         3707   143   3295     0     54    143
  13  1299800278          3707         3707   141   3436    20     74    161
  14  1299800338          3707         3707   195   3631    78    152    273
  15  1299800398          3707         3707    51   3682   209    361    260
  16  1299800458          3707         3707    25   3707   369    730    394
  17  1299800518          3707         3707         3707   166    896    166
  18  1299800578          3707         3707         3707    72    968     72
  19  1299800638          3707         3707         3707    67   1035     67
  20  1299800698          3707         3707         3707    75   1110     75
  21  1299800758          3707         3707         3707    71   1181     71
  22  1299800818          3707         3707         3707    67   1248     67
  23  1299800878          3707         3707         3707    62   1310     62
  24  1299800938          3707         3707         3707    56   1366     56
  25  1299800998          3707         3707         3707    60   1426     60
 {noformat}
 This data was harvested from the startup logs of all the datanodes, and 
 correlated into one-minute buckets.  Each row of the table represents the 
 progress during one elapsed minute of clock time.  It seems that every 
 cluster startup is different, but this one showed the effect fairly well.
 The starts column shows that all the nodes started up within the first 2 
 minutes, and the regs column shows 

[jira] [Moved] (HDFS-1784) Extensions to FsShell

2011-03-24 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas moved MAPREDUCE-2404 to HDFS-1784:


Affects Version/s: (was: 0.20.3)
   0.20.3
  Key: HDFS-1784  (was: MAPREDUCE-2404)
  Project: Hadoop HDFS  (was: Hadoop Map/Reduce)

 Extensions to FsShell
 -

 Key: HDFS-1784
 URL: https://issues.apache.org/jira/browse/HDFS-1784
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.20.3
Reporter: Olga Natkovich

 Our project, Pig, exposes FsShell functionality to our end users through a 
 shell command. We want to use this command with no modifications to make sure 
 that whether you work with HDFS through Hadoop or Pig you get identical 
 semantics.
 The main concern that has been recently raised by our users is that there is 
 no way to ignore certain failures that they consider to be benign, for 
 instance, removing a non-existent directory.
 We have two asks related to this issue:
 (1) a meaningful error code returned from FsShell (we invoke the Java class 
 directly) so that we can take different actions on different errors (see the 
 sketch after this list), and
 (2) Unix-like ways to tell the command to ignore certain failures. Here are 
 the commands that we would like to be expanded/implemented:
* rm -f
* rmdir --ignore-fail-on-non-empty
* mkdir -p 
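
As a hedged sketch of ask (1): here is roughly how Pig-style code can invoke 
FsShell today and why a more meaningful return code would help. FsShell and 
ToolRunner are existing Hadoop classes, but the benign-failure handling in the 
comments is the requested behavior, not current functionality:
{code}
// Sketch of invoking FsShell programmatically and branching on its exit code.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

public class FsShellFromPigSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    int rc = ToolRunner.run(new FsShell(conf), new String[] { "-rm", "/tmp/maybe-missing" });
    if (rc != 0) {
      // Today all failures look alike; a distinct code (or an "rm -f" style flag)
      // would let callers treat "path does not exist" as benign and continue.
      System.err.println("rm returned " + rc + "; cannot tell a benign failure from a real one");
    }
  }
}
{code}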

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1767) Delay second Block Reports until after cluster finishes startup, to improve startup times

2011-03-24 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011062#comment-13011062
 ] 

Suresh Srinivas commented on HDFS-1767:
---

I agree with Konstantin on not adding a new configuration for delaying the 
second block report, as this would be an additional config that needs to be 
tweaked based on the size of the system and the startup time it takes. For the 
first cut, I am fine with ignoring the second block report. This has no bad 
side effect.

I liked Dhruba's suggestion. However, I feel Dhruba's change is much harder to 
get right, with the NN having to do the flow control of block reports from all 
the datanodes.

Another option we could consider is to send a HeartBeatResponse instead of 
DatanodeCommand[] in response to a datanode heartbeat request. This response 
could include namenode state such as in safemode, out of safemode, etc. This 
information could be used by the datanode to decide when to send the second BR. 
Additionally, the namenode could communicate other information to the datanode 
in the future, such as load, which could help throttle the load on the namenode 
at the source.
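
As a rough sketch of this option (all of the types and fields below are 
hypothetical; nothing like this exists in the protocol yet), the heartbeat 
reply could carry namenode state alongside the usual commands:
{code}
// Hypothetical shape for the HeartBeatResponse idea above; none of these types exist today.
public class HeartbeatResponseSketch {
  private final DatanodeCommandStub[] commands;     // what heartbeat replies carry today
  private final boolean namenodeInStartupSafeMode;  // extra NN state piggybacked on the reply
  private final float namenodeLoadHint;             // room for future throttling hints

  public HeartbeatResponseSketch(DatanodeCommandStub[] commands,
      boolean namenodeInStartupSafeMode, float namenodeLoadHint) {
    this.commands = commands;
    this.namenodeInStartupSafeMode = namenodeInStartupSafeMode;
    this.namenodeLoadHint = namenodeLoadHint;
  }

  public DatanodeCommandStub[] getCommands() {
    return commands;
  }

  /** A datanode could use this to defer its second block report until startup finishes. */
  public boolean shouldDelaySecondBlockReport() {
    return namenodeInStartupSafeMode;
  }

  public float getNamenodeLoadHint() {
    return namenodeLoadHint;
  }
}

/** Stand-in for DatanodeCommand, kept here only to make the sketch self-contained. */
class DatanodeCommandStub { }
{code}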

 Delay second Block Reports until after cluster finishes startup, to improve 
 startup times
 -

 Key: HDFS-1767
 URL: https://issues.apache.org/jira/browse/HDFS-1767
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.22.0
Reporter: Matt Foley
Assignee: Matt Foley
 Fix For: 0.23.0

 Attachments: DelaySecondBR_v1.patch, table.csv, table_tab.csv


 Consider a large cluster that takes 40 minutes to start up.  The datanodes 
 compete to register and send their Initial Block Reports (IBRs) as fast as 
 they can after startup (subject to a small sub-two-minute random delay, which 
 isn't relevant to this discussion).  
 As each datanode succeeds in sending its IBR, it schedules the starting time 
 for its regular cycle of reports, every hour (or other configured value of 
 dfs.blockreport.intervalMsec). In order to spread the reports evenly across 
 the block report interval, each datanode picks a random fraction of that 
 interval, for the starting point of its regular report cycle.  For example, 
 if a particular datanode ends up randomly selecting 18 minutes after the 
 hour, then that datanode will send a Block Report at 18 minutes after the 
 hour every hour as long as it remains up.  Other datanodes will start their 
 cycles at other randomly selected times.  This code is in 
 DataNode.blockReport() and DataNode.scheduleBlockReport().
 The second Block Report (2BR) is the start of these hourly reports.  The 
 problem is that some of these 2BRs get scheduled sooner rather than later, 
 and actually occur within the startup period.  For example, if the cluster 
 takes 40 minutes (2/3 of an hour) to start up, then out of the datanodes that 
 succeed in sending their IBRs during the first 10 minutes, between 1/2 and 
 2/3 of them will send their 2BR before the 40-minute startup time has 
 completed!
 2BRs sent within the startup time actually compete with the remaining IBRs, 
 and thereby slow down the overall startup process.  This can be seen in the 
 following data, which shows the startup process for a 3700-node cluster that 
 took about 17 minutes to finish startup:
 {noformat}
         time     starts   sum   regs   sum   IBR    sum  2nd_BR   sum  total_BRs/min
   0  1299799498   3042   3042   1969  1969   151    151     0      0    151
   1  1299799558    665   3707   1470  3439   248    399     0      0    248
   2  1299799618          3707    224  3663   270    669     0      0    270
   3  1299799678          3707     14  3677   261    930     3      3    264
   4  1299799738          3707     23  3700   288   1218     1      4    289
   5  1299799798          3707      7  3707   258   1476     3      7    261
   6  1299799858          3707         3707   317   1793     4     11    321
   7  1299799918          3707         3707   292   2085     6     17    298
   8  1299799978          3707         3707   292   2377     8     25    300
   9  1299800038          3707         3707   272   2649     0     25    272
  10  1299800098          3707         3707   280   2929    15     40    295
  11  1299800158          3707         3707   223   3152    14     54    237
  12  1299800218          3707         3707   143   3295     0     54    143
  13  1299800278          3707         3707   141   3436    20     74    161
  14  1299800338          3707         3707   195   3631    78    152    273
  15  1299800398          3707         3707    51   3682   209    361    260
  16  1299800458          3707         3707    25   3707   369    730    394
  17  1299800518          3707         3707         3707   166    896    166
  18  1299800578          3707         3707         3707    72    968     72
  19  1299800638          3707         3707         3707    67   1035     67
 20  

[jira] [Created] (HDFS-1785) Cleanup BlockReceiver and DataXceiver

2011-03-24 Thread Tsz Wo (Nicholas), SZE (JIRA)
Cleanup BlockReceiver and DataXceiver
-

 Key: HDFS-1785
 URL: https://issues.apache.org/jira/browse/HDFS-1785
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Tsz Wo (Nicholas), SZE


{{clientName.length()}} is used multiple times for determining whether the 
source is a client or a datanode.
{code}
if (clientName.length() == 0) {
//it is a datanode
}
{code}
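
One possible shape for the cleanup (the helper name below is an assumption, not 
taken from the attached patch) is to give the repeated length check a single, 
well-named home:
{code}
// Sketch of the cleanup direction; the helper name is illustrative, not from h1785_20110324.patch.
class DataXceiverSketch {
  private final String clientName;

  DataXceiverSketch(String clientName) {
    this.clientName = clientName;
  }

  /** An empty client name means the request came from another datanode. */
  private boolean isDatanodeRequest() {
    return clientName.length() == 0;
  }

  void receiveBlock() {
    if (isDatanodeRequest()) {
      // replication or balancing transfer from a peer datanode
    } else {
      // write initiated by an HDFS client
    }
  }
}
{code}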

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1785) Cleanup BlockReceiver and DataXceiver

2011-03-24 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1785:
-

Attachment: h1785_20110324.patch

h1785_20110324.patch: simple code cleanup

 Cleanup BlockReceiver and DataXceiver
 -

 Key: HDFS-1785
 URL: https://issues.apache.org/jira/browse/HDFS-1785
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Tsz Wo (Nicholas), SZE
 Attachments: h1785_20110324.patch


 {{clientName.length()}} is used multiple times for determining whether the 
 source is a client or a datanode.
 {code}
 if (clientName.length() == 0) {
 //it is a datanode
 }
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1785) Cleanup BlockReceiver and DataXceiver

2011-03-24 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1785:
-

Assignee: Tsz Wo (Nicholas), SZE
  Status: Patch Available  (was: Open)

 Cleanup BlockReceiver and DataXceiver
 -

 Key: HDFS-1785
 URL: https://issues.apache.org/jira/browse/HDFS-1785
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h1785_20110324.patch


 {{clientName.length()}} is used multiple times for determining whether the 
 source is a client or a datanode.
 {code}
 if (clientName.length() == 0) {
 //it is a datanode
 }
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira