[jira] [Updated] (HDFS-5324) Make Namespace implementation pluggable in the namenode

2015-02-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-5324:
---
Status: Open  (was: Patch Available)

Cancelling patch as it no longer applies.

 Make Namespace implementation pluggable in the namenode
 ---

 Key: HDFS-5324
 URL: https://issues.apache.org/jira/browse/HDFS-5324
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.1.1-beta
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Attachments: AbstractNamesystem.java, Checklist Of Changes.docx, 
 trunk_1544305_12-12-13.patch


 For the last couple of months, we have been working on making Namespace
 implementation in the namenode pluggable. We have demonstrated that it can
 be done without major surgery on the namenode, and does not have noticeable
 performance impact. We would like to contribute it back to Apache HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-5324) Make Namespace implementation pluggable in the namenode

2015-02-06 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-5324:
---
Status: Open  (was: Patch Available)



[jira] [Updated] (HDFS-5324) Make Namespace implementation pluggable in the namenode

2015-02-06 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-5324:
---
Status: Patch Available  (was: Open)



[jira] [Updated] (HDFS-5324) Make Namespace implementation pluggable in the namenode

2014-09-14 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-5324:
---
Fix Version/s: (was: 3.0.0)



[jira] [Updated] (HDFS-5324) Make Namespace implementation pluggable in the namenode

2014-01-15 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated HDFS-5324:


Attachment: Checklist Of Changes.docx

This document lists all the changes to existing trunk code needed to introduce 
AbstractNameSystem, which can be overridden (with the new concrete class specified 
via a configuration variable) to make the namespace pluggable.

 Make Namespace implementation pluggable in the namenode
 ---

 Key: HDFS-5324
 URL: https://issues.apache.org/jira/browse/HDFS-5324
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.1.1-beta
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 3.0.0

 Attachments: AbstractNamesystem.java, Checklist Of Changes.docx




[jira] [Updated] (HDFS-5324) Make Namespace implementation pluggable in the namenode

2014-01-15 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated HDFS-5324:


Attachment: trunk_1544305_12-12-13.patch

Current patch for trunk. Although the patch is huge, most of it is 
actually refactoring (i.e. moving code from FSNameSystem etc. to 
AbstractNameSystem). In our setup, most tests run. The handful of tests that 
fail also fail on unpatched trunk.

 Make Namespace implementation pluggable in the namenode
 ---

 Key: HDFS-5324
 URL: https://issues.apache.org/jira/browse/HDFS-5324
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.1.1-beta
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 3.0.0

 Attachments: AbstractNamesystem.java, Checklist Of Changes.docx, 
 trunk_1544305_12-12-13.patch




[jira] [Updated] (HDFS-5324) Make Namespace implementation pluggable in the namenode

2014-01-15 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated HDFS-5324:


Status: Patch Available  (was: Open)



[jira] [Updated] (HDFS-5324) Make Namespace implementation pluggable in the namenode

2013-10-12 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated HDFS-5324:


Attachment: AbstractNamesystem.java

Although this file is big (5400 lines), it is essentially code exported from 
existing FSNamesystem.java, with some methods marked abstract.

 Make Namespace implementation pluggable in the namenode
 ---

 Key: HDFS-5324
 URL: https://issues.apache.org/jira/browse/HDFS-5324
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.1.1-beta
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Fix For: 3.0.0

 Attachments: AbstractNamesystem.java




[jira] [Updated] (HDFS-5324) Make Namespace implementation pluggable in the namenode

2013-10-08 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated HDFS-5324:


Description: 
For the last couple of months, we have been working on making Namespace
implementation in the namenode pluggable. We have demonstrated that it can
be done without major surgery on the namenode, and does not have noticeable
performance impact. We would like to contribute it back to Apache HDFS.


  was:
[For more details on the proposal, and feedback from the community, please 
refer to the discussions on hdfs-dev mailing list: 
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201310.mbox/%3CCAAQrVk4NKUtLh36pdFQmS77n%2Bfe49wcy5-DoQW7EZMPLSJPzmQ%40mail.gmail.com%3E]

Exec Summary: For the last couple of months, we have been working on making
Namespace implementation in the namenode pluggable. We have demonstrated that it can
be done without major surgery on the namenode, and does not have noticeable
performance impact. We would like to contribute it back to Apache HDFS.

Rationale:

In a Hadoop cluster, the Namenode has roughly the following main responsibilities:
• Catering to RPC calls from clients.
• Managing the HDFS namespace tree.
• Managing block report, heartbeat and other communication from data nodes.

For Hadoop clusters with a large number of files and a large number of nodes,
the name node becomes a bottleneck, mainly for two reasons:
• All the information is kept in the name node’s main memory.
• The Namenode has to cater to all requests from clients and data nodes,
and also perform some operations for the backup and checkpointing nodes.

A possible solution is to add more main memory, but there are certain issues
with this approach:
• The Namenode being a Java application, garbage collection cycles execute
periodically to reclaim unreferenced heap space. When the heap grows
very large, the application stalls during GC activity regardless of the GC
policy chosen. This causes problems because DNs and clients may
perceive the stall as an NN crash.
• There will always be a practical limit on how much physical memory a
single machine can accommodate.

Proposed Solution:

Of the three responsibilities listed above, we can refactor namespace
management out of the namenode codebase in such a way that there is provision
to implement and plug in name systems other than the existing in-process,
memory-based name system. In particular, a name system backed by a
distributed key-value store will significantly reduce the namenode memory
requirement. To achieve this, a new generic interface will be introduced
[let’s call it AbstractNameSystem] that defines the set of operations with
which we perform namespace management. Namenode code that used to
manipulate Java objects maintained in the namenode’s heap will now operate
on this interface. There will be provision for others to extend this
interface and plug in their own NameSystem implementation.
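
To make the idea concrete, here is a minimal sketch of callers working
against an abstract name system while the storage behind it varies. It is
not the code in the attached AbstractNamesystem.java: every name below
(SketchNameSystem, HeapNameSystem, KeyValueStore, MapBackedStore,
KeyValueNameSystem) is hypothetical, the operation set is reduced to two
calls, and a plain map stands in for a distributed key-value store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical, heavily reduced stand-in for the proposed abstract name system.
abstract class SketchNameSystem {
    abstract boolean exists(String path);
    abstract boolean mkdir(String path); // false if the path already exists

    // Shared logic stays in the parent and only uses the abstract operations.
    boolean mkdirs(String path) {
        StringBuilder prefix = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue; // skip the empty segment before the leading slash
            prefix.append('/').append(part);
            String dir = prefix.toString();
            if (!exists(dir) && !mkdir(dir)) return false;
        }
        return true;
    }
}

// In-process variant: the namespace lives on the JVM heap.
class HeapNameSystem extends SketchNameSystem {
    private final TreeSet<String> dirs = new TreeSet<>();
    @Override boolean exists(String path) { return dirs.contains(path); }
    @Override boolean mkdir(String path) { return dirs.add(path); }
}

// Assumed minimal client interface of an external key-value store.
interface KeyValueStore {
    String get(String key);
    void put(String key, String value);
}

// Map-based stand-in so the sketch runs without any external service.
class MapBackedStore implements KeyValueStore {
    private final Map<String, String> data = new HashMap<>();
    @Override public String get(String key) { return data.get(key); }
    @Override public void put(String key, String value) { data.put(key, value); }
}

// Store-backed variant: no namespace state is held by this object itself.
class KeyValueNameSystem extends SketchNameSystem {
    private final KeyValueStore store;
    KeyValueNameSystem(KeyValueStore store) { this.store = store; }
    @Override boolean exists(String path) { return store.get(path) != null; }
    @Override boolean mkdir(String path) {
        if (exists(path)) return false;
        store.put(path, "dir");
        return true;
    }
}

class NameSystemSketchDemo {
    public static void main(String[] args) {
        SketchNameSystem[] candidates = {
            new HeapNameSystem(),
            new KeyValueNameSystem(new MapBackedStore())
        };
        for (SketchNameSystem ns : candidates) {
            ns.mkdirs("/user/alice/data");
            System.out.println(ns.getClass().getSimpleName()
                    + " has /user/alice: " + ns.exists("/user/alice"));
        }
    }
}
```

The calling code in the demo never names a concrete class after
construction, which is the property the refactoring is after.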

To get started, we have implemented the same memory-based namespace
implementation in a remote process, outside of the namenode JVM. In
addition, work is under way to implement the namesystem using HBase.

Details of Changes:

Created a new class called AbstractNamesystem; the existing FSNamesystem is a
subclass of this class. Some code from FSNamesystem has been moved to its
parent. Created a Factory class to create the object of the NS management
class. Added new config properties to support a pluggable name space
management class; the Factory refers to these properties to decide
which Namesystem class to instantiate. Added unit tests for the Factory.
Replaced constructors with factory calls, because namesystem instances
should now be created based on configuration. This change is
also reflected in some DFS-related webapps [JSP files] where the namesystem
instance is used to obtain DFS health and other stats.
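
As a rough illustration of the factory step, the sketch below picks the
namesystem class from configuration and instantiates it reflectively. It
is not taken from the patch: java.util.Properties stands in for the Hadoop
configuration object, the property key dfs.namenode.namesystem.class is an
invented placeholder, and all class names are hypothetical.

```java
import java.util.Properties;

// Hypothetical base type standing in for the abstract namesystem class.
abstract class PluggableNamesystem {
    abstract String describe();
}

// Stand-in for the existing in-process, memory-based implementation.
class InProcessNamesystem extends PluggableNamesystem {
    @Override String describe() { return "in-process, memory-based"; }
}

// Stand-in for an alternative implementation living outside the namenode JVM.
class RemoteNamesystem extends PluggableNamesystem {
    @Override String describe() { return "remote process"; }
}

final class NamesystemFactory {
    // Invented key for illustration; the real property name is not given in this thread.
    static final String IMPL_KEY = "dfs.namenode.namesystem.class";

    private NamesystemFactory() {}

    // Replaces direct constructor calls: the configured class name decides
    // which implementation gets instantiated, with the in-process one as default.
    static PluggableNamesystem create(Properties conf) {
        String className = conf.getProperty(IMPL_KEY, InProcessNamesystem.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(PluggableNamesystem.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Cannot instantiate namesystem class " + className, e);
        }
    }
}

class NamesystemFactoryDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        // No property set: the factory falls back to the default implementation.
        System.out.println(NamesystemFactory.create(conf).describe());
        // Property set: the factory instantiates the configured class instead.
        conf.setProperty(NamesystemFactory.IMPL_KEY, RemoteNamesystem.class.getName());
        System.out.println(NamesystemFactory.create(conf).describe());
    }
}
```

A misconfigured class name fails fast with an IllegalArgumentException
rather than silently falling back, which seems the safer choice for a
component as central as the namesystem.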

These changes aim to make the namesystem pluggable without changing high-level
interfaces. This is particularly tricky since memory-based name
system functionality is currently baked into these interfaces, and the ultimate
goal is to make the high-level interfaces free from the memory-based name system.

Consideration for Upgrade and Rollback:

The current memory-based implementation already has code to read from and write
to the fsimage. We will have to make it publicly accessible, which will
enable us to upgrade an existing cluster from FSNamespace to a newly added
name system in a future version.

a. Upgrades: By making use of the existing Loader class for reading the fsimage, we
can write code to load this image into the future name system
implementation.

b. Rollbacks: These are even simpler. We can preserve the old fsimage and start
the cluster with that image by configuring the cluster to use the current file
system based name system.

Future work

Current HDFS design is such that FSNameSystem is baked into even