[ https://issues.apache.org/jira/browse/HDFS-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas updated HDFS-5909:
---------------------------

    Description: 
Keeping track of the HDFS NameNode is tricky when it comes to HCFS FileSystem 
compatibility.  The magic that occurs inside of it is:

1) Essential to certain performant applications of Hadoop (e.g. HBase).

2) An extension of FileSystem interface properties that isn't necessarily 
dependent on HDFS.

For example, in HDFS-5902, the idea of atomic directory swapping comes up.  
This can be done in HDFS, but also, maybe in a generic YARN app which provides 
general distributed FS metadata services to any HCFS provider.
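
To make the directory-swap idea concrete, here is a minimal in-memory sketch 
of what an atomic swap means for a metadata service.  Everything in it is an 
assumption made for illustration: the class name DirectorySwapSketch, the 
string "handles" standing in for directory contents, and the single 
read/write lock.  It is not how HDFS-5902 or the NameNode implements anything.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical, in-memory stand-in for a shared namespace service. */
public class DirectorySwapSketch {
    // Maps a directory path to an opaque handle for its contents.
    private final Map<String, String> dirs = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void put(String path, String handle) {
        lock.writeLock().lock();
        try {
            dirs.put(path, handle);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public String lookup(String path) {
        lock.readLock().lock();
        try {
            return dirs.get(path);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Exchanges the handles behind two paths; no lookup can observe a half-done swap. */
    public void swap(String a, String b) {
        lock.writeLock().lock();
        try {
            if (!dirs.containsKey(a) || !dirs.containsKey(b)) {
                throw new IllegalArgumentException("both directories must exist");
            }
            String tmp = dirs.get(a);
            dirs.put(a, dirs.get(b));
            dirs.put(b, tmp);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

A distributed version would need far more than a local lock (replication, 
failover, client caching), which is exactly the complexity in question.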

In addition to making NameNode easier to maintain, this would guarantee broader 
compatibility of the HCFS ecosystem. 

Now, just brainstorming here, but... let's take a typical FS operation: 
high-volume writes.  In a distributed file system, these can cause performance 
issues when lots of files are created.  YARN itself provides a framework that 
is generic enough that we could write a YARN service which, when running, 
generically implements certain distributed FS metadata operations.  HCFS 
providers would then be obligated to provide hooks in their API implementations 
to update the running YARN distributed metadata services.
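
As a rough illustration of the hook idea, the sketch below shows a provider 
notifying a metadata service after each namespace change.  All names here 
(MetadataHook, InMemoryMetadataService, ToyFileSystem) are hypothetical and 
exist only for this example; none of them are Hadoop or YARN APIs, and the 
in-memory service merely stands in for something that would actually run in 
YARN containers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

public class MetadataHookSketch {

    /** Hypothetical callback an HCFS provider would invoke after each namespace change. */
    interface MetadataHook {
        void onCreate(String path);
        void onDelete(String path);
    }

    /** Stand-in for the shared metadata service; a real one would run in YARN containers. */
    static final class InMemoryMetadataService implements MetadataHook {
        private final Set<String> paths = new ConcurrentSkipListSet<>();

        @Override public void onCreate(String path) { paths.add(path); }
        @Override public void onDelete(String path) { paths.remove(path); }

        boolean exists(String path) { return paths.contains(path); }
        int fileCount() { return paths.size(); }
    }

    /** Toy HCFS provider: does its own storage work, then notifies the registered hooks. */
    static final class ToyFileSystem {
        private final List<MetadataHook> hooks = new ArrayList<>();

        void register(MetadataHook hook) { hooks.add(hook); }

        void create(String path) {
            // Provider-specific storage write would go here.
            for (MetadataHook h : hooks) h.onCreate(path);
        }

        void delete(String path) {
            // Provider-specific storage delete would go here.
            for (MetadataHook h : hooks) h.onDelete(path);
        }
    }

    public static void main(String[] args) {
        InMemoryMetadataService service = new InMemoryMetadataService();
        ToyFileSystem fs = new ToyFileSystem();
        fs.register(service);
        fs.create("/data/a");
        fs.create("/data/b");
        fs.delete("/data/a");
        System.out.println(service.fileCount() + " path(s) tracked");
    }
}
```

The point of the interface is that the provider keeps its own storage logic 
and only reports namespace changes, so the shared service never needs to know 
which HCFS implementation is underneath.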

Obviously implementation details are complex.

But would it be possible to offload some of the complexity of NameNode into a 
YARN utility which the HCFS community can leverage?

This is a very raw idea, and if it's impossible or impractical, I'm open to 
feedback.  But in the end I think it could be a significant innovation both for 
HDFS (it would become more maintainable by offloading shared metadata 
operations) and for other HCFS tools (because we would be able to share 
some of the logic that HDFS implements for synchronizing distributed FS 
metadata).

  was:
The HDFS NameNode is a major barrier to HCFS FileSystem compatibility.  The 
magic that occurs inside of it is:

1) Essential to certain performant applications of Hadoop (e.g. HBase).

2) An extension of FileSystem interface properties that isn't necessarily 
dependent on HDFS.

For example, in HDFS-5902, the idea of atomic directory swapping comes up.  
This can be done in HDFS, but also, maybe in a generic YARN app which provides 
general distributed FS metadata services to any HCFS provider.

In addition to making NameNode easier to maintain, this would guarantee broader 
compatibility of the HCFS ecosystem. 

Now, just brainstorming here, but... let's take a typical FS operation: 
high-volume writes.  In a distributed file system, these can cause performance 
issues when lots of files are created.  YARN itself provides a framework that 
is generic enough that we could write a YARN service which, when running, 
generically implements certain distributed FS metadata operations.  HCFS 
providers would then be obligated to provide hooks in their API implementations 
to update the running YARN distributed metadata services.

Obviously implementation details are complex.

But would it be possible to offload some of the complexity of NameNode into a 
YARN utility which the HCFS community can leverage?

This is a very raw idea, and if it's impossible or impractical, I'm open to 
feedback.  But in the end I think it could be a significant innovation both for 
HDFS (it would become more maintainable by offloading shared metadata 
operations) and for other HCFS tools (because we would be able to share 
some of the logic that HDFS implements for synchronizing distributed FS 
metadata).


> NameNode : Can some features be implemented in a pure YARN app, to synergize 
> broader uniformity for HCFS distributed FS metadata operations? 
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5909
>                 URL: https://issues.apache.org/jira/browse/HDFS-5909
>             Project: Hadoop HDFS
>          Issue Type: Wish
>            Reporter: jay vyas



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
