[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS

2013-07-02 Thread Konstantin Shvachko (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698281#comment-13698281 ]

Konstantin Shvachko commented on HDFS-4945:
---

Yonghwan, the ideas sound interesting. But it looks to me like a new file 
system rather than a new feature of HDFS. Do you plan to replace HDFS or evolve 
it?
I've been working on the [Giraffa|http://code.google.com/a/apache-extras.org/p/giraffa/source/browse/?name=trunk] project. Is it similar to your ideas?
You were saying "we" on several occasions. Who do you mean?

> A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS
> --
>
> Key: HDFS-4945
> URL: https://issues.apache.org/jira/browse/HDFS-4945
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: auto-failover
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Yonghwan Kim
>  Labels: documentation
>
> See the following comment for a detailed description.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS

2013-07-01 Thread Uma Maheswara Rao G (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696782#comment-13696782 ]

Uma Maheswara Rao G commented on HDFS-4945:
---

Suresh has already asked most of the questions needed to clarify this feature.

I have one more question:

{quote}
 When each fragment has k replicas, the file system can tolerate up to 
floor(k/2 - 1) faulty NameNodes.
{quote}
How and where will you manage the metadata describing these fragments?


Regards,
Uma



[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS

2013-06-29 Thread Yonghwan Kim (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696278#comment-13696278 ]

Yonghwan Kim commented on HDFS-4945:


Dear Suresh Srinivas,

Thank you for your helpful comments.

1. As you said, in the case of HDFS 2.0, an NFS is necessary to share edit logs between the two NNs.
However, we thought that the NFS could become a new SPOF, so some way of guaranteeing its fault tolerance is needed (e.g., RAID or network multiplexing). We referred to such fault-tolerance techniques collectively as 'reliable sophisticated storage'.
Actually, the latest version of HDFS 2.0 (possibly 2.0.5) adopts the QJM (HDFS-3077), and one of its requirements is "No requirement for special hardware". Likewise, we want to propose a new architecture that uses only commodity hardware.
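To make the QJM point concrete, here is a minimal Python sketch of the majority-quorum idea behind it: each edit is sent to several journal replicas on ordinary hosts, and the write counts as durable only when a strict majority acknowledge it, so no single shared device has to be reliable. The `Journal` class and `quorum_write` function are hypothetical illustrations, not QJM's actual classes or API, and the sketch omits epochs and recovery of partially written edits.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of majority-acknowledged edit-log writes.
# Not the QJM (HDFS-3077) implementation or API.


@dataclass
class Journal:
    """A hypothetical journal replica running on a commodity host."""
    name: str
    alive: bool = True
    edits: List[str] = field(default_factory=list)

    def append(self, edit: str) -> bool:
        # A failed host simply does not acknowledge the write.
        if not self.alive:
            return False
        self.edits.append(edit)
        return True


def quorum_write(journals: List[Journal], edit: str) -> bool:
    """Send an edit to every journal; succeed only if a strict majority acks.

    A failed write may still leave the edit on a minority of replicas;
    a real system must reconcile those during recovery (omitted here).
    """
    acks = sum(1 for j in journals if j.append(edit))
    return acks > len(journals) // 2


if __name__ == "__main__":
    journals = [Journal(f"jn{i}") for i in range(3)]
    print(quorum_write(journals, "mkdir /a"))  # True: 3 of 3 acknowledge
    journals[0].alive = False
    print(quorum_write(journals, "mkdir /b"))  # True: 2 of 3 acknowledge
    journals[1].alive = False
    print(quorum_write(journals, "mkdir /c"))  # False: only 1 of 3 acknowledges
```

With three replicas the write path survives one failed host; in general, 2N+1 replicas survive N failures.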

2 & 3. I apologize for the lack of detailed information. I am writing a technical document (a paper covering the background, related work, design, and a proof of consistency), but it is not complete yet. Please bear with me; I will add some information here before we finish writing the paper.

4. So far we have only measured the overhead of synchronizing the NNs, using dummy data. Once the document is complete, we will make every effort to implement a prototype.
(We have already proved that the system is consistent in theory.)


The Affects Version setting was my mistake. I'll edit it to HDFS-1623. Thanks.

Sorry for my poor English.




[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS

2013-06-29 Thread Suresh Srinivas (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696269#comment-13696269 ]

Suresh Srinivas commented on HDFS-4945:
---

Some comments:
# I have a hard time understanding "two NameNodes have to share a highly-reliable sophisticated storage": what is sophisticated about having an NFS mount point, which is currently used for storing the FSImage?
# For the sake of completeness, please include information related to federation, which can support multiple NameNodes in a cluster.
# Please add a design document.
# Do you have a prototype working?


Why is Affects Version/s set to HDFS-1623?



[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS

2013-06-29 Thread Suresh Srinivas (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696268#comment-13696268 ]

Suresh Srinivas commented on HDFS-4945:
---

[~chaika1015] I have moved the long description to a subsequent comment.



[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS

2013-06-29 Thread Suresh Srinivas (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696267#comment-13696267 ]

Suresh Srinivas commented on HDFS-4945:
---

Hadoop has recently attracted much attention from engineers and researchers as an emerging and effective framework for Big Data.
HDFS (Hadoop Distributed File System) can manage huge amounts of data while guaranteeing high performance and reliability using only commodity hardware.

However, HDFS requires a single master node, called the NameNode, to manage the entire namespace (i.e., all the i-nodes) of a file system. This creates a SPOF (Single Point Of Failure) problem, because the file system becomes inaccessible when the NameNode fails (HDFS-2064).

The single NameNode is also a performance bottleneck, since every access request to the file system must contact it. Hadoop 2.0 resolves the SPOF problem by introducing manual failover between two NameNodes, Active and Standby.
However, the performance bottleneck remains, because in normal operation all access requests must still go to the Active NameNode. It may also forfeit the advantage of commodity hardware, since the two NameNodes must share highly reliable, sophisticated storage.

We propose a new HDFS architecture that resolves all of the problems described above.
The proposed architecture has the following features and advantages:

1. Multiple NameNodes (not restricted to two) can be used to improve availability.
The entire namespace of a file system is partitioned into several fragments, and replicas of each fragment are dispersed among the NameNodes. When each fragment has k replicas, the file system can tolerate up to floor(k/2 - 1) faulty NameNodes.

2. Multiple NameNodes can be used to improve performance. The bottleneck caused by a single NameNode can be avoided by assigning different NameNodes as the primaries (entry points) for different fragments.

3. The need for highly reliable storage shared by the NameNodes is removed by introducing a message-based consistency mechanism among the NameNodes. The architecture requires only commodity hardware.
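Points 1 and 2 above can be illustrated with a small Python sketch. Everything concrete in it is an assumption, since the proposal does not yet specify these details: fragments are chosen by hashing a path's top-level directory (`fragment_of`), replicas are placed round-robin on distinct NameNodes (`place_replicas`), and the first replica of each fragment acts as its primary entry point (`primary_for`). It also evaluates the stated bound floor(k/2 - 1) next to the floor((k-1)/2) failures that a conventional majority quorum over k replicas tolerates; the two agree for even k, while the stated bound is one lower for odd k.

```python
import math
import zlib
from typing import Dict, List

# Hypothetical sketch: the partitioning and placement rules below are
# illustrative assumptions, not the scheme described in HDFS-4945.


def fragment_of(path: str, num_fragments: int) -> int:
    """Map a path to a fragment by hashing its top-level directory."""
    top = path.strip("/").split("/")[0]
    # zlib.crc32 is deterministic across runs, unlike Python's salted str hash.
    return zlib.crc32(top.encode("utf-8")) % num_fragments


def place_replicas(num_fragments: int, num_namenodes: int,
                   k: int) -> Dict[int, List[int]]:
    """Place k replicas of each fragment on k distinct NameNodes.

    Replica 0 of each fragment is treated as its primary; rotating the
    starting NameNode spreads the primaries across the cluster.
    """
    if not 1 <= k <= num_namenodes:
        raise ValueError("k must be between 1 and the number of NameNodes")
    return {f: [(f + i) % num_namenodes for i in range(k)]
            for f in range(num_fragments)}


def primary_for(path: str, placement: Dict[int, List[int]]) -> int:
    """Return the NameNode acting as the entry point for a path."""
    return placement[fragment_of(path, len(placement))][0]


def tolerance_as_stated(k: int) -> int:
    """Faulty NameNodes tolerated per the proposal: floor(k/2 - 1)."""
    return math.floor(k / 2 - 1)


def majority_tolerance(k: int) -> int:
    """Failures a majority quorum over k replicas tolerates: floor((k-1)/2)."""
    return (k - 1) // 2


if __name__ == "__main__":
    placement = place_replicas(num_fragments=4, num_namenodes=5, k=3)
    print(placement)  # {0: [0, 1, 2], 1: [1, 2, 3], 2: [2, 3, 4], 3: [3, 4, 0]}
    print(primary_for("/user/alice/data.txt", placement))
    for k in (3, 4, 5):
        print(k, tolerance_as_stated(k), majority_tolerance(k))
```

Rotating the starting NameNode per fragment is what spreads the primaries, and therefore the request load, across the cluster; with the placement shown, each of the five NameNodes is the primary for at most one of the four fragments.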

