Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

Konstantin Shvachko Sat, 04 Nov 2017 22:29:31 -0700

Hi Sanjay,

Read your doc. I clearly see the value of Ozone with your use cases, but I
agree with Stack and others that it isn't clear why it should be a part of
Hadoop. More details in the jira:


https://issues.apache.org/jira/browse/HDFS-7240?focusedCommentId=16239313&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16239313

Thanks,
--Konstantin

On Fri, Nov 3, 2017 at 1:56 PM, sanjay Radia <[email protected]> wrote:

> Konstantin,
>  Thanks for your comments, questions and feedback. I have attached a
> document to the HDFS-7240 jira
>  that explains a design for scaling HDFS and how Ozone paves the way
> towards the full solution.
>
>
> https://issues.apache.org/jira/secure/attachment/12895963/HDFS%20Scalability%20and%20Ozone.pdf
>
>
> sanjay
>
>
>
>
> On Oct 28, 2017, at 2:00 PM, Konstantin Shvachko <[email protected]>
> wrote:
>
> Hey guys,
>
> It is an interesting question whether Ozone should be a part of Hadoop.
> There are two main reasons why I think it should not.
>
> 1. With close to 500 sub-tasks, 6 MB of code changes, and a sizable
> community behind it, it looks to me like a whole new project.
> It is essentially a new storage system, with an architecture different from
> HDFS and separate S3-like APIs. This is really great - the World sure
> needs more distributed file systems. But it is not clear why Ozone should
> co-exist with HDFS under the same roof.
>
> 2. Ozone is probably just the first step in rebuilding HDFS under a new
> architecture, with the next steps presumably being HDFS-10419 and
> HDFS-11118.
> The design doc for the new architecture has never been published. I can
> only assume based on some presentations and personal communications that
> the idea is to use Ozone as a block storage, and re-implement NameNode, so
> that it stores only a partial namespace in memory, while the bulk of it
> (cold data) is persisted to a local storage.
> Such architecture makes me wonder if it solves Hadoop's main problems.
> There are two main limitations in HDFS:
>  a. The throughput of namespace operations, which is limited by the number
> of RPCs the NameNode can handle.
>  b. The number of objects (files + blocks) the system can maintain, which
> is limited by the memory size of the NameNode.
> The RPC performance (a) is more important for Hadoop scalability than the
> object count (b), with read RPCs being the main priority.
> The new architecture targets the object count problem, but at the expense
> of the RPC throughput, which seems to be the wrong resolution of the
> tradeoff.
> Also, based on the usage patterns on our large clusters, we read up to 90%
> of the data we write, so cold data is a small fraction and most of it must
> be cached.
>
> To summarize:
> - Ozone is a big enough system to deserve its own project.
> - The architecture that Ozone leads to does not seem to solve the intrinsic
> problems of current HDFS.
>
> I will post my opinion in the Ozone jira. Should be more convenient to
> discuss it there for further reference.
>
> Thanks,
> --Konstantin
>
>
>
> On Wed, Oct 18, 2017 at 6:54 PM, Yang Weiwei <[email protected]>
> wrote:
>
> Hello everyone,
>
>
> I would like to start this thread to discuss merging Ozone (HDFS-7240) to
> trunk. This feature implements an object store which can co-exist with
> HDFS. Ozone is disabled by default. We have tested Ozone with cluster sizes
> varying from 1 to 100 data nodes.
>
>
>
> The merge payload includes the following:
>
>  1.  All services, management scripts
>  2.  Object store APIs, exposed via both REST and RPC
>  3.  Master service UIs, command line interfaces
>  4.  Pluggable pipeline Integration
>  5.  Ozone File System (Hadoop compatible file system implementation,
> passes all FileSystem contract tests)
>  6.  Corona - a load generator for Ozone.
>  7.  Essential documentation added to Hadoop site.
>  8.  Version specific Ozone Documentation, accessible via service UI.
>  9.  Docker support for ozone, which enables faster development cycles.
>
>
> To build Ozone and run ozone using docker, please follow instructions in
> this wiki page:
> https://cwiki.apache.org/confluence/display/HADOOP/Dev+cluster+with+docker
>
>
> We have built a passionate and diverse community to drive this feature
> development. As a team, we have achieved significant progress in the past 3
> years since the first JIRA for HDFS-7240 was opened in Oct 2014. So far, we
> have resolved almost 400 JIRAs by 20+ contributors/committers from
> different countries and affiliations. We also want to thank the large
> number of community members who were supportive of our efforts and
> contributed ideas and participated in the design of ozone.
>
>
> Please share your thoughts, thanks!
>
>
> -- Weiwei Yang
>
>
>
