Konstantin,

Thanks for your comments, questions, and feedback. I have attached a document
to the HDFS-7240 jira that explains a design for scaling HDFS and how Ozone
paves the way towards the full solution.


https://issues.apache.org/jira/secure/attachment/12895963/HDFS%20Scalability%20and%20Ozone.pdf


sanjay




> On Oct 28, 2017, at 2:00 PM, Konstantin Shvachko <shv.had...@gmail.com> wrote:
> 
> Hey guys,
> 
> It is an interesting question whether Ozone should be a part of Hadoop.
> There are two main reasons why I think it should not.
> 
> 1. With close to 500 sub-tasks, 6 MB of code changes, and a sizable
> community behind it, it looks to me like a whole new project.
> It is essentially a new storage system, with a different architecture than
> HDFS and separate S3-like APIs. This is really great - the world sure
> needs more distributed file systems. But it is not clear why Ozone should
> co-exist with HDFS under the same roof.
> 
> 2. Ozone is probably just the first step in rebuilding HDFS under a new
> architecture, with the next steps presumably being HDFS-10419 and
> HDFS-11118.
> The design doc for the new architecture has never been published. I can
> only assume, based on some presentations and personal communications, that
> the idea is to use Ozone as a block storage layer and re-implement the
> NameNode so that it keeps only a partial namespace in memory, while the
> bulk of it (cold data) is persisted to local storage.
> Such an architecture makes me wonder whether it solves Hadoop's main
> problems. There are two main limitations in HDFS:
>  a. The throughput of namespace operations, which is limited by the number
> of RPCs the NameNode can handle.
>  b. The number of objects (files + blocks) the system can maintain, which
> is limited by the memory size of the NameNode.
> The RPC performance (a) is more important for Hadoop scalability than the
> object count (b), with read RPCs being the main priority.
> The new architecture targets the object count problem, but at the expense
> of RPC throughput, which seems to be the wrong resolution of the tradeoff.
> Also, based on the usage patterns on our large clusters, we read up to 90%
> of the data we write, so cold data is a small fraction and most of it must
> be cached.
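> 
> To put the object-count limit (b) in rough numbers, here is a minimal
> back-of-envelope sketch. The ~150 bytes of NameNode heap per namespace
> object and the 128 GB heap size are assumptions (the per-object figure is
> the commonly cited HDFS rule of thumb), not numbers from this thread:
> 
>   // Rough estimate of how many namespace objects fit in a NameNode heap.
>   public class NameNodeHeapEstimate {
>       public static void main(String[] args) {
>           long bytesPerObject = 150L;                  // assumed heap cost per file/block object
>           long heapBytes = 128L * 1024 * 1024 * 1024;  // assumed 128 GB NameNode heap
>           long maxObjects = heapBytes / bytesPerObject;
>           System.out.printf("~%,d objects%n", maxObjects);
>           // => roughly 900 million files + blocks before heap becomes the ceiling
>       }
>   }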
> 
> To summarize:
> - Ozone is a big enough system to deserve its own project.
> - The architecture that Ozone leads to does not seem to solve the intrinsic
> problems of current HDFS.
> 
> I will post my opinion in the Ozone jira, as it should be more convenient
> to discuss it there for further reference.
> 
> Thanks,
> --Konstantin
> 
> 
> 
> On Wed, Oct 18, 2017 at 6:54 PM, Yang Weiwei <cheersy...@hotmail.com> wrote:
> 
>> Hello everyone,
>> 
>> 
>> I would like to start this thread to discuss merging Ozone (HDFS-7240) to
>> trunk. This feature implements an object store which can co-exist with
>> HDFS. Ozone is disabled by default. We have tested Ozone with cluster sizes
>> varying from 1 to 100 data nodes.
>> 
>> 
>> 
>> The merge payload includes the following:
>> 
>>  1.  All services and management scripts
>>  2.  Object store APIs, exposed via both REST and RPC
>>  3.  Master service UIs and command line interfaces
>>  4.  Pluggable pipeline integration
>>  5.  Ozone File System (a Hadoop-compatible file system implementation
>> that passes all FileSystem contract tests; see the sketch after this list)
>>  6.  Corona, a load generator for Ozone
>>  7.  Essential documentation added to the Hadoop site
>>  8.  Version-specific Ozone documentation, accessible via the service UI
>>  9.  Docker support for Ozone, which enables faster development cycles
>> 
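>> As a usage sketch for item 5: since the Ozone File System is a
>> Hadoop-compatible file system, it can be driven through the standard
>> Hadoop FileSystem API. The o3fs URI scheme and the volume/bucket names
>> below are illustrative assumptions, not details from this thread:
>> 
>>   import java.net.URI;
>>   import org.apache.hadoop.conf.Configuration;
>>   import org.apache.hadoop.fs.FSDataOutputStream;
>>   import org.apache.hadoop.fs.FileSystem;
>>   import org.apache.hadoop.fs.Path;
>> 
>>   public class OzoneFsExample {
>>       public static void main(String[] args) throws Exception {
>>           Configuration conf = new Configuration();
>>           // The concrete FileSystem is resolved from the URI scheme, so
>>           // Ozone plugs in like any other Hadoop-compatible file system.
>>           FileSystem fs = FileSystem.get(URI.create("o3fs://bucket.volume/"), conf);
>>           try (FSDataOutputStream out = fs.create(new Path("/demo/hello.txt"))) {
>>               out.writeUTF("hello ozone");
>>           }
>>           System.out.println("exists: " + fs.exists(new Path("/demo/hello.txt")));
>>       }
>>   }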
>> 
>> To build Ozone and run it using Docker, please follow the instructions on
>> this wiki page:
>> https://cwiki.apache.org/confluence/display/HADOOP/Dev+cluster+with+docker
>> 
>> 
>> We have built a passionate and diverse community to drive this feature's
>> development. As a team, we have made significant progress in the 3 years
>> since the first JIRA for HDFS-7240 was opened in Oct 2014. So far, almost
>> 400 JIRAs have been resolved by 20+ contributors/committers from different
>> countries and affiliations. We also want to thank the many community
>> members who supported our efforts, contributed ideas, and participated in
>> the design of Ozone.
>> 
>> 
>> Please share your thoughts, thanks!
>> 
>> 
>> -- Weiwei Yang
>> 
