Hi

Personally, I feel the team has misunderstood the meaning of the incubator
and the requirements of building a community. As in the last discussion, I
still think they will be under great pressure if they are accepted into the
incubator, because they would have to handle basic feature development,
community building, and the ASF incubator requirements all at the same
time. Meanwhile, the team lacks open source community experience, whether
inside or outside the ASF.
I am not sure whether this is good for the project. It seems a little
hurried to join the incubator.
More comments inline.

I am willing to hear what other IPMC members think.

<zhangguoc...@chinatelecom.cn> wrote on Tue, Mar 10, 2020 at 10:21 AM:

> Hi, All,
>
> We are China Telecom Corporation Limited Cloud Computing Branch
> Corporation.
> We hope to contribute one of our projects named 'HBlock' to Apache.
> Here is the proposal of the HBlock project. Please feel free to share any
> concerns and suggestions with me. Thank you so much.
>
> HBlock Proposal
>
> 1.Abstract
> The HBlock project will be an enterprise-grade distributed block storage
> system.
>
> 2.Proposal
> HBlock provides a distributed block storage with the following features:
> 2.1.User-space iSCSI target: HBlock will implement an iSCSI target that is
> RFC 7143 (https://tools.ietf.org/html/rfc7143) compliant, written in pure
> Java, and designed to run as a user-space process on top of any mainstream
> operating system, including Windows and Linux.
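
For readers less familiar with iSCSI: every iSCSI PDU begins with a 48-byte
Basic Header Segment (BHS), so a user-space target is, at its core, a TCP
server that parses these headers. Below is a minimal sketch of the RFC 7143
BHS field layout in Java; this is only an illustration of the spec, not
HBlock's code:

```java
// Sketch of decoding an iSCSI Basic Header Segment (BHS) per RFC 7143.
// Field offsets follow the RFC; illustrative only, not HBlock's code.
public class BhsSketch {
    public static final int BHS_LEN = 48; // the BHS is always 48 bytes

    // The opcode lives in the low 6 bits of byte 0.
    public static int opcode(byte[] bhs) {
        return bhs[0] & 0x3F;
    }

    // The I (immediate delivery) bit is bit 0x40 of byte 0.
    public static boolean immediate(byte[] bhs) {
        return (bhs[0] & 0x40) != 0;
    }

    // DataSegmentLength is a 24-bit big-endian integer at bytes 5..7.
    public static int dataSegmentLength(byte[] bhs) {
        return ((bhs[5] & 0xFF) << 16) | ((bhs[6] & 0xFF) << 8) | (bhs[7] & 0xFF);
    }
}
```

For example, a first byte of 0x41 decodes as an immediate SCSI Command
request (opcode 0x01). A real target must of course also implement login,
session state, and the full command set on top of this.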
> 2.2.Enterprise level features: HBlock will implement comprehensive
> enterprise level features, such as Asymmetric Logical Unit Access (ALUA,
> SCSI Primary Commands - 4 (SPC-4),
> https://www.t10.org/cgi-bin/ac.pl?t=f&f=spc4r37.pdf), Persistent
> Reservations (PR, also SPC-4), VMware vSphere Storage APIs - Array
> Integration (VAAI,
> https://www.vmware.com/techpapers/2012/vmware-vsphere-storage-apis-array-integration-10337.html),
> and Offloaded Data Transfer (ODX,
> https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-R2-and-2012/hh831628(v=ws.11)),
> so that it will support session-level fail-over, Oracle Real Application
> Clusters (Oracle RAC,
> https://www.oracle.com/database/technologies/rac.html), Cluster File
> System (CFS), VMware clusters, and Windows clusters.
> 2.3.Low latency: HBlock will implement an in-memory distributed cache to
> reduce write latency and improve Input/Output Operations Per Second
> (IOPS), and it will leverage storage-class memory to achieve even higher
> durability without IOPS loss.
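
Presumably the low write latency comes from a write-back design:
acknowledge a write once it is buffered in memory, then flush to the
persistence layer in batches. A toy single-node sketch of that idea, with
a plain Map standing in for the HDFS-backed store (hypothetical, not
HBlock's code; a real distributed cache must also replicate buffered
writes to survive node failure):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy write-back cache: buffers block writes in memory and flushes them
// to a backing store in one batch. The backing store is a plain Map here,
// standing in for a persistence layer such as HDFS.
public class WriteCacheSketch {
    private final Map<Long, byte[]> dirty = new LinkedHashMap<>();
    private final Map<Long, byte[]> backingStore;

    public WriteCacheSketch(Map<Long, byte[]> backingStore) {
        this.backingStore = backingStore;
    }

    // A write is acknowledged as soon as it lands in memory (low latency).
    public void write(long lba, byte[] data) {
        dirty.put(lba, data.clone());
    }

    // Reads are served from the dirty buffer first, then the store.
    public byte[] read(long lba) {
        byte[] d = dirty.get(lba);
        return d != null ? d : backingStore.get(lba);
    }

    // Flush moves all buffered writes to the backing store in one batch
    // and returns how many blocks were flushed.
    public int flush() {
        int n = dirty.size();
        backingStore.putAll(dirty);
        dirty.clear();
        return n;
    }
}
```

The trade-off is the usual one: latency improves because acknowledgement
does not wait for the persistence layer, at the cost of needing replication
(or storage-class memory, as the proposal mentions) for durability.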
> 2.4.Smart Compaction and Garbage Collection (GC): HBlock will convert all
> write operations into sequential append operations to improve random
> write performance, and it will choose the best timing to compact and
> collect garbage per Logical Unit (LU). Compared with the internal Garbage
> Collection of Solid State Drives (SSDs), such a global GC reduces the
> need for the SSD's internal GC, which indirectly gives the SSD more
> usable space, and it can use a better GC strategy because it is close to
> the application. In essence, flash writes data in block (32MB) order. To
> support random writes, an SSD reserves part of its space for internal GC;
> the more random writes and deletes there are, the more space must be
> reserved. HDFS-based writes are sequential from the SSD's point of view,
> so the reserved space can be small. In short, wherever there is GC there
> must be reserved space, either in the HBlock layer or in the controller
> layer inside the SSD. Because HBlock is closer to the LU, its GC can be
> more efficient. For example, an LU dedicated to video surveillance data
> writes video data mostly in sequence and starts over when the disk is
> full; such an LU needs no GC at all. If GC is done in the SSD layer, the
> SSD sees data from many LUs mixed together and will perform unnecessary
> data movement for the video surveillance LU.
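
What this section describes is classic log-structured storage: each random
block write becomes a sequential append, an index maps every Logical Block
Address (LBA) to the latest log location, and compaction copies only the
live entries so that overwritten garbage is reclaimed. A toy in-memory
sketch of that mechanism (hypothetical names and layout, not HBlock's
code):

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy log-structured block store: random writes become sequential appends
// to a log; an index maps each LBA to the latest {offset, length} in the
// log; compaction rewrites the log keeping only live (latest) entries.
public class AppendLogSketch {
    private ByteArrayOutputStream log = new ByteArrayOutputStream();
    private final Map<Long, int[]> index = new HashMap<>(); // LBA -> {offset, len}

    // Every write, random or not, is a sequential append.
    public void write(long lba, byte[] data) {
        index.put(lba, new int[]{log.size(), data.length}); // overwrite = new entry
        log.write(data, 0, data.length);
    }

    // Reads follow the index to the latest copy of the block.
    public byte[] read(long lba) {
        int[] loc = index.get(lba);
        byte[] all = log.toByteArray();
        return Arrays.copyOfRange(all, loc[0], loc[0] + loc[1]);
    }

    public int logSize() {
        return log.size();
    }

    // Compaction: copy only the live entry for each LBA into a fresh log,
    // reclaiming the space held by overwritten (garbage) entries.
    public void compact() {
        ByteArrayOutputStream fresh = new ByteArrayOutputStream();
        byte[] all = log.toByteArray();
        for (Map.Entry<Long, int[]> e : index.entrySet()) {
            int[] loc = e.getValue();
            e.setValue(new int[]{fresh.size(), loc[1]});
            fresh.write(all, loc[0], loc[1]);
        }
        log = fresh;
    }
}
```

The space occupied by stale copies between compactions is exactly the
"reserved space" trade-off the section discusses; doing this above the SSD
lets the timing and per-LU policy be chosen by the application layer.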
> 2.5.Hadoop Distributed File System (HDFS)-based: HBlock leverages HDFS as
> its persistence layer to avoid reinventing the wheel. The iSCSI target
> will run on the HDFS client side and read or write data directly from or
> to DataNodes.
> 2.6.Easy to deploy: HBlock will provide easy-to-use utilities to make the
> installation process extremely easy. Since HBlock does not rely on any
> operating system kernel module, deployment is easy, unlike other storage
> systems that rely on in-kernel iSCSI modules such as Linux-IO (LIO) or
> SCST.
>

I noticed there are a lot of `will`s here in the Proposal section
describing the project's core features.
Is this a language issue, or are all these features unavailable today?
Which parts have been implemented?


>
> 3.Background
> We think block storage is a very general technology.
> Block storage is the foundation of enterprise IT infrastructure. But
> unfortunately, there is no open source, mature distributed block storage
> at this moment.
> Ceph is well known and widely adopted, but it is just a storage engine at
> the same level as HDFS. Ceph does not cover the need for iSCSI: if you
> want to use Ceph as block storage, you must use a solution like LIO to
> handle iSCSI. Unfortunately, LIO lacks many features and thus cannot be
> directly used in an enterprise production environment. Additionally, LIO
> is a Linux kernel module while Ceph is a user-space process, which makes
> it hard for LIO to talk to Ceph processes. TCM in User Space (TCMU) is
> being worked on
> (https://www.kernel.org/doc/Documentation/target/tcmu-design.txt), but it
> is awkward to have an in-kernel module call a user-space process.
> That is why we want to create HBlock, which will implement comprehensive
> enterprise-level features completely in user space, including High
> Availability (HA), distributed cache, VAAI, PR, ODX and so on.
> The HBlock project is based on HDFS and will be an excellent addition to
> the Apache family of projects.
>
> 4.Rationale
> Block storage is the foundation of enterprise IT infrastructure. But
> unfortunately, there is no open source, mature distributed block storage
> at this moment.
> Ceph is well known and widely adopted, but it is just a storage engine at
> the same level as HDFS. Ceph does not cover the need for iSCSI: if you
> want to use Ceph as block storage, you must use a solution like LIO to
> handle iSCSI. Unfortunately, LIO lacks many features and thus cannot be
> directly used in an enterprise production environment. Additionally, LIO
> is a Linux kernel module while Ceph is a user-space process, which makes
> it hard for LIO to talk to Ceph processes. TCM in User Space (TCMU) is
> being worked on
> (https://www.kernel.org/doc/Documentation/target/tcmu-design.txt), but it
> is awkward to have an in-kernel module call a user-space process.
> That is why we want to create HBlock, which will implement comprehensive
> enterprise-level features completely in user space, including High
> Availability (HA), distributed cache, VAAI, PR, ODX and so on.
> The HBlock project is based on HDFS and will be an excellent addition to
> the Apache family of projects.
>
> 5.Initial Goals
> N/A.
>

Why is this N/A?


>
> 6.Current Status
> At present, we have completed the development of a stand-alone version of
> HBlock, which has been used in the online environments of many customers.
> This standalone version implements advanced SCSI functions including PR,
> VAAI, ODX, etc. Among these, cross-Network Address Translation (NAT)
> support is a key feature of HBlock: it allows clients inside a LAN to
> access iSCSI targets located on the Internet, making it possible to offer
> iSCSI as a Service. A version with high-availability features is also
> under testing.
> 6.1 Meritocracy
> At present, this project is still an internal private project operated
> according to the enterprise's internal development processes, so this
> issue does not yet arise. But we are willing to follow the rules of the
> open source community. We will track patch submissions, accept worthwhile
> HBlock patches, and increase HBlock's publicity. We intend to invite
> people who show merit to join the project.
> 6.2 Community
> At present, the HBlock project is still an internal private project
> operated according to the enterprise's internal development processes, so
> this issue does not yet arise. But we are willing to follow the rules of
> the open source community.
> Several business customers are using HBlock, and we will invite them and
> their industry partners to join the community. We will communicate with
> China Telecom Cloud Service customers through forums, e-mail, instant
> messaging, and other channels, and keep the product information up to
> date, so as to attract more developers to the project.
> 6.3 Core Developers
> At present, the HBlock project has about 30 people: approximately 20
> internal developers and 10 test engineers, all very experienced engineers.
>

Are the test engineers internal too? I suppose so.


> Here is a brief introduction of the key contributors.
> Dong Changkun is the development team leader, with rich Java development
> experience; as the architect of HBlock, he oversees the overall design.
> Wu Zhimin is the R&D expert of the cloud storage product line in our
> company, with more than 12 years of storage development experience. In
> HBlock, he is mainly responsible for the architecture design of the
> protocol module, the implementation of the SCSI module, and research on
> difficult problems.
> Yu Erdong has rich Java development experience and distributed storage
> system development experience. He is mainly responsible for the design of
> HBlock's back-end modules and management tool modules, as well as the
> development of the back-end cache and master-slave switching.
> 6.4 Alignment
> HBlock is the only product in the industry that builds block storage on
> top of HDFS.
> As disk capacities grow, for example with the emergence of Shingled
> Magnetic Recording (SMR) disks, more and more disks penalize
> non-sequential writes. Flash memory has the same characteristic: the
> underlying flash cells are written sequentially in blocks (32MB), and an
> SSD reserves about 20% of its space for merging so that the file system
> appears to support random writes. Because HBlock is based on HDFS, it
> inherently writes sequentially. Since the random-write I/O that then
> reaches the SSDs is very small, HBlock allows the reserved space to be
> reduced from 20% to only 5%.
> In addition, given the wide adoption of HDFS, HBlock allows existing HDFS
> facilities to become highly available, cloud-ready block storage, which
> is super cool!
>
> 7.Known Risks
> The software is not yet stable and has bugs, which require continuous
> improvement.
> More sophisticated strategies are needed to schedule and optimize data
> merging so that it does not run during business peak hours.
>
> 8.Project Name
> HBlock is named after Hadoop, the distributed computing project in the
> Apache community; the database project based on Hadoop is called HBase.
> Following this naming style for a distributed block storage project, we
> named ours HBlock.
>
> 9.Orphaned products
> Storage is our core business and HBlock is our technical direction. We
> will continue to invest in it and see value in building a vibrant open
> source community to improve it. We believe that HBlock, as a product
> based on HDFS, will have more vitality as an open source project under
> the Apache Software Foundation.
> 9.1 Inexperience with Open Source
> We don't have much experience in open source, but we hope to open source
> HBlock so that more people can use and develop it. We are willing to
> learn from Apache's open source experience and apply it to the HBlock
> project.
> Jiang Feng, the founder and team leader of the HBlock project, submitted
> code to Hadoop more than 10 years ago.
>

Is he already a Hadoop committer or PMC member? Does he have experience
with the ASF process?


> 9.2 Length of Incubation
> It is expected that the HBlock project will take one year to complete the
> incubation process.
>

One year is a short time for most incubator projects. IPMC, please correct
me if I am wrong.
How did you arrive at this expectation?


> While learning the Apache Way, we have an aggressive release calendar:
>

Why are the following features related to the Apache Way?
They look like a feature roadmap only to me: these are development plans,
not community building.
This confuses me; could you explain?


> In April 2020, we will complete the high-availability version of HBlock.
> In June 2020, we will complete the development of the web portal and a
> "green" (portable) installation that can be installed alongside existing
> applications and supports x86 and ARM servers.
> In September 2020, we will complete advanced SCSI functions, including
> PR, VAAI, ODX, etc.
> 9.3 Homogenous Developers
> At present, HBlock has approximately 20 developers, all of whom are very
> experienced engineers. They work in Beijing, Shanghai, Inner Mongolia,
> and other regions, and they are experienced in working in a distributed
> environment for the same company.
> We will expand the existing team through campus and social recruitment
> and attract more developers from the community to join the HBlock
> project. HDFS is a widely used project, and we are confident that a block
> storage project based on HDFS will attract more volunteers.
> 9.4 Reliance on Salaried Developers
> HBlock relies on China Telecom's salaried developers. China Telecom will
> not easily change its market strategy. This is the first time China
> Telecom has shared a project with the open source community, so it will
> pay close attention to its investment in this project. At the same time,
> the project will be widely used within China Telecom. With the support of
> China Telecom's resources and validation in real projects, the continuity
> and quality of the project will be guaranteed. We have also been working
> in the storage field for seven and a half years and will continue to do
> so. Block storage based on HDFS will definitely attract more volunteers;
> we will support volunteer involvement, and our developers are committed
> to doing so.
> 9.5 Relationships with Other Apache Products
> HBlock uses Apache HDFS, Apache commons-IO, commons-collections,
> commons-configuration, commons-email, commons-logging, Apache log4j, and
> Apache Hadoop-common.
> 9.6 An Excessive Fascination with the Apache Brand
> We have chosen the Apache Software Foundation as the home to open source
> HBlock because HBlock is based on HDFS.  We believe there is a very natural
> synergy with Apache.
>
> 10.Documentation
> For the user guide, please refer to "China Telecom HBlock User
> Guide_20200121.docx". (Only a .docx version exists right now.)
>
> 11.Initial Source
> HBlock has been developed since the second half of 2018. HBlock is based on
> HDFS and the internal source code will be donated to the Foundation.  China
> Telecom is prepared to execute the paperwork required for the donation.
>
> 12.Source and Intellectual Property Submission Plan
> The HBlock specification and the content on www.ctyun.cn come from China
> Telecom Co., Ltd. The HBlock library is written in Java. There is no
> complexity in the code base donation process, and we are ready to move
> the repositories over.
> 12.1 External Dependencies
> HBlock uses Apache commons-io, commons-collections, commons-configuration,
> Apache log4j, commons-email, commons-logging, org.json, jline, pty4j,
> Apache hadoop-hdfs, hadoop-common, netty-all, and Apache ZooKeeper. These
> are all under Apache or BSD licenses.

> 12.2 Cryptography
> The HBlock project does not involve encryption code.
>
> 13.Required Resources
> 13.1 Mailing lists:
> priv...@hblock.incubator.apache.org
> d...@hblock.incubator.apache.org
> us...@hblock.incubator.apache.org


The user mailing list is not recommended, as you don't have community
users today. I recommend merging it into the dev list.

Sheng Wu 吴晟
Twitter, wusheng1108


>
> comm...@hblock.incubator.apache.org
> 13.2 Subversion Directory
> https://svn.apache.org/repos/asf/incubator/hblock
> (According to Apache rules)
> 13.3 Git Repositories
> https://gitbox.apache.org/repos/asf/incubator-hblock.git
> (According to Apache rules)
> 13.4 Issue Tracking
> JIRA HBlock(HBLOCK)
> (According to Apache rules)
> 13.5 Other Resources
> N/A.
>
> 14.Initial Committers
> Yu Erdong (yued at chinatelecom dot cn)
> Wu Zhimin (wuzhimin at chinatelecom dot cn)
> Yang Chao (yangchao1 at chinatelecom dot cn)
> Dong Changkun (dongck at chinatelecom dot cn)
> Guo Yong (guoyong1 at chinatelecom dot cn)
> Zhao Wentao(zhaowt at chinatelecom dot cn)
> Cui Meng (cuimeng at chinatelecom dot cn)
> Wei Wei (weiwei2 at chinatelecom dot cn)
>
> 15.Sponsors
> 15.1 Champion
> Kevin A. McGrail
> 15.2 Nominated Mentors
> Kevin A. McGrail
> 15.3 Sponsoring Entity
> The Incubator
> (END)
>
> Best Wishes.
>
> ----------------------------------------------------------------------------
> Zhang Guochen  Project Manager
> China Telecom Corporation Limited Cloud Computing Branch Corporation
> Mail: zhangguoc...@chinatelecom.cn
> Phone: 86-17301021225
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>
