Excited to see DistributedLog come to the ASF! I see that you already have a good list of nominated mentors. As a member of a recently graduated project, I can offer informal mentorship as well if needed. I am not an IPMC member, so I guess I cannot be a formal mentor.
Regards,

On Wed, Jun 8, 2016 at 9:34 PM, Sijie Guo <si...@apache.org> wrote:
> Hi,
>
> I would like to propose DistributedLog to be an Apache Incubator project.
>
> DistributedLog is a high-performance replicated log service.
> It offers durability, replication and strong consistency, which provide
> a fundamental building block for building reliable distributed systems,
> e.g. replicated state machines, general pub/sub systems, distributed
> databases, distributed queues, etc.
>
> Here's a link to the proposal in the Incubator wiki
>
> https://wiki.apache.org/incubator/DistributedLogProposal
>
> I've also pasted the initial contents below.
>
> Thanks,
>
> Sijie
>
> = Abstract =
> DistributedLog is a high-performance replicated log service. It offers
> durability, replication and strong consistency, which provide a
> fundamental building block for building reliable distributed systems,
> e.g. replicated state machines, general pub/sub systems, distributed
> databases, distributed queues, etc.
>
> See “Building DistributedLog - Twitter’s high performance replicated
> log service” for details:
>
> https://blog.twitter.com/2015/building-distributedlog-twitter-s-high-performance-replicated-log-service
>
> = Proposal =
> We propose to contribute the DistributedLog codebase and associated
> artifacts (e.g. documentation, website content, etc.) to the Apache
> Software Foundation with the intent of forming a productive,
> meritocratic and open community around DistributedLog’s continued
> development, according to the ‘Apache Way’.
>
> = Background =
> Engineers at Twitter began developing DistributedLog in early 2013.
> DistributedLog is described in a Twitter engineering blog post and
> was presented at the Messaging Meetup in Sep 2015. It was released as
> an Apache-licensed open-source project on GitHub in May 2016.
>
> DistributedLog is a high-performance replicated log service, which
> provides simple stream-oriented abstractions over log segments and
> offers durability, replication and strong consistency for building
> reliable distributed systems. The features offered by DistributedLog
> include:
> * Simple high-level, stream-oriented interface
> * Naming and metadata scheme for managing streams and other entities
> * Log data management policies, including data segmentation and data
> retention
> * Fast write pipeline leveraging batching and compression
> * Fast read mechanism leveraging long-poll and read-ahead caching
> * Service tiers supporting writer fan-in and reader fan-out
> * Geo-replicated logs
>
> DistributedLog’s most important benefit is high performance with a
> strong durability guarantee, making it well suited to running
> different workloads, from distributed database journaling to
> real-time stream computing. Its modern, layered architecture makes it
> easy to run the service tiers in multi-tenant datacenter environments
> such as Apache Mesos or cloud environments such as EC2.
>
> = Rationale =
> DistributedLog is designed to provide fundamental features like
> high performance, durability and strong consistency to anyone who is
> building reliable distributed systems, in a simple and efficient way.
>
> We believe that the ASF is the right venue to foster an open-source
> community around DistributedLog’s development. We expect that
> DistributedLog will benefit from collaboration with related Apache
> projects, and under the auspices of the ASF will attract talented
> contributors who will push DistributedLog’s development forward at a
> faster pace.
>
> We believe that the timing is right for DistributedLog’s development
> to move to the ASF: DistributedLog has already run in production at
> Twitter for 3 years and served various workloads including a
> distributed database journal, reliable cross-datacenter replication,
> search ingestion, and general pub/sub messaging. The project is stable.
> We are excited to see where an ASF-based community can take
> DistributedLog.
>
> = Current Status =
> DistributedLog is a stable project that has been used in production at
> Twitter for 3 years. The source code is public at github.com/twitter,
> which will seed the Apache git repository.
>
> = Meritocracy =
> We understand the central importance of meritocracy to the Apache Way.
> We will work to establish a welcoming, fair and meritocratic
> community. Several companies have already expressed interest in this
> project, and we intend to invite additional developers to participate.
> We look forward to growing a rich user and developer community.
>
> = Community =
> There is a strong need for a performant replicated log service for
> applications such as distributed databases, distributed transactional
> systems, replicated state machines and pub/sub messaging/queuing. We
> want to attract more developers to the project, and we believe that
> the ASF’s open and meritocratic philosophy will help us with this. We
> note the success of other similar projects already part of the ASF,
> like Kafka.
>
> = Core Developers =
> DistributedLog is actively developed within Twitter. Most of the
> developers are from Twitter. Many of them are committers or PMC
> members of Apache BookKeeper. Others aren’t currently affiliated with
> the ASF, so they will require new ICLAs.
>
> = Alignment =
> DistributedLog is related to several other Apache projects:
> * DistributedLog stores log segments as Ledgers in Apache BookKeeper.
> * DistributedLog uses Apache ZooKeeper for naming and metadata
> management and tracking the ownership of logs.
> * DistributedLog uses Apache Thrift as its RPC and serialization
> framework.
> * In the long term, DistributedLog’s data will be stored in Apache
> Hadoop clusters powered by the HDFS filesystem for archives and backup.
>
> = Known Risks =
>
> == Orphaned Products ==
> DistributedLog is used as the fundamental messaging infrastructure at
> Twitter. It has been serving production traffic for online database
> systems, search ingestion and a general pub/sub system. Twitter
> remains committed to developing and supporting the project. Twitter
> has a strong track record in standing behind projects that were
> contributed to the ASF by its employees, including Apache Mesos,
> Apache Aurora, Apache BookKeeper and Apache Hadoop. Many companies
> are interested in using it in production.
>
> == Inexperience with Open Source ==
> The core developers of DistributedLog are committers of Apache
> BookKeeper. Although the other committers on the initial list are
> not Apache committers or have less experience with the ASF, they are
> already active in the Apache BookKeeper community. We are confident
> that the project can be run in accordance with Apache principles on
> an ongoing basis.
>
> == Homogeneous Developers ==
> The initial committers are from Twitter. We hope to encourage
> contributions from other developers and grow them into committers
> after they have had time to continue their contributions.
>
> == Reliance on Salaried Developers ==
> Many of DistributedLog’s initial set of committers work full-time on
> DistributedLog, and are paid to do so. However, as mentioned
> elsewhere, we anticipate growth in the developer community which we
> hope will include people from industry, hobbyists, and academics who
> have an interest in distributed messaging systems.
>
> == Relationships with Other Apache Products ==
> DistributedLog uses Apache BookKeeper to store log segments and Apache
> ZooKeeper to store log metadata and manage log namespaces.
> It provides an end-to-end solution for replicated logs, to make
> building reliable distributed systems much easier. Unlike Kafka or
> ActiveMQ, DistributedLog is not a full-fledged pub/sub, queuing or
> messaging system. Instead, it aims to provide a fundamental building
> block for other distributed systems, offering durability, replication
> and consistency. It could therefore be used by other distributed
> systems, for example as a transaction log for replicated state
> machines (e.g., HDFS NameNode), a WAL for distributed databases
> (e.g., HBase), a journal for in-memory services (e.g., Kestrel) and
> even a storage backend for a full-fledged messaging system.
>
> == An Excessive Fascination with the Apache Brand ==
> DistributedLog builds on two existing top-level projects, Apache
> BookKeeper and Apache ZooKeeper. Some of the core developers actively
> participate in both projects and understand well the implications of
> being hosted by Apache. We would like this project to build on the
> same core values of the ASF and to grow a community based on
> meritocracy. Also, there are several other projects already hosted by
> the ASF in this space of reliable messaging that overlap with
> DistributedLog in interests and scope. Together, these observations
> make us believe that DistributedLog should be hosted by the ASF.
>
> = Documentation =
> Building DistributedLog: Twitter’s high performance replicated log
> service (
> https://blog.twitter.com/2015/building-distributedlog-twitter-s-high-performance-replicated-log-service
> )
>
> Documentation is located at http://distributedlog.io.
>
> = Initial Source =
> DistributedLog’s initial source contribution will come from
> http://github.com/twitter/distributedlog/.
>
> = External Dependencies =
> DistributedLog depends upon a number of third-party libraries, which
> we list below.
> * Apache BookKeeper (Apache Software License v2.0)
> * Apache Commons (Apache Software License v2.0)
> * Apache Maven (Apache Software License v2.0)
> * Apache Thrift (Apache Software License v2.0)
> * Apache ZooKeeper (Apache Software License v2.0)
> * Google Guava (Apache Software License v2.0)
> * Mockito (MIT License)
> * JUnit (Eclipse Public License 1.0)
> * LZ4-java (Apache Software License v2.0)
> * SLF4J (MIT License)
> * Twitter Finagle (Apache Software License v2.0)
> * Twitter Scrooge (Apache Software License v2.0)
> * Twitter Util (Apache Software License v2.0)
>
> = Required Resources =
> We request that the following resources be created for the project to use:
>
> == Mailing lists ==
> * priv...@distributedlog.incubator.apache.org (moderated subscriptions)
> * comm...@distributedlog.incubator.apache.org
> * d...@distributedlog.incubator.apache.org
> * u...@distributedlog.incubator.apache.org
>
> == Git repository ==
> https://git.apache.org/distributedlog.git
>
> == JIRA instance ==
> JIRA project DLOG (DLOG or DL)
>
> = Initial Committers =
> * Sijie Guo (Apache BookKeeper Committer, Twitter)
> * Robin Dhamankar (Apache BookKeeper Committer)
> * Leigh Stewart (Twitter)
> * Dave Rusek (Twitter)
> * Honggang Zhang (Twitter)
> * Jordan Bull (Twitter)
> * Satish Kotha (Twitter)
> * Aniruddha Laud
> * Franck Cuny (Twitter)
> * Eitan Adler (Twitter)
>
> == Affiliations ==
>
> Most of the initial committers are employees of Twitter, except Robin
> Dhamankar and Aniruddha Laud.
>
> = Sponsors =
>
> == Champion ==
>
> Flavio Junqueira
>
> == Nominated Mentors ==
>
> * Flavio Junqueira
> * Chris Nauroth
> * Henry Saputra
>
> = Sponsoring Entity =
>
> We ask the Apache Incubator PMC to sponsor this proposal.