Re: Looking for Champion

Tan,Zhongyi Fri, 08 Jun 2018 17:16:25 -0700

Great, Dave, we will add you as Champion.

Thanks


> On Jun 8, 2018, at 20:59, Dave Fisher <dave2w...@comcast.net> wrote:
> 
> Hi -
> 
> I’m willing to Champion and Mentor. I have a couple of comments inline. I’ll 
> look at dependency licenses later today. It’s early for me.
> 
> 
>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <l...@baidu.com> wrote:
>> 
>> Hi all,
>> 
>> I am Reed, a developer working with the team on Palo (an MPP-based 
>> interactive SQL data warehouse).
>> https://github.com/baidu/palo/wiki/Palo-Overview
>> 
>> We propose to contribute Palo as an Apache Incubator project, and
>> we are still looking for a Champion, if anyone would like to 
>> volunteer. Thanks a lot.
>> 
>> Best Regards,
>> Reed
>> 
>> ===================
>> The draft of the proposal is below:
>> 
>> #Apache Palo
>> 
>> ##Abstract
>> 
>> Palo is an MPP-based interactive SQL data warehouse for reporting and 
>> analysis.
>> 
>> ##Proposal
>> 
>> We propose to contribute the Palo codebase and associated artifacts (e.g. 
>> documentation, web-site content etc.) to the Apache Software Foundation with 
>> the intent of forming a productive, meritocratic and open community around 
>> Palo’s continued development, according to the ‘Apache Way’.
>> 
>> Baidu owns several trademarks regarding Palo, and proposes to transfer 
>> ownership of those trademarks in full to the ASF.
>> 
>> ###Overview of Palo
>> 
>> Palo’s implementation consists of two daemons: Frontend (FE) and Backend 
>> (BE).
>> 
>> **Frontend daemon** consists of a query coordinator and a catalog manager. 
>> The query coordinator is responsible for receiving users’ SQL queries, 
>> compiling them, and managing their execution. The catalog manager is 
>> responsible for managing metadata such as databases, tables, partitions, 
>> and replicas. Several frontend daemons can be deployed to provide fault 
>> tolerance and load balancing.
>> 
>> **Backend daemon** stores the data and executes the query fragments. Many 
>> backend daemons can be deployed to provide scalability and fault 
>> tolerance.
>> 
>> A typical Palo cluster generally consists of several frontend daemons and 
>> dozens to hundreds of backend daemons.
>> 
>> Users can use MySQL client tools to connect to any frontend daemon and 
>> submit SQL queries. The Frontend receives a query and compiles it into 
>> query plans executable by the Backend, then sends the query plan fragments 
>> to the Backends. Each Backend builds a query execution DAG; data is 
>> fetched and pipelined into the DAG. The final result is sent to the client 
>> via the Frontend. Query fragments are distributed with the main goals of 
>> minimizing data movement and maximizing scan locality.
>> 
>> ##Background
>> 
>> At Baidu, prior to Palo, different tools were deployed to meet diverse 
>> requirements. When a use case required capabilities that no single tool 
>> could provide together, users were forced to build hybrid architectures 
>> that stitch multiple tools together. We believe they shouldn’t need to 
>> accept such inherent complexity. A storage system built to provide great 
>> performance across a broad range of workloads is a more elegant solution 
>> to the problems that hybrid architectures aim to solve. Palo is that 
>> solution.
>> 
>> Palo is designed to be a simple, single, tightly coupled system that does 
>> not depend on other systems. Palo provides high-concurrency, low-latency 
>> point queries as well as high-throughput ad-hoc analytical queries. It 
>> supports bulk batch data loading as well as near real-time mini-batch 
>> loading. Palo also provides high availability, reliability, fault 
>> tolerance, and scalability.
>> 
>> ##Rationale
>> 
>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>> 
>> Mesa is a highly scalable analytic data storage system that stores critical 
>> measurement data related to Google's Internet advertising business. Mesa is 
>> designed to satisfy a complex and challenging set of users’ and systems’ 
>> requirements, including near real-time data ingestion and query ability, as 
>> well as high availability, reliability, fault tolerance, and scalability for 
>> large data and query volumes.
>> 
>> Impala is a modern, open-source MPP SQL engine architected from the ground 
>> up for the Hadoop data processing environment. By virtue of its superior 
>> performance and rich functionality, Impala is comparable to many 
>> commercial MPP database query engines. Mesa can satisfy many of our 
>> storage requirements, but Mesa itself does not provide a SQL query engine; 
>> Impala is a very good MPP SQL query engine, but it lacks an ideal 
>> distributed storage engine. So in the end we chose to combine these two 
>> technologies.
>> 
>> Learning from Mesa’s data model, we developed a distributed storage engine. 
>> Unlike Mesa, this storage engine does not rely on any distributed file 
>> system. We then deeply integrated this storage engine with the Impala 
>> query engine. Query compilation, query execution coordination, and catalog 
>> management for the storage engine are integrated into the frontend daemon; 
>> query execution and data storage are integrated into the backend daemon. 
>> With this integration, we implemented a single, full-featured, 
>> high-performance, state-of-the-art MPP database while maintaining 
>> simplicity.
>> 
>> ##Current Status
>> 
>> Palo has been an open source project on GitHub 
>> (https://github.com/baidu/palo).
>> 
>> ###Meritocracy
>> 
>> Palo has been deployed in production at Baidu and serves more than 200 
>> lines of business. It has demonstrated great performance benefits and has 
>> proved to be a better way to do reporting and analysis on big data. Still, 
>> we look forward to growing a rich user and developer community.
>> 
>> ###Community
>> 
>> Palo seeks to develop developer and user communities during incubation.
>> 
>> ###Core Developers
>> 
>> * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
>> * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
>> * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
>> * De Li (https://github.com/lide-reed, mailtol...@sina.com)
>> * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
>> * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
>> * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
>> 
>> ###Alignment
>> 
>> Palo is related to several other Apache projects:
>> 
>> * Palo can also read data stored in Apache Hadoop clusters powered by the 
>> HDFS filesystem.
>> * Palo is closely integrated with Impala, which is also being proposed to 
>> the Incubator.
> 
> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
> 
>> * Palo uses Apache Thrift as its RPC and serialization framework of choice.
>> 
>> ##Known Risks
>> 
>> ###Orphaned Products
>> 
>> The core developers of the Palo team plan to work full time on this 
>> project. There is very little risk of Palo getting orphaned since at least 
>> one large company (Baidu) is using it extensively in production. For 
>> example, there are currently more than 200 use cases running on Palo in 
>> production. Furthermore, since Palo was open sourced at the beginning of 
>> October 2017, it has received more than 660 stars and been forked nearly 
>> 170 times. We plan to extend and diversify this community further through 
>> Apache.
>> 
>> ###Inexperience with Open Source
>> 
>> The core developers are all active users and followers of open source. They 
>> are already committers and contributors to the Palo GitHub project. All 
>> have been involved with source code that has been released under an open 
>> source license, and several of them also have experience developing code 
>> in an open source environment. Though the core developers do not have 
>> Apache open source experience, there are plans to onboard individuals with 
>> Apache open source experience onto the project.
>> 
>> ###Homogenous Developers
>> 
>> Most of the core developers are from Baidu, but since Palo was open 
>> sourced, it has received many bug fixes and enhancements from developers 
>> not working at Baidu.
>> 
>> ###Reliance on Salaried Developers
>> 
>> Baidu has invested in Palo as its OLAP solution, and some of its key 
>> engineers are working full time on the project. In addition, since there 
>> is a growing Big Data need for scalable OLAP solutions, we look forward to 
>> other Apache developers and researchers contributing to the project. The 
>> key to addressing the risk of relying on salaried developers from a single 
>> entity is to increase the diversity of contributors and actively lobby 
>> for domain experts in the BI space to contribute. Apache Palo intends to 
>> do this.
>> 
>> ###An Excessive Fascination with the Apache Brand
>> 
>> Palo is proposing to enter incubation at Apache in order to help efforts to 
>> diversify the committer-base, not so much to capitalize on the Apache brand. 
>> The Palo project is in production use already inside Baidu, but is not 
>> expected to be a Baidu product for external customers. As such, the Palo 
>> project is not seeking to use the Apache brand as a marketing tool.
>> 
>> ##Documentation
>> 
>> Information about Palo can be found at https://github.com/baidu/palo. The 
>> following links provide more information about Palo in open source:
>> 
>> * Palo wiki site: https://github.com/baidu/palo/wiki
>> * Codebase at GitHub: https://github.com/baidu/palo
>> * Issue Tracking: https://github.com/baidu/palo/issues
>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>> 
>> ##Initial Source
>> 
>> Palo has been under development since 2017 by a team of engineers at Baidu 
>> Inc. It is currently hosted on GitHub under an Apache license at 
>> https://github.com/baidu/palo.
>> 
>> ##External Dependencies
>> 
>> Palo has the following external dependencies.
>> 
>> * Google gflags (BSD)
>> * Google glog (BSD)
>> * Apache Thrift (Apache Software License v2.0)
>> * Apache Commons (Apache Software License v2.0)
>> * Boost (Boost Software License)
>> * OpenLDAP (OpenLDAP Software License)
>> * rapidjson (MIT; developed by Tencent)
>> * Google RE2 (BSD-style)
>> * lz4 (BSD)
>> * snappy (BSD)
>> * cyrus-sasl (CMU License)
>> * Twitter Bootstrap (Apache Software License v2.0)
>> * d3 (BSD)
>> * LLVM (BSD-like)
>> 
>> Build and test dependencies:
>> 
>> * ant (Apache Software License v2.0)
>> * Apache Maven (Apache Software License v2.0)
>> * cmake (BSD)
>> * clang (BSD)
>> * Google gtest (BSD)
>> 
>> ##Required Resources
>> 
>> ###Mailing List
>> 
>> There are currently no mailing lists. The usual mailing lists are expected 
>> to be set up when entering incubation:
>> 
>> priv...@palo.incubator.apache.org
>> d...@palo.incubator.apache.org
>> comm...@palo.incubator.apache.org
>> 
>> ###Subversion Directory
>> 
>> Upon entering incubation: https://github.com/baidu/palo.
>> After incubation, we want to move the existing repo from 
>> https://github.com/baidu/palo to Apache infrastructure.
>> 
>> ###Issue Tracking
>> 
>> Palo currently uses GitHub to track issues. We would like to continue to 
>> do so while we discuss migration possibilities with the ASF Infra committee.
>> 
>> ###Other Resources
>> 
>> The existing code already has unit tests so we will make use of existing 
>> Apache continuous testing infrastructure. The resulting load should not be 
>> very large.
>> 
>> ##Initial Committers
>> 
>> * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
>> * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
>> * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
>> * De Li (https://github.com/lide-reed, mailtol...@sina.com)
>> * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
>> * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
>> * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
>> 
>> ##Affiliations
>> 
>> The initial committers are employees of Baidu Inc. The nominated mentors 
>> are employees of TODO.
>> 
>> ##Sponsors
>> 
>> ###Champion
>> 
>> TODO
>> 
>> ###Nominated Mentors
>> 
>> * Sijie Guo, guosi...@gmail.com
>> * Luke Han, luke...@apache.org
>> * Zheng Shao, zs...@apache.org
> 
> Mentors must be members of the IPMC and almost always Members of the ASF.
> 
> At this moment only Luke Han is qualified.
> 
> Regards,
> Dave
> 
>> 
>> ###Sponsoring Entity
>> 
>> We are requesting the Incubator to sponsor this project.
> 

