Re: Looking for Champion

Dave Fisher Fri, 08 Jun 2018 05:59:23 -0700

Hi -

I’m willing to Champion and Mentor. I have a couple of comments inline. I’ll 
look at dependency licenses later today. It’s early for me.



> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <l...@baidu.com> wrote:
> 
> Hi all,
> 
> I am Reed, a developer on the Palo team (Palo is an MPP-based interactive 
> SQL data warehouse).
> https://github.com/baidu/palo/wiki/Palo-Overview
> 
> We propose to contribute Palo as an Apache Incubator project, and
> we are still looking for a Champion, if anyone would like to volunteer. 
> Thanks a lot.
> 
> Best Regards,
> Reed
> 
> ===================
> The draft of the proposal as below:
> 
> #Apache Palo
> 
> ##Abstract
> 
> Palo is an MPP-based interactive SQL data warehouse for reporting and 
> analysis.
> 
> ##Proposal
> 
> We propose to contribute the Palo codebase and associated artifacts (e.g. 
> documentation, website content, etc.) to the Apache Software Foundation with 
> the intent of forming a productive, meritocratic and open community around 
> Palo’s continued development, according to the ‘Apache Way’.
> 
> Baidu owns several trademarks regarding Palo, and proposes to transfer 
> ownership of those trademarks in full to the ASF.
> 
> ###Overview of Palo
> 
> Palo’s implementation consists of two daemons: Frontend (FE) and Backend (BE).
> 
> **Frontend daemon** consists of a query coordinator and a catalog manager. 
> The query coordinator is responsible for receiving users’ SQL queries, 
> compiling them, and managing their execution. The catalog manager is 
> responsible for managing metadata such as databases, tables, partitions, and 
> replicas. Several frontend daemons can be deployed to provide fault tolerance 
> and load balancing.
> 
> **Backend daemon** stores the data and executes the query fragments. Many 
> backend daemons can also be deployed to provide scalability and fault 
> tolerance.
> 
> A typical Palo cluster generally consists of several frontend daemons and 
> dozens to hundreds of backend daemons.
> 
> Users can use MySQL client tools to connect to any frontend daemon and submit 
> SQL queries. The frontend receives a query and compiles it into query plans 
> executable by the backends. It then sends the query plan fragments to the 
> backends, each of which builds a query execution DAG. Data is fetched and 
> pipelined into the DAG, and the final result is sent to the client via the 
> frontend. Query fragments are distributed with two main goals: minimizing 
> data movement and maximizing scan locality.
> 
> ##Background
> 
> At Baidu, prior to Palo, different tools were deployed to meet diverse 
> requirements. When a use case required capabilities that no single tool could 
> provide at the same time, users were forced to build hybrid architectures 
> that stitched multiple tools together. We believe users should not have to 
> accept such inherent complexity. A storage system built to provide great 
> performance across a broad range of workloads is a more elegant solution to 
> the problems that hybrid architectures aim to solve. Palo is that solution.
> 
> Palo is designed to be a simple, single, tightly coupled system that does not 
> depend on other systems. It provides high-concurrency, low-latency point 
> queries as well as high-throughput ad-hoc analytical queries. It supports 
> bulk batch data loading as well as near real-time mini-batch data loading. 
> Palo also provides high availability, reliability, fault tolerance, and 
> scalability.
> 
> ##Rationale
> 
> Palo primarily integrates technologies from Google Mesa and Apache Impala.
> 
> Mesa is a highly scalable analytic data storage system that stores critical 
> measurement data related to Google's Internet advertising business. Mesa is 
> designed to satisfy a complex and challenging set of user and system 
> requirements, including near real-time data ingestion and queryability, as 
> well as high availability, reliability, fault tolerance, and scalability for 
> large data and query volumes.
> 
> Impala is a modern, open-source MPP SQL engine architected from the ground up 
> for the Hadoop data processing environment. Thanks to its superior 
> performance and rich functionality, Impala is comparable to many commercial 
> MPP database query engines. Mesa satisfies many of our storage requirements, 
> but it does not provide a SQL query engine; Impala is a very good MPP SQL 
> query engine, but it lacks a robust distributed storage engine. In the end, 
> we chose to combine these two technologies.
> 
> Drawing on Mesa’s data model, we developed a distributed storage engine. 
> Unlike Mesa, this storage engine does not rely on any distributed file 
> system. We then deeply integrated this storage engine with the Impala query 
> engine. Query compilation, query execution coordination, and the storage 
> engine's catalog management are integrated into the frontend daemon; query 
> execution and data storage are integrated into the backend daemon. With this 
> integration, we implemented a single, full-featured, high-performance, 
> state-of-the-art MPP database while maintaining simplicity.
> 
> ##Current Status
> 
> Palo is an open source project on GitHub 
> (https://github.com/baidu/palo).
> 
> ###Meritocracy
> 
> Palo has been deployed in production at Baidu and is used by more than 200 
> lines of business. It has demonstrated significant performance benefits and 
> has proved to be a better approach to big data reporting and analysis. Still, 
> we look forward to growing a rich user and developer community.
> 
> ###Community
> 
> Palo seeks to develop developer and user communities during incubation.
> 
> ###Core Developers
> 
> * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
> * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
> * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
> * De Li (https://github.com/lide-reed, mailtol...@sina.com)
> * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
> * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
> * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
> 
> ###Alignment
> 
> Palo is related to several other Apache projects:
> 
> * Palo can also read data stored in Apache Hadoop clusters powered by the 
> HDFS filesystem.
> * Palo is closely integrated with Impala, which is also being proposed to the 
> Incubator.

Apache Impala has completed Incubation. Jim Apple is VP, Impala.

> * Palo uses Apache Thrift as its RPC and serialization framework of choice.
> 
> ##Known Risks
> 
> ###Orphaned Products
> 
> The core developers on the Palo team plan to work full time on this project. 
> There is very little risk of Palo being orphaned, since at least one large 
> company (Baidu) uses it extensively in production; currently, more than 200 
> use cases run on Palo in production. Furthermore, since Palo was open sourced 
> at the beginning of October 2017, it has received more than 660 stars and 
> been forked nearly 170 times. We plan to extend and diversify this community 
> further through Apache.
> 
> ###Inexperience with Open Source
> 
> The core developers are all active users and followers of open source. They 
> are already committers and contributors to the Palo GitHub project. All have 
> been involved with source code released under an open source license, and 
> several of them also have experience developing code in an open source 
> environment. Though the core developers do not have Apache open source 
> experience, there are plans to onboard individuals with Apache open source 
> experience onto the project.
> 
> ###Homogenous Developers
> 
> Most of the core developers are from Baidu, but since Palo was open sourced, 
> it has received many bug fixes and enhancements from developers who do not 
> work at Baidu.
> 
> ###Reliance on Salaried Developers
> 
> Baidu has invested in Palo as its OLAP solution, and some of its key 
> engineers work full time on the project. In addition, since there is a 
> growing big data need for scalable OLAP solutions, we look forward to other 
> Apache developers and researchers contributing to the project. The key to 
> addressing the risk of relying on salaried developers from a single entity 
> is to increase the diversity of contributors and to actively encourage domain 
> experts in the BI space to contribute. Apache Palo intends to do this.
> 
> ###An Excessive Fascination with the Apache Brand
> 
> Palo is proposing to enter incubation at Apache in order to help efforts to 
> diversify the committer base, not so much to capitalize on the Apache brand. 
> The Palo project is already in production use inside Baidu, but is not 
> expected to be a Baidu product for external customers. As such, the Palo 
> project is not seeking to use the Apache brand as a marketing tool.
> 
> ##Documentation
> 
> Information about Palo can be found at https://github.com/baidu/palo. The 
> following links provide more information about Palo in open source:
> 
> * Palo wiki site: https://github.com/baidu/palo/wiki
> * Codebase at GitHub: https://github.com/baidu/palo
> * Issue Tracking: https://github.com/baidu/palo/issues
> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
> 
> ##Initial Source
> 
> Palo has been under development since 2017 by a team of engineers at Baidu 
> Inc. It is currently hosted on GitHub under the Apache License at 
> https://github.com/baidu/palo.
> 
> ##External Dependencies
> 
> Palo has the following external dependencies.
> 
> * Google gflags (BSD)
> * Google glog (BSD)
> * Apache Thrift (Apache Software License v2.0)
> * Apache Commons (Apache Software License v2.0)
> * Boost (Boost Software License)
> * OpenLDAP (OpenLDAP Software License)
> * rapidjson (Tencent)
> * Google RE2 (BSD-style)
> * lz4 (BSD)
> * snappy (BSD)
> * cyrus-sasl (CMU License)
> * Twitter Bootstrap (Apache Software License v2.0)
> * d3 (BSD)
> * LLVM (BSD-like)
> 
> Build and test dependencies:
> 
> * ant (Apache Software License v2.0)
> * Apache Maven (Apache Software License v2.0)
> * cmake (BSD)
> * clang (BSD)
> * Google gtest (BSD)
> 
> ##Required Resources
> 
> ###Mailing List
> 
> There are currently no mailing lists. The usual mailing lists are expected to 
> be set up when entering incubation:
> 
> * priv...@palo.incubator.apache.org
> * d...@palo.incubator.apache.org
> * comm...@palo.incubator.apache.org
> 
> ###Subversion Directory
> 
> Upon entering incubation: https://github.com/baidu/palo.
> After incubation, we want to move the existing repo from 
> https://github.com/baidu/palo to Apache infrastructure.
> 
> ###Issue Tracking
> 
> Palo currently uses GitHub to track issues. We would like to continue doing 
> so while we discuss migration possibilities with the ASF Infra committee.
> 
> ###Other Resources
> 
> The existing code already has unit tests, so we will make use of the existing 
> Apache continuous testing infrastructure. The resulting load should not be 
> very large.
> 
> ##Initial Committers
> 
> * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
> * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
> * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
> * De Li (https://github.com/lide-reed, mailtol...@sina.com)
> * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
> * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
> * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
> 
> ##Affiliations
> 
> The initial committers are employees of Baidu Inc. The nominated mentors are 
> employees of TODO.
> 
> ##Sponsors
> 
> ###Champion
> 
> TODO
> 
> ###Nominated Mentors
> 
> * Sijie Guo, guosi...@gmail.com
> * Luke Han, luke...@apache.org
> * Zheng Shao, zs...@apache.org

Mentors must be members of the IPMC and almost always Members of the ASF.

At this moment only Luke Han is qualified.

Regards,
Dave

> 
> ###Sponsoring Entity
> 
> We are requesting the Incubator to sponsor this project.
