Cool, Dave. It’s great to have you as the Champion.
From: Tan,Zhongyi
Sent: Saturday, June 9, 2018 8:16:28 AM
To: general@incubator.apache.org
Subject: Re: Looking for Champion
Thanks, Willem.
We really appreciate it.
> On June 8, 2018, at 23:03, Willem Jiang wrote:
>
> Hi,
>
> I'm willing to be the Mentor.
> Please count me in.
>
>
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
>> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher wrote:
>>
>> Hi -
>>
>> I’m willing to Champion and Mentor. I have a couple of comments inline.
>> I’ll look at dependency licenses later today. It’s early for me.
>>
>>
>>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) wrote:
>>>
>>> Hi all,
>>>
>>> I am Reed, a developer working with the team on Palo (an MPP-based
>> interactive SQL data warehouse).
>>> https://github.com/baidu/palo/wiki/Palo-Overview
>>>
>>> We propose to contribute Palo as an Apache Incubator project, and
>>> we are still looking for a Champion, if anyone would like to
>> volunteer. Thanks a lot.
>>>
>>> Best Regards,
>>> Reed
>>>
>>> ===
>>> The draft of the proposal is below:
>>>
>>> #Apache Palo
>>>
>>> ##Abstract
>>>
>>> Palo is an MPP-based interactive SQL data warehouse for reporting and
>> analysis.
>>>
>>> ##Proposal
>>>
>>> We propose to contribute the Palo codebase and associated artifacts
>> (e.g. documentation, web-site content etc.) to the Apache Software
>> Foundation with the intent of forming a productive, meritocratic and open
>> community around Palo’s continued development, according to the ‘Apache
>> Way’.
>>>
>>> Baidu owns several trademarks regarding Palo, and proposes to transfer
>> ownership of those trademarks in full to the ASF.
>>>
>>> ###Overview of Palo
>>>
>>> Palo’s implementation consists of two daemons: Frontend (FE) and Backend
>> (BE).
>>>
>>> **Frontend daemon** consists of a query coordinator and a catalog manager.
>> The query coordinator is responsible for receiving users’ SQL queries,
>> compiling them and managing their execution. The catalog manager is
>> responsible for managing metadata such as databases, tables, partitions,
>> and replicas. Several frontend daemons can be deployed to provide
>> fault tolerance and load balancing.
>>>
>>> **Backend daemon** stores the data and executes the query fragments.
>> Many backend daemons can be deployed to provide scalability and
>> fault tolerance.
>>>
>>> A typical Palo cluster generally consists of several frontend daemons
>> and dozens to hundreds of backend daemons.
>>>
>>> Users can use MySQL client tools to connect to any frontend daemon and
>> submit SQL queries. The Frontend receives a query and compiles it into query
>> plans executable by the Backend. The Frontend then sends the query plan
>> fragments to the Backend, which builds a query execution DAG. Data is
>> fetched and pipelined into the DAG. The final result is sent to the
>> client via the Frontend. Query fragment execution is distributed with the
>> main goals of minimizing data movement and maximizing scan locality.
>>>
>>> ##Background
>>>
>>> At Baidu, prior to Palo, different tools were deployed to meet diverse
>> requirements. When a use case required capabilities that no single tool
>> could provide at the same time, users were forced to build hybrid
>> architectures that stitch multiple tools together. We believe that they
>> shouldn’t need to accept such inherent complexity. A storage system built
>> to provide great performance across a broad range of workloads offers a
>> more elegant solution to the problems that hybrid architectures aim to
>> solve. Palo is that solution.
>>>
>>> Palo is designed to be a simple, single, tightly coupled system that does
>> not depend on other systems. Palo provides highly concurrent, low-latency
>> point queries as well as high-throughput ad hoc analytical queries.
>> Palo provides bulk batch data loading as well as near
>> real-time mini-batch data loading. Palo also provides high availability,
>> reliability, fault tolerance, and scalability.
>>>
>>> ##Rationale
>>>
>>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>>>
>>> Mesa is a highly scalable analytic data storage system that stores
>> critical measurement data related to Google's Internet advertising
>> business. Mesa is designed to satisfy a complex and challenging set of user
>> and system requirements, including near real-time data ingestion and
>> query ability, as well as high availability, reliability, fault tolerance,
>> and scalability for large data and query volumes.
>>>
>>> Impala is a modern, open-source MPP SQL engine architected from the
>> ground up for the Hadoop data processing environment. At present, by virtue
>> of its superior performance and rich functionality, Impala has become
>> comparable to many commercial MPP database query engines. Mesa can satisfy
>> the needs of