I'm not a binding vote on incubator entry, but I think it would be great to have roadmaps as soon as feasible on addressing Tim's concern (which is deeply related to #2, "Licensing") and on addressing the code and toil duplication.
On Mon, Jun 18, 2018 at 11:08 AM, Dave Fisher <dave2w...@comcast.net> wrote:
> Hi Li,De -
>
> Since I agreed to champion this project I think that we need a summary about what the Incubator PMC cares about in order to accept a podling. What the prospective project needs to address. We also need to be clear what should happen during Incubation and at what time. I think that many of the questions that came up in this thread had to do with assessing how much effort it will take to Incubate Palo (or whatever the name will be).
>
> (1) The name Palo. Since there seems to be an issue with that name we should have a new name. It is not unknown for a podling to change its name, but that does generate extra work for Infrastructure to change the name after podling start up. It would be our preference for Palo to find a new name prior to VOTING on the proposal. Please do this elsewhere and come back to me with the new name so that I can help with the updated proposal.
>
> (2) Licensing of the software. Several bits came up as questionable. Regardless of cleanup that has already occurred we have identified that we will need to be very careful. It will be important to discuss and carefully handle the Software Grant Agreement to make sure that the source listed is correct. I think that the SGA must come early during incubation.
>
> (3) Relationship with Impala. Palo has apparently forked portions of Impala. This means that some are concerned that there is a missed synergy with the Apache Impala project. Is there a clean interface that can be built between the projects? It would help if the Palo developers would explore this with Impala at d...@impala.apache.org.
>
> That said, part of the Incubation process is to learn the Apache Way. IMHO it is ok for the relationship between Impala PMC and a podling PPMC to be a work in process.
>
> (4) Currently, Willem, Luke Han and Dave Fisher are qualified to officially mentor.
> I suggest that Sijie Guo and Zheng Shao be included as Initial Committers in order to help from within the PPMC.
>
> On Jun 14, 2018, at 11:03 AM, Jim Apple <jbap...@cloudera.com.INVALID> wrote:
>
> I don't want to be a stickler, but I don't think "For issues mentioned by Jim, Todd and Tim, I have replied on last Saturday."
>
> To my email about Palo being an ASF project as a storage system without a query engine, you replied only, "We will seriously consider this proposal."
>
> I see no response to Tim's concern that "The code isn't owned by any individual, I contributed it to Apache and it's free for anyone to do what they want to do with it, but pulling in improvements from other projects without any attempt to attribute it or contribute improvements back seems contrary to the Apache way."
>
> Jim - do you need answers to these concerns prior to agreeing to accept this project into the Incubator?
>
> Regards,
> Dave
>
> On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <l...@baidu.com> wrote:
>
> Hi all,
>
> About Palo, we have fixed the following issues.
>
> 1. Related to Impala
> For issues mentioned by Jim, Todd and Tim, I have replied on last Saturday.
>
> 2. License issue
> For issues mentioned by Todd and Ted.
> 1) be/aes/* come from mysql-5.6, GPL v2.1 license
> Fixed: removed aes related code.
> https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
> https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced
>
> 2) be/util/mysql_dtoa.cpp copied from MySQL, GPL license
> Fixed: removed mysql_dtoa related code.
> https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1
>
> 3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
> Fixed: restored to original license; we are searching for another http server to replace it.
> https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831
>
> 4) be/rpc/*
> Fixed: We have replaced it with brpc, and we will remove Hypertable after a few weeks, once users have upgraded to brpc.
> https://github.com/baidu/palo/tree/master/be/src/rpc
>
> 3. Dependency licenses
> For the issue mentioned by Dave, it looks like Palo does not depend on OpenLdap and cyrus-sasl directly, but some third-party libraries need them to compile, libcurl and gperftools for instance.
> For rapidjson, we are looking for an alternative.
>
> 4. About the name of Palo
> For the issue mentioned by Julian.
> We are figuring out a better one.
>
> Best Regards,
> Reed
>
> On 2018/6/13 at 8:54 AM, "Li,De(BDG)" <l...@baidu.com> wrote:
>
> Hi Julian,
>
> Thank you.
>
> It looks like we have to find another one.
> If anyone has a good name, please feel free to let me know.
>
> Best Regards,
> Reed
>
> On 2018/6/13 at 4:20 AM, "Julian Hyde" <jh...@apache.org> wrote:
>
> Note that there is an existing database product called Palo - an open source OLAP engine by German company Jedox[1]. There is a high likelihood that Palo would have to change its name during incubation, if accepted.
>
> Julian
>
> [1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)
>
> On Jun 10, 2018, at 3:49 AM, Han Luke <luke...@gmail.com> wrote:
>
> Cool Dave, it’s great to have you to be the campaign.
>
> ________________________________
> From: Tan,Zhongyi <tanzhon...@baidu.com>
> Sent: Saturday, June 9, 2018 8:16:28 AM
> To: general@incubator.apache.org
> Subject: Re: Looking for Champion
>
> thanks, willem
>
> we very much appreciate it.
>
> On June 8, 2018, at 23:03, Willem Jiang <willem.ji...@gmail.com> wrote:
>
> Hi,
>
> I'm willing to be the Mentor.
> Please count me in.
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <dave2w...@comcast.net> wrote:
>
> Hi -
>
> I’m willing to Champion and Mentor. I have a couple of comments inline. I’ll look at dependency licenses later today. It’s early for me.
>
> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <l...@baidu.com> wrote:
>
> Hi all,
>
> I am Reed, a developer working with the team for Palo (an MPP-based interactive SQL data warehouse).
>
> https://github.com/baidu/palo/wiki/Palo-Overview
>
> We propose to contribute Palo as an Apache Incubator project, and we are still looking for a possible Champion if anyone would like to volunteer. Thanks a lot.
>
> Best Regards,
> Reed
>
> ===================
> The draft of the proposal is below:
>
> #Apache Palo
>
> ##Abstract
>
> Palo is an MPP-based interactive SQL data warehouse for reporting and analysis.
>
> ##Proposal
>
> We propose to contribute the Palo codebase and associated artifacts (e.g. documentation, web-site content etc.) to the Apache Software Foundation with the intent of forming a productive, meritocratic and open community around Palo’s continued development, according to the ‘Apache Way’.
>
> Baidu owns several trademarks regarding Palo, and proposes to transfer ownership of those trademarks in full to the ASF.
>
> ###Overview of Palo
>
> Palo’s implementation consists of two daemons: Frontend (FE) and Backend (BE).
>
> **Frontend daemon** consists of a query coordinator and a catalog manager. The query coordinator is responsible for receiving users’ SQL queries, compiling queries and managing query execution. The catalog manager is responsible for managing metadata such as databases, tables, partitions, replicas, etc. Several frontend daemons could be deployed to guarantee fault tolerance and load balancing.
>
> **Backend daemon** stores the data and executes the query fragments.
> Many backend daemons could also be deployed to provide scalability and fault tolerance.
>
> A typical Palo cluster generally consists of several frontend daemons and dozens to hundreds of backend daemons.
>
> Users can use MySQL client tools to connect to any frontend daemon to submit SQL queries. The Frontend receives the query and compiles it into query plans executable by the Backend. Then the Frontend sends the query plan fragments to the Backend. The Backend builds a query execution DAG. Data is fetched and pipelined into the DAG. The final result is sent to the client via the Frontend. The distribution of query fragment execution takes minimizing data movement and maximizing scan locality as its main goal.
>
> ##Background
>
> At Baidu, prior to Palo, different tools were deployed to solve diverse requirements in many ways. When a use case requires the simultaneous availability of capabilities that cannot all be provided by a single tool, users were forced to build hybrid architectures that stitch multiple tools together, but we believe that they shouldn’t need to accept such inherent complexity. A storage system built to provide great performance across a broad range of workloads provides a more elegant solution to the problems that hybrid architectures aim to solve. Palo is the solution.
>
> Palo is designed to be a simple and single tightly coupled system, not depending on other systems. Palo provides high-concurrency, low-latency point query performance, but also provides high-throughput queries for ad-hoc analysis. Palo provides bulk-batch data loading, but also provides near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability.
>
> ##Rationale
>
> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>
> Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of users’ and systems’ requirements, including near real-time data ingestion and query ability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.
>
> Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. At present, by virtue of its superior performance and rich functionality, Impala is comparable to many commercial MPP database query engines. Mesa can satisfy many of our storage requirements; however, Mesa itself does not provide a SQL query engine. Impala is a very good MPP SQL query engine, but it lacks a perfect distributed storage engine. So in the end we chose the combination of these two technologies.
>
> Learning from Mesa’s data model, we developed a distributed storage engine. Unlike Mesa, this storage engine does not rely on any distributed file system. Then we deeply integrated this storage engine with the Impala query engine. Query compiling, query execution coordination and catalog management of the storage engine are integrated into the frontend daemon; query execution and data storage are integrated into the backend daemon. With this integration, we implemented a single, full-featured, high-performance, state-of-the-art MPP database, while maintaining simplicity.
>
> ##Current Status
>
> Palo has been an open source project on GitHub (https://github.com/baidu/palo).
>
> ###Meritocracy
>
> Palo has been deployed in production at Baidu and is used by more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way to do reporting and analysis on big data.
> Still, we look forward to growing a rich user and developer community.
>
> ###Community
>
> Palo seeks to develop developer and user communities during incubation.
>
> ###Core Developers
>
> * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
> * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
> * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
> * De Li (https://github.com/lide-reed, mailtol...@sina.com)
> * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
> * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
> * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
>
> ###Alignment
>
> Palo is related to several other Apache projects:
>
> * Palo can also read data stored in Apache Hadoop clusters powered by the HDFS filesystem.
> * Palo is closely integrated with Impala, which is also being proposed to the Incubator.
>
> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>
> * Palo uses Apache Thrift as its RPC and serialization framework of choice.
>
> ##Known Risks
>
> ###Orphaned Products
>
> The core developers of the Palo team plan to work full time on this project. There is very little risk of Palo getting orphaned since at least one large company (Baidu) is extensively using it in production. For example, currently there are more than 200 use cases using Palo in production. Furthermore, since Palo was open sourced at the beginning of October 2017, it has received more than 660 stars and been forked nearly 170 times. We plan to extend and diversify this community further through Apache.
>
> ###Inexperience with Open Source
>
> The core developers are all active users and followers of open source. They are already committers and contributors to the Palo Github project. All have been involved with the source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Though the core set of Developers do not have Apache Open Source experience, there are plans to onboard individuals with Apache open source experience on to the project.
>
> ###Homogenous Developers
>
> Most of the core developers are from Baidu, but after Palo was open sourced, Palo received a lot of bug fixes and enhancements from other developers not working at Baidu.
>
> ###Reliance on Salaried Developers
>
> Baidu invested in Palo as the OLAP solution and some of its key engineers are working full time on the project. In addition, since there is a growing Big Data need for scalable OLAP solutions, we look forward to other Apache developers and researchers contributing to the project. Also key to addressing the risk associated with relying on Salaried developers from a single entity is to increase the diversity of the contributors and actively lobby for Domain experts in the BI space to contribute. Apache Palo intends to do this.
>
> ###An Excessive Fascination with the Apache Brand
>
> Palo is proposing to enter incubation at Apache in order to help efforts to diversify the committer-base, not so much to capitalize on the Apache brand. The Palo project is in production use already inside Baidu, but is not expected to be a Baidu product for external customers. As such, the Palo project is not seeking to use the Apache brand as a marketing tool.
>
> ##Documentation
>
> Information about Palo can be found at https://github.com/baidu/palo.
> The following links provide more information about Palo in open source:
>
> * Palo wiki site: https://github.com/baidu/palo/wiki
> * Codebase at Github: https://github.com/baidu/palo
> * Issue Tracking: https://github.com/baidu/palo/issues
> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>
> ##Initial Source
>
> Palo has been under development since 2017 by a team of engineers at Baidu Inc. It is currently hosted on Github.com under an Apache license at https://github.com/baidu/palo.
>
> ##External Dependencies
>
> Palo has the following external dependencies.
>
> * Google gflags (BSD)
> * Google glog (BSD)
> * Apache Thrift (Apache Software License v2.0)
> * Apache Commons (Apache Software License v2.0)
> * Boost (Boost Software License)
> * OpenLdap (OpenLDAP Software License)
> * rapidjson (Tencent)
> * Google RE2 (BSD-style)
> * lz4 (BSD)
> * snappy (BSD)
> * cyrus-sasl (CMU License)
> * Twitter Bootstrap (Apache Software License v2.0)
> * d3 (BSD)
> * LLVM (BSD-like)
>
> Build and test dependencies:
>
> * ant (Apache Software License v2.0)
> * Apache Maven (Apache Software License v2.0)
> * cmake (BSD)
> * clang (BSD)
> * Google gtest (Apache Software License v2.0)
>
> ##Required Resources
>
> ###Mailing List
>
> There are currently no mailing lists. The usual mailing lists are expected to be set up when entering incubation:
>
> priv...@palo.incubator.apache.org
> d...@palo.incubator.apache.org
> comm...@palo.incubator.apache.org
>
> ###Subversion Directory
>
> Upon entering incubation: https://github.com/baidu/palo.
> After incubation, we want to move the existing repo from https://github.com/baidu/palo to Apache infrastructure.
>
> ###Issue Tracking
>
> Palo currently uses GitHub to track issues.
> We would like to continue to do so while we discuss migration possibilities with the ASF Infra committee.
>
> ###Other Resources
>
> The existing code already has unit tests so we will make use of existing Apache continuous testing infrastructure. The resulting load should not be very large.
>
> ##Initial Committers
>
> * Ruyue Ma (https://github.com/maruyue, maru...@baidu.com)
> * Chun Zhao (https://github.com/imay, buaa.zh...@gmail.com)
> * Mingyu Chen (https://github.com/morningman, chenmin...@baidu.com)
> * De Li (https://github.com/lide-reed, mailtol...@sina.com)
> * Hao Chen (https://github.com/chenhao7253886, chenha...@baidu.com)
> * Chaoyong Li (https://github.com/cyongli, lichaoy...@baidu.com)
> * Bin Lin (https://github.com/lingbin, lingbi...@gmail.com)
>
> ##Affiliations
>
> The initial committers are employees of Baidu Inc. The nominated mentors are employees of TODO.
>
> ##Sponsors
>
> ###Champion
>
> TODO
>
> ###Nominated Mentors
>
> * sijie guo, guosi...@gmail.com
> * Luke Han, luke...@apache.org
> * Zheng Shao, zs...@apache.org
>
> Mentors must be members of the IPMC and almost always Members of the ASF.
>
> At this moment only Luke Han is qualified.
>
> Regards,
> Dave
>
> ###Sponsoring Entity
>
> We are requesting the Incubator to sponsor this project.
---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org