Re: [DISCUSS] Apache Pinot Incubator Proposal

2018-03-09 Thread kishore g
Thanks Jim. I will update the proposal.



On Fri, Mar 9, 2018 at 1:37 PM, Jim Jagielski  wrote:

> If still looking for mentors, I'll throw my hat in.
>
> > On Mar 9, 2018, at 1:19 PM, kishore g  wrote:
> >
> > @John, will start that process asap.
> >
> > @Felix Yes, we are looking for mentors. I can remove myself since I will be
> > actively participating in the project anyway.
> >
> > On Fri, Mar 9, 2018 at 8:58 AM, Felix Cheung wrote:
> >
> >> Hi Kishore - do you need one more mentor?

Re: [DISCUSS] Apache Pinot Incubator Proposal

2018-03-09 Thread Jim Jagielski
If still looking for mentors, I'll throw my hat in.

> On Mar 9, 2018, at 1:19 PM, kishore g  wrote:
> 
> @John, will start that process asap.
> 
> @Felix Yes, we are looking for mentors. I can remove myself since I will be
> actively participating in the project anyway.
> 
> On Fri, Mar 9, 2018 at 8:58 AM, Felix Cheung  wrote:
> 
>> Hi Kishore - do you need one more mentor?

Re: [DISCUSS] Apache Pinot Incubator Proposal

2018-03-09 Thread kishore g
@John, will start that process asap.

@Felix Yes, we are looking for mentors. I can remove myself since I will be
actively participating in the project anyway.

On Fri, Mar 9, 2018 at 8:58 AM, Felix Cheung  wrote:

> Hi Kishore - do you need one more mentor?

Re: [DISCUSS] Apache Pinot Incubator Proposal

2018-03-09 Thread Felix Cheung
Hi Kishore - do you need one more mentor?


On Tue, Feb 13, 2018 at 12:10 AM kishore g  wrote:

> Hello,
>
> I would like to propose Pinot as an Apache Incubator project. The proposal
> is available as a draft at https://wiki.apache.org/incubator/PinotProposal.
> I have also included the text of the proposal below.
>
> Any feedback from the community is much appreciated.
>
> Regards,
> Kishore G
>
> = Pinot Proposal =
>
> == Abstract ==
>
> Pinot is a distributed columnar storage engine that can ingest data in
> real time and serve analytical queries at low latency. There are two modes
> of data ingestion: batch and real-time. Batch mode allows users to
> generate Pinot segments externally using systems such as Hadoop. These
> segments can be uploaded into Pinot via simple curl calls. In real-time
> mode, Pinot can ingest data in near real time from streaming sources such
> as Kafka. Data ingested into Pinot is stored in a columnar format. Pinot
> provides a SQL-like interface (PQL) that supports filters, aggregations,
> and group-by operations. It does not support joins by design, in order to
> guarantee predictable latency. It leverages other Apache projects such as
> ZooKeeper, Kafka, and Helix, along with many libraries from the ASF.
>
> == Proposal ==
>
> Pinot was open sourced by LinkedIn and is hosted on GitHub. The majority of
> the development happens at LinkedIn, with other contributions from Uber and
> Slack. We believe that being a part of the Apache Software Foundation will
> improve diversity and help form a strong community around the project.
>
> LinkedIn submits this proposal to donate the code base to the Apache
> Software Foundation. The code is already under the Apache License 2.0. The
> code and the documentation are hosted on GitHub.
>  * Code: http://github.com/linkedin/pinot
>  * Documentation: https://github.com/linkedin/pinot/wiki
>
>
> == Background ==
>
> LinkedIn, like other companies, has many applications that provide
> rich real-time insights to members and customers (internal and external).
> The workload characteristics of these applications vary widely. Some
> internal applications simply need ad hoc query capabilities with latencies
> from sub-second to multiple seconds, but external site-facing applications
> require strong SLAs even under very high workloads. Prior to Pinot, LinkedIn
> had multiple solutions depending on the workload generated by the
> application, which was inefficient. Pinot was developed to be a single
> platform that addresses all classes of applications. Today at LinkedIn,
> Pinot powers more than 50 site-facing products with workloads ranging from a
> few queries per second to thousands of queries per second, while maintaining
> a 99th percentile latency that can be as low as a few milliseconds. All
> internal dashboards at LinkedIn are powered by Pinot.
>
> == Rationale ==
>
> We believe that the requirement to develop rich real-time analytic
> applications is applicable to other organizations. Both Pinot and the
> interested communities would benefit from this work being openly available.
>
> == Current Status ==
>
> Pinot is currently open sourced under the Apache License Version 2.0 and
> available at github.com/linkedin/pinot. All development is done using
> GitHub pull requests. We cut releases on a weekly basis and deploy them at
> LinkedIn. mp-0.1.468 is the latest release tag deployed in production.
>
> == Meritocracy ==
>
> Following the Apache meritocracy model, we intend to build an open and
> diverse community around Pinot. We will encourage the community to
> contribute to discussions and to the codebase.
>
> == Community ==
>
> Pinot is currently used extensively at LinkedIn and Uber. Several companies
> have expressed interest in the project. We hope to extend the contributor
> base significantly by bringing Pinot into Apache.
>
> == Core Developers ==
>
> Pinot was started by engineers at LinkedIn, and now has committers from
> Uber.
>
> == Alignment ==
>
> Apache is the most natural home for taking Pinot forward. Pinot leverages
> several existing Apache projects such as Kafka, Helix, ZooKeeper, and Avro.
> As Pinot gains adoption, we plan to add support for the ORC and Parquet
> formats, as well as integration with YARN and Mesos.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of the Pinot project being abandoned is minimal. The teams at
> LinkedIn and Uber are highly incentivized to continue development of Pinot
> as it is a critical part of their infrastructure.
>
> === Inexperience with Open Source ===
>
> Since being open sourced, Pinot has been developed entirely on GitHub. All
> the current developers on Pinot are well aware of the open source
> development process. However, most of the developers are new to the Apache
> process. Kishore Gopalakrishna, one of the lead developers of Pinot, is the
> VP and a committer of the Apache Helix project.
>
> === Homogenous Developers ===
>
> The current core developers are all from LinkedIn and Uber. However, 

Re: [DISCUSS] Apache Pinot Incubator Proposal

2018-03-09 Thread John D. Ament
Hi Kishore

Presently you're listed as a mentor on this proposal.  However, you're not
a member of the IPMC.  The joining process is easy, please review
https://incubator.apache.org/guides/pmc.html#joining_the_ipmc

John

On Tue, Feb 13, 2018 at 8:13 PM Olivier Lamy  wrote:

> Hi Kishore, well, I think that as you are an ASF member you can add
> yourself as a mentor :-)
>
>
> On 14 February 2018 at 01:01, kishore g  wrote:
>
> > Kevin,
> >
> > Increasing the adoption of Pinot is one thing that can help build a good
> > diverse community. A few things that come to mind:
> > - Improve documentation
> > - Better integration with cloud providers
> > - Meetup and blog posts.
> >
> > We would also love to get additional mentors from ASF to help us build
> > the community around Pinot.
> >
> >
> >
> >
> > On Tue, Feb 13, 2018 at 4:29 PM, Timothy Chen  wrote:
> >
> > > Love to see this in the incubator as well. +1
> > >
> > > Tim
> > >
> > > On Tue, Feb 13, 2018 at 4:22 PM, Kevin A. McGrail wrote:
> > > > Agreed.  It could use more mentors from ASF which I'm too overloaded to
> > > > help with but I'd be inclined to +1 this.  Do you have some thoughts on
> > > > getting more community people outside of LI and Uber to help?
> > > >
> > > > On 2/13/2018 7:07 PM, Dave Fisher wrote:
> > > >>
> > > >> Noir or Blanc? Gris or Grigio? What’s the vintage?
> > > >>
> > > >> All kidding aside this looks interesting.
> > > >>
> > > >> Regards,
> > > >> Dave
> > > >>
> > > >> Sent from my iPhone

Re: [DISCUSS] Apache Pinot Incubator Proposal

2018-02-13 Thread Olivier Lamy
Hi Kishore, well, I think that as you are an ASF member you can add
yourself as a mentor :-)


On 14 February 2018 at 01:01, kishore g  wrote:

> Kevin,
>
> Increasing the adoption of Pinot is one thing that can help build a good
> diverse community. A few things that come to mind:
> - Improve documentation
> - Better integration with cloud providers
> - Meetup and blog posts.
>
> We would also love to get additional mentors from ASF to help us build the
> community around Pinot.
>
>
>
>
> On Tue, Feb 13, 2018 at 4:29 PM, Timothy Chen  wrote:
>
> > Love to see this in the incubator as well. +1
> >
> > Tim
> >
> > On Tue, Feb 13, 2018 at 4:22 PM, Kevin A. McGrail wrote:
> > > Agreed.  It could use more mentors from ASF which I'm too overloaded to
> > > help with but I'd be inclined to +1 this.  Do you have some thoughts on
> > > getting more community people outside of LI and Uber to help?
> > >
> > > On 2/13/2018 7:07 PM, Dave Fisher wrote:
> > >>
> > >> Noir or Blanc? Gris or Grigio? What’s the vintage?
> > >>
> > >> All kidding aside this looks interesting.
> > >>
> > >> Regards,
> > >> Dave
> > >>
> > >> Sent from my iPhone

Re: [DISCUSS] Apache Pinot Incubator Proposal

2018-02-13 Thread kishore g
Kevin,

Increasing the adoption of Pinot is one thing that can help build a good
diverse community. A few things that come to mind:
- Improve documentation
- Better integration with cloud providers
- Meetup and blog posts.

We would also love to get additional mentors from ASF to help us build the
community around Pinot.




On Tue, Feb 13, 2018 at 4:29 PM, Timothy Chen  wrote:

> Love to see this in the incubator as well. +1
>
> Tim
>
> On Tue, Feb 13, 2018 at 4:22 PM, Kevin A. McGrail wrote:
> > Agreed.  It could use more mentors from ASF which I'm too overloaded to
> > help with but I'd be inclined to +1 this.  Do you have some thoughts on
> > getting more community people outside of LI and Uber to help?
> >
> > On 2/13/2018 7:07 PM, Dave Fisher wrote:
> >>
> >> Noir or Blanc? Gris or Grigio? What’s the vintage?
> >>
> >> All kidding aside this looks interesting.
> >>
> >> Regards,
> >> Dave
> >>
> >> Sent from my iPhone

Re: [DISCUSS] Apache Pinot Incubator Proposal

2018-02-13 Thread Timothy Chen
Love to see this in the incubator as well. +1

Tim

On Tue, Feb 13, 2018 at 4:22 PM, Kevin A. McGrail
 wrote:
> Agreed.  It could use more mentors from ASF which I'm too overloaded to help
> with but I'd be inclined to +1 this.  Do you have some thoughts on getting
> more community people outside of LI and Uber to help?
>
> On 2/13/2018 7:07 PM, Dave Fisher wrote:
>>
>> Noir or Blanc? Gris or Grigio? What’s the vintage?
>>
>> All kidding aside this looks interesting.
>>
>> Regards,
>> Dave
>>
>> Sent from my iPhone

Re: [DISCUSS] Apache Pinot Incubator Proposal

2018-02-13 Thread Kevin A. McGrail
Agreed.  It could use more mentors from ASF which I'm too overloaded to 
help with but I'd be inclined to +1 this.  Do you have some thoughts on 
getting more community people outside of LI and Uber to help?


On 2/13/2018 7:07 PM, Dave Fisher wrote:

Noir or Blanc? Gris or Grigio? What’s the vintage?

All kidding aside this looks interesting.

Regards,
Dave

Sent from my iPhone



Re: [DISCUSS] Apache Pinot Incubator Proposal

2018-02-13 Thread Dave Fisher
Noir or Blanc? Gris or Grigio? What’s the vintage?

All kidding aside this looks interesting.

Regards,
Dave

Sent from my iPhone


Re: [DISCUSS] Apache Pinot Incubator Proposal

2018-02-13 Thread Abhishek Tiwari
Pinot is already quite popular and I think it will be an awesome addition
under the Apache umbrella.

+1 (non-binding)


[DISCUSS] Apache Pinot Incubator Proposal

2018-02-13 Thread kishore g
Hello,

I would like to propose Pinot as an Apache Incubator project. The proposal
is available as a draft at https://wiki.apache.org/incubator/PinotProposal. I
have also included the text of the proposal below.

Any feedback from the community is much appreciated.

Regards,
Kishore G

= Pinot Proposal =

== Abstract ==

Pinot is a distributed columnar storage engine that can ingest data in
real time and serve analytical queries at low latency. There are two modes
of data ingestion: batch and real-time. Batch mode allows users to
generate Pinot segments externally using systems such as Hadoop. These
segments can be uploaded into Pinot via simple curl calls. In real-time
mode, Pinot can ingest data in near real time from streaming sources such
as Kafka. Data ingested into Pinot is stored in a columnar format. Pinot
provides a SQL-like interface (PQL) that supports filters, aggregations,
and group-by operations. It does not support joins by design, in order to
guarantee predictable latency. It leverages other Apache projects such as
ZooKeeper, Kafka, and Helix, along with many libraries from the ASF.

== Proposal ==

Pinot was open sourced by LinkedIn and is hosted on GitHub. The majority of
development happens at LinkedIn, with other contributions from Uber and
Slack. We believe that being part of the Apache Software Foundation will
improve diversity and help form a strong community around the project.

LinkedIn submits this proposal to donate the code base to the Apache
Software Foundation. The code is already under the Apache License 2.0. The
code and documentation are hosted on GitHub.
 * Code: http://github.com/linkedin/pinot
 * Documentation: https://github.com/linkedin/pinot/wiki


== Background ==

LinkedIn, like other companies, has many applications that provide
rich real-time insights to members and customers (internal and external).
The workload characteristics of these applications vary widely. Some
internal applications simply need ad-hoc query capabilities with latencies
from sub-second to multiple seconds, whereas external site-facing
applications require strong SLAs even under very high workloads. Prior to
Pinot, LinkedIn had multiple solutions depending on the workload generated
by the application, which was inefficient. Pinot was developed to be a
single platform that addresses all classes of applications. Today at
LinkedIn, Pinot powers more than 50 site-facing products, with workloads
ranging from a few queries per second to thousands of queries per second,
while maintaining a 99th percentile latency that can be as low as a few
milliseconds. All internal dashboards at LinkedIn are powered by Pinot.

== Rationale ==

We believe that the need to develop rich real-time analytic applications
is shared by other organizations. Both Pinot and the interested
communities would benefit from this work being openly available.

== Current Status ==

Pinot is currently open sourced under the Apache License Version 2.0 and
available at github.com/linkedin/pinot. All development is done using
GitHub pull requests. We cut releases on a weekly basis and deploy them at
LinkedIn. mp-0.1.468 is the latest release tag deployed in production.

== Meritocracy ==

Following the Apache meritocracy model, we intend to build an open and
diverse community around Pinot. We will encourage the community to
contribute to discussions and to the codebase.

== Community ==

Pinot is currently used extensively at LinkedIn and Uber. Several companies
have expressed interest in the project. We hope to extend the contributor
base significantly by bringing Pinot into Apache.

== Core Developers ==

Pinot was started by engineers at LinkedIn, and now has committers from
Uber.

== Alignment ==

Apache is the most natural home for taking Pinot forward. Pinot leverages
several existing Apache projects such as Kafka, Helix, ZooKeeper, and Avro.
As Pinot gains adoption, we plan to add support for the ORC and Parquet
formats, as well as integration with YARN and Mesos.

== Known Risks ==

=== Orphaned Products ===

The risk of the Pinot project being abandoned is minimal. The teams at
LinkedIn and Uber are highly incentivized to continue development of Pinot
as it is a critical part of their infrastructure.

=== Inexperience with Open Source ===

Since being open sourced, Pinot has been developed entirely on GitHub. All
the current developers on Pinot are familiar with the open source
development process. However, most of them are new to the Apache process.
Kishore Gopalakrishna, one of the lead developers of Pinot, is the VP and a
committer of the Apache Helix project.

=== Homogeneous Developers ===

The current core developers are all from LinkedIn and Uber. However, we
hope to establish a developer community that includes contributors from
several corporations and we are actively encouraging new contributors via
the mailing lists and public presentations of Pinot.

=== Reliance on Salaried Developers ===

It is expected that Pinot development will occur on