[PROPOSAL] Apache Spark for the Incubator

2013-05-31 Thread Mattmann, Chris A (398J)
Hi Folks,

I'm pleased to bring you a proposal to the Apache Incubator for the Apache
Spark project: https://wiki.apache.org/incubator/SparkProposal

The work originates from the Berkeley AMPLab, with contributions from a number
of industry participants and other institutions. Spark is a framework for
large-scale data analysis on clusters, with a particular focus on low-latency
operations. The source code is written in Scala, and the project provides APIs
and bindings in several programming languages.

The proposal text is copied to the bottom of this email. I'm going to leave
this thread open for the next week for discussion. Once it has died down,
I'll call an official VOTE.

Suresh, Ross G. -- heads up -- this project may be of interest to you both,
and we would welcome you as additional mentors. We currently have 3 mentors
committed to the project, but would love to have more. People interested in
contributing should declare their interest here on the general@incubator
thread, and those potential contributors will be discussed by the incoming
Spark community.

Questions -- let's hear 'em! :)

Cheers,
Chris
("Champion", incoming Apache Spark)

=== Abstract ===
Spark is an open source system for large-scale data analysis on clusters.

=== Proposal ===
Spark is an open source system for fast and flexible large-scale data
analysis. Spark provides a general-purpose runtime that supports
low-latency execution in several forms. These include interactive
exploration of very large datasets, near real-time stream processing, and
ad-hoc SQL analytics (through higher-layer extensions). Spark interfaces
with HDFS, HBase, Cassandra and several other storage layers, and
exposes APIs in Scala, Java and Python.
== Background ==
Spark started as a U.C. Berkeley research project, designed to efficiently
run machine learning algorithms on large datasets. Over time, it has
evolved into the general computing engine outlined above. Spark's
developer community has also grown to include additional institutions,
such as universities, research labs, and corporations. Funding has been
provided by various institutions including the U.S. National Science
Foundation, DARPA, and a number of industry sponsors. See
https://amplab.cs.berkeley.edu/sponsors/ for full details.

=== Rationale ===
As the number of contributors to Spark has grown, we have sought a
long-term home for the project, and we believe the Apache Software Foundation
is a natural fit: Spark already interoperates with several existing Apache
projects (HDFS, HBase, Hive, Cassandra, Avro and Flume, to name a few). The
Spark team is familiar with the Apache process and subscribes to the Apache
mission; the team already includes multiple Apache committers. Finally,
joining Apache will help coordinate the development effort of the growing
number of organizations that contribute to Spark.

== Initial Goals ==
The initial goals will most likely be to move the existing codebase to
Apache and integrate with the Apache development process. Beyond that, we
plan incremental development and releases in accordance with the Apache
guidelines.

=== Current Status ===
== Meritocracy ==
The Spark project already operates on meritocratic principles. Today,
Spark has several developers and has accepted multiple major patches from
outside of U.C. Berkeley. While this process has remained mostly informal
(we do not have an official committer list), an implicit organization
exists in which individuals who contribute major components act as
maintainers for those modules. If accepted, the Spark project would
include several of these participants as committers from the outset. We
will work to identify all committers and PPMC members for the project and
to operate under the ASF meritocratic principles.

=== Community ===
Acceptance into the Apache foundation would bolster the already strong
user and developer community around Spark. That community includes dozens
of contributors from several institutions, a meetup group with several
hundred members, and an active mailing list composed of hundreds of users.
== Core Developers ==
The core developers of our project are listed in the contributor and
initial PPMC lists below. Though many are at UC Berkeley, there is a
representative cross-section of other organizations, including Quantifind,
Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.


=== Alignment ===
Our proposed effort aligns with several ongoing BIGDATA and U.S. national
priority funding interests, including the NSF and its Expeditions program
and the DARPA XDATA project. Our industry partners and collaborators are
well aligned with our code base.

There are also a number of related Apache projects and dependencies, which
are described in the Relationships with Other Apache Products section.

== Known Risks ==

=== Orphaned Products ===
Given the current level of investment in Spark, the risk of the project
being abandoned is minimal.

[PROPOSAL] Apache Spark for the Incubator

2013-05-31 Thread Henry Saputra
I believe it is more of a framework, but you can take a look at Shark, which
uses Spark for data warehousing and supports Hive queries
(http://shark.cs.berkeley.edu).

- Henry

On Friday, May 31, 2013, Chen, Pei wrote:

> +1 (non-binding)
> This seems like a really interesting project.
> Q- Is Spark just a framework/API or does it also have some tools
> implemented for data analytics?
> --Pei

Re: [PROPOSAL] Apache Spark for the Incubator

2013-05-31 Thread Mattmann, Chris A (398J)
Guys, I've added Thomas Dudziak as a mentor to the proposal
at his request. He is a member of the ASF and should be granted
IPMC access soon.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

RE: [PROPOSAL] Apache Spark for the Incubator

2013-05-31 Thread Chen, Pei
+1 (non-binding)
This seems like a really interesting project.  
Q- Is Spark just a framework/API or does it also have some tools implemented 
for data analytics?
--Pei

Re: [PROPOSAL] Apache Spark for the Incubator

2013-05-31 Thread Konstantin Boudnik
Great news!

Definitely +1 (non-binding, I guess) on adding Spark to the family
of ASF projects!

I would also like to contribute to the project and help move it toward
graduation! Bigtop has been packaging Spark as part of its Hadoop 1.x
software stacks for some time, and will hopefully be able to offer it as
part of the Hadoop 2.x line in the coming days.

Dr. Konstantin Boudnik
  Hadoop committer
  BigTop PMC

Re: [PROPOSAL] Apache Spark for the Incubator

2013-05-31 Thread Henry Saputra
Wow! I have been using Shark, which runs on top of Spark, with Mesos in our
prototype for API analytics for a while, and would LOVE to help as a mentor
and initial contributor.

- Henry

Re: [PROPOSAL] Apache Spark for the Incubator

2013-05-31 Thread Reynold Xin
Spark is an execution framework, but it also provides some high-level
APIs that make it much easier to do data analytics.

For example, to run grep-like queries:

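// sparkContext is an existing SparkContext; docs is an RDD of text lines
val docs = sparkContext.textFile("hdfs://...")
// count the lines that mention "Berkeley"
docs.filter(doc => doc.contains("Berkeley")).count()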

Another example, a word count (also using the Scala API):

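val docs = sparkContext.textFile("hdfs://...")
// split lines into words, emit (word, 1), then sum the 1s per word;
// in a standalone program reduceByKey needs SparkContext._ implicits imported
val counts = docs.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")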

The high-level APIs mirror many relational operators, including
aggregations, group-bys and joins.
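As a hedged illustration of that relational flavor, the sketch below expresses a GROUP BY aggregation and an inner join over key-value pairs with plain Scala collections. The helpers `sumByKey` and `innerJoin` are hypothetical names, and the `(key, (left, right))` output shape is chosen to resemble the result type of a pair-RDD `join`; this is a local model of the semantics, not Spark's implementation.

```scala
// Local analogues of a GROUP BY aggregation and an inner join on keys.
// Assumption: plain Scala collections stand in for distributed pair RDDs.
object LocalRelational {

  // Roughly: SELECT key, SUM(value) FROM rows GROUP BY key
  def sumByKey(rows: Seq[(String, Int)]): Map[String, Int] =
    rows.groupBy(_._1).map { case (k, group) => (k, group.map(_._2).sum) }

  // Inner join: emit (key, (leftValue, rightValue)) for every matching pair;
  // keys present on only one side are dropped.
  def innerJoin[V, W](left: Seq[(String, V)], right: Seq[(String, W)]): Seq[(String, (V, W))] = {
    // Index the right side by key so each left row looks up its matches
    val rightByKey: Map[String, Seq[(String, W)]] = right.groupBy(_._1)
    for {
      (k, v) <- left
      (_, w) <- rightByKey.getOrElse(k, Seq.empty[(String, W)])
    } yield (k, (v, w))
  }

  def main(args: Array[String]): Unit = {
    val visits = Seq(("berkeley", 3), ("jpl", 1), ("berkeley", 2))
    val names = Seq(("berkeley", "UC Berkeley"), ("jpl", "NASA JPL"))
    println(sumByKey(visits))
    println(innerJoin(sumByKey(visits).toSeq, names))
  }
}
```

Building a map over the right-hand side is the simplest local strategy; a distributed engine instead has to co-locate matching keys across machines before it can pair them up.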

Shark uses Spark as the execution engine but provides a Hive-compatible SQL
interface. This proposal, however, covers only moving Spark to the ASF
Incubator, not Shark.

--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org


On Fri, May 31, 2013 at 1:03 PM, Henry Saputra wrote:

> I believe it is more of a framework, but you can take a look at Shark, which
> uses Spark for data warehousing and supports Hive queries
> (http://shark.cs.berkeley.edu).
>
> - Henry
>
> On Friday, May 31, 2013, Chen, Pei wrote:
>
> > +1 (non-binding)
> > This seems like a really interesting project.
> > Q- Is Spark just a framework/API or does it also have some tools
> > implemented for data analytics?
> > --Pei

Re: [PROPOSAL] Apache Spark for the Incubator

2013-05-31 Thread Roman Shaposhnik
Extremely enthusiastic +1!!!

If you ever need help with mentorship -- please let me know.

Also, looking forward to seeing this in Bigtop!

Thanks,
Roman.

Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-01 Thread Suresh Marru
On May 31, 2013, at 2:03 PM, "Mattmann, Chris A (398J)" 
 wrote:

> Suresh, Ross G. -- heads up -- this project may be of interest to you both
> and would welcome you guys as additional mentors. We currently have 3
> mentors committed to the project, but would love to have more.

Thanks, Chris, for the alert. Great proposal indeed; if the podling needs help,
I am in.

Suresh


Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-03 Thread Mattmann, Chris A (398J)
Hi Henry,

Thanks for your support! I will leave it up to Matei and
the incoming Spark community to decide if they would like
to add you (or anyone else) to the wiki as a contributor
on the project.

Thanks!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Henry Saputra 
Reply-To: "general@incubator.apache.org" 
Date: Friday, May 31, 2013 12:38 PM
To: "general@incubator.apache.org" 
Subject: Re: [PROPOSAL] Apache Spark for the Incubator

>Wow! I have been using Shark, which runs on top of Spark, with Mesos in our
>prototype for API analytics for a while, and would LOVE to help as a mentor
>and initial contributor.
>
>- Henry

Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-03 Thread Mattmann, Chris A (398J)
Thanks for the support, Pei. I hope the questions you had
about frameworks, etc., were answered.

Cheers,
Chris

-Original Message-
From: , Pei 
Reply-To: "general@incubator.apache.org" 
Date: Friday, May 31, 2013 11:45 AM
To: "general@incubator.apache.org" 
Subject: RE: [PROPOSAL] Apache Spark for the Incubator

>+1 (non-binding)
>This seems like a really interesting project.
>Q- Is Spark just a framework/API or does it also have some tools
>implemented for data analytics?
>--Pei

Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-03 Thread Mattmann, Chris A (398J)
Thanks for the support, Roman!

I will leave it up to the incoming Spark community members to
decide if they need more mentors, and we'll be in touch.

Thank you again.

Cheers,
Chris

-Original Message-
From: Roman Shaposhnik 
Reply-To: "general@incubator.apache.org" 
Date: Friday, May 31, 2013 3:25 PM
To: "general@incubator.apache.org" 
Subject: Re: [PROPOSAL] Apache Spark for the Incubator

>Extremely enthusiastic +1!!!
>
>If you ever need help with mentorship -- please let me know.
>
>Also, looking forward to seeing this in Bigtop!
>
>Thanks,
>Roman.

Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-03 Thread Mattmann, Chris A (398J)
Thanks, Suresh. After conferring with the incoming Spark community
members, I will add you as a mentor on the wiki.

Cheers,
Chris

-Original Message-
From: Suresh Marru 
Reply-To: "general@incubator.apache.org" 
Date: Saturday, June 1, 2013 9:12 PM
To: "general@incubator.apache.org" 
Subject: Re: [PROPOSAL] Apache Spark for the Incubator

>On May 31, 2013, at 2:03 PM, "Mattmann, Chris A (398J)"
> wrote:
>
>> Suresh, Ross G. -- heads up -- this project may be of interest to you both
>> and would welcome you guys as additional mentors. We currently have 3
>> mentors committed to the project, but would love to have more.
>
>Thanks, Chris, for the alert. Great proposal indeed; if the podling needs
>help, I am in.
>
>Suresh

Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-03 Thread Mattmann, Chris A (398J)
Hi Konstantin,

Thanks for your kind words and expressed interest. I will leave it
to Matei and the incoming Spark community members to comment on adding
you (or anyone else) to the wiki as a contributor. If they are OK with
it, then so am I.

Cheers,
Chris

-Original Message-
From: Konstantin Boudnik 
Reply-To: "general@incubator.apache.org" 
Date: Friday, May 31, 2013 12:29 PM
To: "general@incubator.apache.org" 
Subject: Re: [PROPOSAL] Apache Spark for the Incubator

>Great news!
>
>Definitely +1 (non-binding, I guess) on adding Spark to the family
>of ASF projects!
>
>I would also like to express my interest in contributing to the project
>and moving it forward to graduation! Bigtop has been packaging and
>providing Spark as part of the Hadoop 1.x software stacks for some time,
>and hopefully will be able to offer it as part of the Hadoop 2.x line in
>the coming days.
>
>Dr. Konstantin Boudnik
>  Hadoop committer
>  BigTop PMC

Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-03 Thread Mattmann, Chris A (398J)
Hi Henry,

I've conferred with the incoming Spark community and we are
very happy to have you as a mentor on the project.
Please feel free to add yourself to the wiki.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-Original Message-
From: Henry Saputra 
Reply-To: "general@incubator.apache.org" 
Date: Friday, May 31, 2013 12:38 PM
To: "general@incubator.apache.org" 
Subject: Re: [PROPOSAL] Apache Spark for the Incubator

>Wow! I have been using Shark, which runs on top of Spark, with Mesos in
>our prototype for API analytics for a while and would LOVE to help as a
>mentor and initial contributor.
>
>
>- Henry

Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-03 Thread Mattmann, Chris A (398J)
Hi Roman, I've conferred with the incoming Spark community and we are
happy to have you as a mentor for the project.

Feel free to add yourself to the wiki proposal.

Cheers,
Chris

-Original Message-
From: Roman Shaposhnik 
Reply-To: "general@incubator.apache.org" 
Date: Friday, May 31, 2013 3:25 PM
To: "general@incubator.apache.org" 
Subject: Re: [PROPOSAL] Apache Spark for the Incubator

>Extremely enthusiastic +1!!!
>
>If you ever need help with mentorship -- please let me know.
>
>Also, looking forward to seeing this in Bigtop!
>
>Thanks,
>Roman.

Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-03 Thread Mattmann, Chris A (398J)
Dear Konstantin,

Thanks! The incoming Spark project is excited about the relationship
with Bigtop that could happen here.

As for new committers, after conferring with the Spark project
members, we would like to adopt a simple policy of having all new
committers not add themselves to the wiki as of yet, but simply
join the project mailing lists when they are created, and then from
there, contribute. The other mentors, the Spark community, and I are
committed to being inclusive, so hopefully it won't take too long for
anybody to become a PPMC member/committer on the project after some
demonstrated contributions.

Thanks for your interest and again for your kind words.

Cheers!

Chris


-Original Message-
From: Konstantin Boudnik 
Reply-To: "general@incubator.apache.org" 
Date: Friday, May 31, 2013 12:29 PM
To: "general@incubator.apache.org" 
Subject: Re: [PROPOSAL] Apache Spark for the Incubator

>Great news!
>
>Definitely +1 (non-binding, I guess) on adding Spark to the family
>of ASF projects!
>
>I would also like to express my interest in contributing to the project
>and moving it forward to graduation! Bigtop has been packaging and
>providing Spark as a part of Hadoop 1.x software stacks for some time,
>and hopefully will be able to offer it as a part of the Hadoop 2.x line
>in the coming days.
>
>Dr. Konstantin Boudnik
>  Hadoop committer
>  BigTop PMC

Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-03 Thread Henry Saputra
Thanks Chris, looking forward to this project becoming part of the ASF family.

I have added my name as mentor in the proposal.

- Henry


On Mon, Jun 3, 2013 at 6:41 PM, Mattmann, Chris A (398J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi Henry,
>
> I've conferred with the incoming Spark community and we are
> very happy to have you as a mentor on the project.
> Please feel free to add yourself to the wiki.
>
> Cheers,
> Chris
>
> -Original Message-
> From: Henry Saputra 
> Reply-To: "general@incubator.apache.org" 
> Date: Friday, May 31, 2013 12:38 PM
> To: "general@incubator.apache.org" 
> Subject: Re: [PROPOSAL] Apache Spark for the Incubator
>
> >Wow! I have been using Shark, which runs on top of Spark, with Mesos in
> >our prototype for API analytics for a while and would LOVE to help as a
> >mentor and initial contributor.
> >
> >
> >- Henry

Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-08 Thread Mattmann, Chris A (398J)
Note: we discussed adding Roman before the VOTE and it was
fine with the incoming Spark community, so Roman is now on
the wiki for the proposal.

In case this changes anyone's VOTE on the VOTE thread, feel
free to speak up or change your VOTE. Otherwise, nothing else
to see here folks.

Cheers,
Chris

-Original Message-
From: Roman Shaposhnik 
Date: Saturday, June 8, 2013 3:03 PM
To: jpluser 
Subject: Re: [PROPOSAL] Apache Spark for the Incubator

>On Mon, Jun 3, 2013 at 6:40 PM, Mattmann, Chris A (398J)
> wrote:
>> Hi Roman, I've conferred with the incoming Spark community and we are
>> happy to have you as a mentor for the project.
>>
>> Feel free to add yourself to the wiki proposal.
>
>Great news! Done.
>
>Thanks,
>Roman.


-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-08 Thread Marvin Humphrey
On Sat, Jun 8, 2013 at 4:55 PM, Mattmann, Chris A (398J)
 wrote:
> Note: we discussed adding Roman before the VOTE and it was
> fine with the incoming Spark community, so Roman is now on
> the wiki for the proposal.
>
> In case this changes anyone's VOTE on the VOTE thread, feel
> free to speak up or change your VOTE. Otherwise, nothing else
> to see here folks.

+1 for the original proposal.

+0.9 for the new proposal.

Yes, I expect you to tally my vote that way.  :)

Next time, please be more careful when starting a VOTE and please don't change
the proposal text in the middle of a vote.  Personnel issues in proposals have
caused significant problems in the past.  That's unlikely to happen in this
case, but I want to register my protest now because it might save us hundreds
or thousands of emails in the future.

Good luck, Spark!

Marvin Humphrey




Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-10 Thread Mohammad Nour El-Din
Hi Marvin


On Sun, Jun 9, 2013 at 5:15 AM, Marvin Humphrey wrote:

> On Sat, Jun 8, 2013 at 4:55 PM, Mattmann, Chris A (398J)
>  wrote:
> > Note: we discussed adding Roman before the VOTE and it was
> > fine with the incoming Spark community, so Roman is now on
> > the wiki for the proposal.
> >
> > In case this changes anyone's VOTE on the VOTE thread, feel
> > free to speak up or change your VOTE. Otherwise, nothing else
> > to see here folks.
>
> +1 for the original proposal.
>
> +0.9 for the new proposal.
>
> Yes, I expect you to tally my vote that way.  :)
>
> Next time, please be more careful when starting a VOTE and please don't
> change the proposal text in the middle of a vote.  Personnel issues in
> proposals have caused significant problems in the past.  That's unlikely
> to happen in this case, but I want to register my protest now because it
> might save us hundreds or thousands of emails in the future.
>

This is *not* a [VOTE] yet; this is a [PROPOSAL], in which case the proposal
can be updated and enhanced if required. So allow me to disagree with your
point about *not making changes to the proposal in such a phase*.




-- 
Thanks
- Mohammad Nour

"Life is like riding a bicycle. To keep your balance you must keep moving"
- Albert Einstein


Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-10 Thread Mohammad Nour El-Din
Hi


On Mon, Jun 10, 2013 at 5:03 PM, Mohammad Nour El-Din <
nour.moham...@gmail.com> wrote:

> Hi Marvin
>
>
> This is *not* a [VOTE] yet; this is a [PROPOSAL], in which case the
> proposal can be updated and enhanced if required. So allow me to disagree
> with your point about *not making changes to the proposal in such a phase*.
>

My mistake; actually, no, it is Google Mail's mistake :S The [VOTE] e-mails
came under the [PROPOSAL] thread.





-- 
Thanks
- Mohammad Nour

"Life is like riding a bicycle. To keep your balance you must keep moving"
- Albert Einstein


Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-26 Thread Matt Franklin
Yes. d...@spark.incubator.apache.org

On Wednesday, June 26, 2013, karthik tunga wrote:

> Hi,
>
> Is the mailing list set up?
>
> Cheers,
> Karthik
>
>
> On 20 June 2013 02:38, Matei Zaharia  wrote:
>
> > Thanks Chris! We'll get started on all the required steps.
> >
> > Matei
> >
> > On Jun 20, 2013, at 4:35 AM, "Mattmann, Chris A (398J)" <
> > chris.a.mattm...@jpl.nasa.gov> wrote:
> >
> > > Hi Folks,
> > >
> > > This VOTE has passed with the following tallies:
> > >
> > > +1
> > > Chris Mattmann*
> > > Konstantin Boudnik
> > > Henry Saputra*
> > > Reynold Xin
> > > Pei Chen
> > > Roman Shaposhnik*
> > > Suresh Marru*
> > > Scott Deboy
> > > Ted Dunning*
> > > Hitesh Shah
> > > Paul Ramirez*
> > > Ralph Goers*
> > > Alan Cabrera*
> > > Thilina Gunarathne
> > > Marcel Offermans*
> > > Alex Karasulu*
> > > Chris Douglas*
> > > Andrew Hart*
> > > Deepal Jayasinghe
> > > Ashish
> > > Joe Brockmeier*
> > > Mohammad Nour El-Din*
> > > Arun C Murthy*
> > > Tim Williams*
> > > Arvind Prabhakar*
> > > Matt Franklin*
> > > Matei Zaharia
> > > Andy Konwinski
> > >
> > > +0.9
> > >
> > >
> > > Marvin Humphrey
> > >
> > > * -indicates IPMC
> > >
> > >
> > > I'll go ahead and get the JIRA tickets filed for email/issue
> > > tracking/Git, and then work with the community to get them moving
> > > on over. Thanks for VOTE'ing!
> > >
> > > Cheers,
> > > Chris
> > >
> > > -Original Message-
> > > From: , jpluser 
> > > Reply-To: "general@incubator.apache.org"
> > > Date: Friday, June 7, 2013 10:34 PM
> > > To: "general@incubator.apache.org" 
> > > Subject: [VOTE] Apache Spark for the Incubator
> > >
> > >> Hi Folks,
> > >>
> > >> OK discussion has died down, time to VOTE to accept Spark into the
> > >> Apache Incubator. I'll let the VOTE run for at least a week.
> > >>
> > >> So far I've heard +1s from the following folks, so no need for them
> > >> to VOTE again unless they want to change their VOTE:
> > >>
> > >> +1
> > >>
> > >> Chris Mattmann*
> > >> Konstantin Boudnik
> > >> Henry Saputra*
> > >> Reynold Xin
> > >> Pei Chen
> > >> Roman Shaposhnik*
> > >> Suresh Marru*
> > >>
> > >> * -indicates IPMC
> > >>
> > >> [ ] +1 Accept Spark into the Apache Incubator.
> > >> [ ] +0 Don't care.
> > >> [ ] -1 Don't accept Spark into the Apache Incubator because..
> > >>
> > >> Proposal text is below.
> > >>
> > >> === Abstract ===
> > >> Spark is an open source system for large-scale data analysis on
> > >> clusters.


Re: [PROPOSAL] Apache Spark for the Incubator

2013-06-28 Thread Konstantin Boudnik
That makes sense. Thanks for the update - I am still catching up on my emails
backed up because of the Hadoop summit.

Cos

On Tue, Jun 04, 2013 at 01:44AM, Mattmann, Chris A (398J) wrote:
> Dear Konstantin,
> 
> Thanks! The incoming Spark project is excited about the relationship
> with Bigtop that could happen here.
> 
> As for new committers, after conferring with the Spark project
> members, we would like to adopt a simple policy of having all new
> committers not add themselves to the wiki as of yet, but simply
> join the project mailing lists when they are created, and then from
> there, contribute. The other mentors, the Spark community, and I are
> committed to being inclusive, so hopefully it won't take too long for
> anybody to become a PPMC member/committer on the project after some
> demonstrated contributions.
> 
> Thanks for your interest and again for your kind words.
> 
> Cheers!
> 
> Chris
> 
> -Original Message-
> From: Konstantin Boudnik 
> Reply-To: "general@incubator.apache.org" 
> Date: Friday, May 31, 2013 12:29 PM
> To: "general@incubator.apache.org" 
> Subject: Re: [PROPOSAL] Apache Spark for the Incubator
> 
> >Great news!
> >
> >Definitely +1 (non-binding, I guess) on adding Spark to the family
> >of ASF projects!
> >
> >I would also like to express my interest in contributing to the project
> >and moving it forward to graduation! Bigtop has been packaging and
> >providing Spark as a part of Hadoop 1.x software stacks for some time,
> >and hopefully will be able to offer it as a part of the Hadoop 2.x line
> >in the coming days.
> >
> >Dr. Konstantin Boudnik
> >  Hadoop committer
> >  BigTop PMC