Spark-Kernel naming & mentors

2015-11-19 Thread David Fallside
There seems to be a consensus here that the Spark-Kernel name should change, and
so I think the Spark-Kernel team can generate a new name that does not contain
"Spark" or "Kernel". This will ensure it is not perceived as necessarily being a
core part of Spark or as having a primary association with Jupyter or Notebooks.
It also sounds like we should settle on the new name before entering incubation,
to avoid mismatched expectations during incubation and to ensure the old name
does not get baked into class hierarchies and the like.
With regard to mentors, the project would be very well served by the three people
who have offered -- Julien, Hitesh, and Reynold -- thank you!
David

Re: [DISCUSS] Spark-Kernel Incubator Proposal

2015-11-13 Thread David Fallside
Hi Taylor, I don't know the Spark community's opinion on the "outright vs
subproject" issue, although I have told a couple of people in that community
about the proposal and have posted an FYI to the spark-dev list. From a technical
perspective, Spark-Kernel mainly uses public Spark APIs (except for some SparkR
usage,
https://github.com/ibm-et/spark-kernel/blob/master/sparkr-interpreter/src/main/resources/README.md)
and so I guess the answer could go either way depending on the Spark community.
Thanks,
David

> On November 12, 2015 at 8:05 PM "P. Taylor Goetz" <ptgo...@gmail.com> wrote:
>
>
> Just a quick (or maybe not :) ) question...
>
> Given the tight coupling to the Apache Spark project, were there any
> considerations or discussions with the Spark community regarding including the
> Spark-Kernel functionality outright in Spark, or the possibility of becoming a
> subproject?
>
> I'm just curious. I don't think an answer one way or another would necessarily
> block incubation.
>
> -Taylor
>
> > On Nov 12, 2015, at 7:17 PM, da...@fallside.com wrote:
> >
> > Hello, we would like to start a discussion on accepting the Spark-Kernel,
> > a mechanism for applications to interactively and remotely access Apache
> > Spark, into the Apache Incubator.
> >
> > The proposal is available online at
> > https://wiki.apache.org/incubator/SparkKernelProposal, and it is appended
> > to this email.
> >
> > We are looking for additional mentors to help with this project, and we
> > would much appreciate your guidance and advice.
> >
> > Thank you in advance,
> > David Fallside
> >
> >
> >
> > = Spark-Kernel Proposal =
> >
> > == Abstract ==
> > Spark-Kernel provides applications with a mechanism to interactively and
> > remotely access Apache Spark.
> >
> > == Proposal ==
> > The Spark-Kernel enables interactive applications to access Apache Spark
> > clusters. More specifically:
> > * Applications can send code snippets and libraries for execution by Spark
> > (see the sketch after this list)
> > * Applications can be deployed separately from Spark clusters and
> > communicate with the Spark-Kernel using the provided Spark-Kernel client
> > * Execution results and streaming data can be sent back to calling
> > applications
> > * Applications no longer need to be network-connected to the workers of a
> > Spark cluster because the Spark-Kernel acts as each application’s proxy
> > * Work has started on enabling Spark-Kernel to support languages in
> > addition to Scala, namely Python (with PySpark), R (with SparkR), and SQL
> > (with SparkSQL)
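> >
> > As an illustration of the first point, the payload an application sends is
> > ordinary Spark code. A minimal sketch (assuming, as in the Spark shell, that
> > the kernel exposes a ready-made SparkContext named sc):
> >
> >     // Distribute a local collection, square each element, and
> >     // sum the results on the Spark cluster.
> >     val squaredSum = sc.parallelize(1 to 1000)
> >       .map(x => x * x)
> >       .reduce(_ + _)
> >     println(s"Sum of squares: $squaredSum")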
> >
> > == Background & Rationale ==
> > Apache Spark provides applications with a fast, general-purpose
> > distributed computing engine that supports static and streaming data,
> > tabular and graph representations of data, and an extensive collection of
> > machine learning libraries. Consequently, a wide variety of applications
> > will be written for Spark, ranging from interactive applications that
> > require relatively frequent function evaluations to batch-oriented
> > applications that require one-shot or only occasional evaluation.
> >
> > Apache Spark provides two mechanisms for applications to connect with
> > Spark. The primary mechanism launches applications on Spark clusters using
> > spark-submit
> > (http://spark.apache.org/docs/latest/submitting-applications.html); this
> > requires developers to bundle their application code plus any dependencies
> > into JAR files, and then submit them to Spark. A second mechanism is an
> > ODBC/JDBC API
> > (http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine)
> > which enables applications to issue SQL queries against SparkSQL.
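> >
> > As a sketch of that second mechanism, a client reaches the distributed SQL
> > engine over a standard JDBC connection (host, port, and table name below are
> > placeholders, and the Hive JDBC driver is assumed to be on the classpath):
> >
> >     import java.sql.DriverManager
> >
> >     // Query Spark's Thrift JDBC server; only SparkSQL is reachable this way.
> >     val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
> >     val stmt = conn.createStatement()
> >     val rs = stmt.executeQuery("SELECT COUNT(*) FROM events")
> >     while (rs.next()) println(s"row count: ${rs.getLong(1)}")
> >     rs.close(); stmt.close(); conn.close()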
> >
> > Our experience when developing interactive applications, such as analytic
> > applications and Jupyter Notebooks, to run against Spark was that the
> > spark-submit mechanism was overly cumbersome and slow (requiring JAR
> > creation and forking processes to run spark-submit), and the SQL interface
> > was too limiting and did not offer easy access to components other than
> > SparkSQL, such as streaming. The most promising mechanism provided by
> > Apache Spark was the command-line shell
> > (http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell)
> > which enabled us to execute code snippets and dynamically control the
> > tasks submitted to a Spark cluster. Spark does not provide the
> > command-line shell as a consumable service but it provided us with the
> > starting point from which we developed the Spark-Kernel.
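> >
> > For reference, the interactive pattern the shell enables looks like the
> > following, entered at a bin/spark-shell prompt (the HDFS path is a
> > placeholder):
> >
> >     // Build an RDD once, cache it, then issue successive evaluations
> >     // against it -- the iterative workflow the Spark-Kernel offers as
> >     // a remotely accessible service.
> >     val logs = sc.textFile("hdfs:///logs/app.log").cache()
> >     logs.filter(_.contains("ERROR")).count()
> >     logs.filter(_.contains("WARN")).count()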