Standalone scheduling - document inconsistent

2014-11-27 Thread Praveen Sripati
Hi, There is a bit of inconsistency in the documentation. Which is the correct statement? `http://spark.apache.org/docs/latest/spark-standalone.html` says "The standalone cluster mode currently only supports a simple FIFO scheduler across applications." while

[mllib] Which is the correct package to add a new algorithm?

2014-11-27 Thread Yu Ishikawa
Hi all, The Spark ML alpha version exists in the current master branch on GitHub. If we want to add new machine learning algorithms or modify algorithms which already exist, in which package should we implement them: org.apache.spark.mllib or org.apache.spark.ml? Thanks, Yu

Re: Time taken to merge Spark PR's?

2014-11-27 Thread Nicholas Chammas
1.1.1 was just released, and 1.2 is close to a release. That, plus Thanksgiving in the US (where most Spark committers AFAIK are located), probably means a temporary lull in committer activity on non-critical items should be expected. On Mon Nov 24 2014 at 9:33:27 AM York, Brennon

Re: Standalone scheduling - document inconsistent

2014-11-27 Thread Reynold Xin
The first statement refers to different Spark applications connecting to the standalone cluster manager; the second refers to jobs within a single Spark application, which can be scheduled using a fair scheduler. On Thu, Nov 27, 2014 at 3:47 AM, Praveen Sripati praveensrip...@gmail.com
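
The distinction in this reply can be illustrated with a toy model. The sketch below is not Spark's scheduler; it is a minimal pure-Python simulation, with hypothetical helper names `fifo_order` and `fair_order`, that assumes one task runs at a time and shows how a FIFO policy and a round-robin "fair" policy order the same work. In Spark itself, scheduling of jobs within one application is selected with the `spark.scheduler.mode` property (`FIFO` by default, or `FAIR`).

```python
from collections import deque

def fifo_order(jobs):
    """Run every task of each job to completion, in submission order."""
    return [(name, t) for name, n_tasks in jobs for t in range(n_tasks)]

def fair_order(jobs):
    """Round-robin: take one task at a time from each job that still has work."""
    queue = deque((name, 0, n) for name, n in jobs if n > 0)
    order = []
    while queue:
        name, next_task, n_tasks = queue.popleft()
        order.append((name, next_task))
        if next_task + 1 < n_tasks:
            # The job still has tasks left, so it goes to the back of the line.
            queue.append((name, next_task + 1, n_tasks))
    return order

# Hypothetical workload: a long job submitted just before a short one.
jobs = [("long", 4), ("short", 1)]
print(fifo_order(jobs))  # the short job waits for all four long tasks
print(fair_order(jobs))  # the short job runs after only one long task
```

Under the FIFO ordering the short job is delayed by everything submitted ahead of it, while under the round-robin ordering it gets a turn early; that is the trade-off the two documentation statements describe at their respective levels (across applications versus within one application).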

Creating a SchemaRDD from an existing API

2014-11-27 Thread Niranda Perera
Hi, I am evaluating Spark for an analytic component where we do batch processing of data using SQL. So, I am particularly interested in Spark SQL and in creating a SchemaRDD from an existing API [1]. This API exposes elements in a database as datasources. Using the methods allowed by this data