Standalone scheduling - documentation inconsistency

2014-11-27 Thread Praveen Sripati
Hi,

There is a bit of inconsistency in the documentation. Which statement is
correct?

`http://spark.apache.org/docs/latest/spark-standalone.html` says

The standalone cluster mode currently only supports a simple FIFO scheduler
across applications.

while `http://spark.apache.org/docs/latest/job-scheduling.html` says

Starting in Spark 0.8, it is also possible to configure fair sharing
between jobs.

Thanks,
Praveen


[mllib] Which is the correct package to add a new algorithm?

2014-11-27 Thread Yu Ishikawa
Hi all, 

An alpha version of Spark ML exists in the current master branch on GitHub.
If we want to add new machine learning algorithms, or modify algorithms that
already exist, in which package should we implement them:
org.apache.spark.mllib or org.apache.spark.ml?
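
For concreteness, here is roughly how the two namespaces differ as I
understand them (a sketch; the imports are just illustrative examples of
each API, not an exhaustive list):

```scala
// RDD-based API: algorithms live under org.apache.spark.mllib.*
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Pipeline API (alpha): Estimator/Transformer abstractions under org.apache.spark.ml.*
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression

// e.g. mllib style: KMeans.train(rddOfVectors, k = 2, maxIterations = 20)
// e.g. ml style:    new Pipeline().setStages(Array(new LogisticRegression()))
```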

thanks,
Yu






Re: Time taken to merge Spark PRs?

2014-11-27 Thread Nicholas Chammas
1.1.1 was just released, and 1.2 is close to release. That, plus
Thanksgiving in the US (where, AFAIK, most Spark committers are located),
probably means we should expect a temporary lull in committer activity on
non-critical items.

On Mon Nov 24 2014 at 9:33:27 AM York, Brennon wrote:

> All, I just finished the SPARK-3182 feature and, for me, it's raised a
> larger question of how to ensure patches that are awaiting review get noted
> / tagged upstream. Since I don't have access rights to assign the above
> issue to myself, I can't tag it as “In Progress” like Matei mentioned, so,
> at this rate, it's just going to sit in the queue. Did I miss something on
> the “Contributing to Spark” page, is there a ‘tribal-knowledge’ way to let
> a set of committers know that patches are ready, or is it that everyone is
> already too slammed and we're all waiting diligently? :) Just trying to get
> some clarity on this topic, thanks!


Re: Standalone scheduling - documentation inconsistency

2014-11-27 Thread Reynold Xin
The first statement refers to scheduling across different Spark applications
connecting to the standalone cluster manager; the second refers to scheduling
within a single Spark application, where jobs can be scheduled using a fair
scheduler.
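
For example, fair scheduling within an application is a configuration switch
(a minimal sketch; the pool name and any fairscheduler.xml pool definitions
are up to you):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Switch the in-application scheduler from the default FIFO to FAIR
// before the SparkContext is created.
val conf = new SparkConf()
  .setAppName("fair-scheduling-sketch")
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)

// Jobs submitted from this thread now go to the named pool; pools
// themselves can be defined in conf/fairscheduler.xml.
sc.setLocalProperty("spark.scheduler.pool", "pool1")
```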


On Thu, Nov 27, 2014 at 3:47 AM, Praveen Sripati wrote:

> Hi,
>
> There is a bit of inconsistency in the documentation. Which statement is
> correct?
>
> `http://spark.apache.org/docs/latest/spark-standalone.html` says
>
> The standalone cluster mode currently only supports a simple FIFO scheduler
> across applications.
>
> while `http://spark.apache.org/docs/latest/job-scheduling.html` says
>
> Starting in Spark 0.8, it is also possible to configure fair sharing
> between jobs.
>
> Thanks,
> Praveen
>


Creating a SchemaRDD from an existing API

2014-11-27 Thread Niranda Perera
Hi,

I am evaluating Spark for an analytic component where we do batch
processing of data using SQL.

So, I am particularly interested in Spark SQL and in creating a SchemaRDD
from an existing API [1].

This API exposes elements in a database as data sources. Using the methods
exposed by these data sources, we can access and edit data.

So, I want to create a custom SchemaRDD using the methods and provisions of
this API. I tried going through the Spark documentation and the Javadocs,
but unfortunately I was unable to come to a conclusion about whether this is
actually possible.

I would like to ask the Spark devs:
1. As of the current Spark release, can we create a custom SchemaRDD?
2. What is the extension point for a custom SchemaRDD? Are there
particular interfaces?
3. Could you please point me to the specific docs on this matter?
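
The closest mechanism I could find was SQLContext.applySchema; here is a
rough sketch of what I have in mind, as of roughly Spark 1.1/1.2 (the
externalApi.fetchRecords call stands in for our API and is hypothetical):

```scala
import org.apache.spark.sql._  // SQLContext, Row, StructType, StructField, StringType

val sqlContext = new SQLContext(sc)

// Hypothetical: pull records out of the external API as an RDD[Seq[String]].
val records = externalApi.fetchRecords(sc)

// Describe the columns, then stamp the schema onto a plain RDD[Row].
val schema = StructType(Seq(
  StructField("id", StringType, nullable = false),
  StructField("value", StringType, nullable = true)))
val rowRDD = records.map(fields => Row(fields: _*))

val schemaRDD = sqlContext.applySchema(rowRDD, schema)
schemaRDD.registerTempTable("external_data")
```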

Your help in this regard is highly appreciated.

Cheers

[1]
https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44