Batch prediction for ALS

2015-02-10 Thread Debasish Das
Hi, Will it be possible to merge this PR to 1.3? https://github.com/apache/spark/pull/3098 The batch prediction API in ALS will be useful for those of us who want to cross-validate on prec@k and MAP... Thanks. Deb
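For readers unfamiliar with the two metrics mentioned above, here is a minimal plain-Python sketch of prec@k and MAP over ranked recommendation lists. It is not the API proposed in the PR and does not use Spark; the function names and the choice to normalize average precision by the number of relevant items are assumptions made for illustration, and each ranked list is assumed to contain no duplicate items.

```python
from typing import Dict, Hashable, Sequence, Set


def precision_at_k(recommended: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def average_precision(recommended: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average of precision values taken at each rank where a relevant item appears.

    Assumption: normalized by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant)


def mean_average_precision(
    recs_by_user: Dict[Hashable, Sequence[Hashable]],
    relevant_by_user: Dict[Hashable, Set[Hashable]],
) -> float:
    """Mean of average_precision over all users that have recommendations."""
    if not recs_by_user:
        return 0.0
    total = sum(
        average_precision(recs, relevant_by_user.get(user, set()))
        for user, recs in recs_by_user.items()
    )
    return total / len(recs_by_user)
```

In a cross-validation loop, the ranked lists would come from predictions on held-out users and the relevant sets from the held-out ratings; how those are produced in bulk is what a batch prediction API would cover.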

Build spark failed with maven

2015-02-10 Thread Yi Tian
Hi all, I got an ERROR when building the Spark master branch with Maven (commit: 2d1e916730492f5d61b97da6c483d3223ca44315): [INFO] Building Spark Project Catalyst 1.3.0-SNAPSHOT [INFO]

Re: Keep or remove Debian packaging in Spark?

2015-02-10 Thread jay vyas
@patrick @nate good idea, might as well join forces... right now in bigtop we already have - packaging of both deb and rpm versions of spark in bigtop, - puppet recipes which work for standalone deployment, - curation of e2e vagrant tests + bigpetstore-spark, for automated testing spark in

R: Powered by Spark: Concur

2015-02-10 Thread Paolo Platter
Thank you! Paolo Sent from my Windows Phone From: Patrick Wendell <pwend...@gmail.com> Sent: 10/02/2015 08:59 To: Paolo Platter <paolo.plat...@agilelab.it> Cc: Denny Lee <denny.g@gmail.com>; Matei Zaharia <matei.zaha...@gmail.com>;

Re: Powered by Spark: Concur

2015-02-10 Thread Patrick Wendell
Thanks Paolo - I've fixed it. On Mon, Feb 9, 2015 at 11:10 PM, Paolo Platter paolo.plat...@agilelab.it wrote: Hi, I checked the powered by wiki too and Agile Labs should be Agile Lab. The link is wrong too, it should be www.agilelab.it. The description is correct. Thanks a lot Paolo

Spark On HPC Podcast

2015-02-10 Thread Brock Palen
Sorry to pollute the list. I am one half of the HPC podcast www.rce-cast.com and we are looking to feature Spark on the show. We are looking for a developer or two who can answer questions to educate the research community about Spark. Please contact me off list. It takes about an hour over the

Re: Unit tests

2015-02-10 Thread Iulian Dragoș
Thanks, Josh, I missed that PR. On Mon, Feb 9, 2015 at 7:45 PM, Josh Rosen rosenvi...@gmail.com wrote: Hi Iulian, I think the AkkaUtilsSuite failure that you observed has been fixed in https://issues.apache.org/jira/browse/SPARK-5548 / https://github.com/apache/spark/pull/4343 On February

FYI: Prof John Canny is giving a talk on Machine Learning at the limit in SF Big Analytics Meetup

2015-02-10 Thread Chester Chen
Just in case you are in San Francisco, we are having a meetup by Prof John Canny http://www.meetup.com/SF-Big-Analytics/events/220427049/ Chester

new committer criteria

2015-02-10 Thread Imran Rashid
Hi all, We've been considering changing criteria for being a committer ( http://s.apache.org/VFw), but I don't think there are any conclusions yet. I had proposed eliminating (or at least weakening) this requirement: ...have contributed at least one major component where they have taken an

Re: renaming SchemaRDD - DataFrame

2015-02-10 Thread Koert Kuipers
thanks matei its good to know i can create them like that reynold, yeah somehow the words sql gets me going :) sorry... yeah agreed that you need new transformations to preserve the schema info. i misunderstood and thought i had to implement the bunch but that is clearly not necessary as matei

Re: renaming SchemaRDD - DataFrame

2015-02-10 Thread Reynold Xin
Koert, Don't get too hung up on the name SQL. This is exactly what you want: a collection with record-like objects with field names and runtime types. Almost all of the 40 methods are transformations for structured data, such as aggregation on a field, or filtering on a field. If all you have is

Re: renaming SchemaRDD - DataFrame

2015-02-10 Thread Matei Zaharia
You're not really supposed to subclass DataFrame, instead you can make it from an RDD of Rows and a schema (e.g. with SQLContext.applySchema). Actually the Spark SQL data source API supports that too (org.apache.spark.sql.sources). Think of DataFrame as a container for structured data, not as a

Re: renaming SchemaRDD - DataFrame

2015-02-10 Thread Reynold Xin
It's a good point. I will update the documentation to say that this is not meant to be subclassed externally. On Tue, Feb 10, 2015 at 12:10 PM, Koert Kuipers ko...@tresata.com wrote: thanks matei its good to know i can create them like that reynold, yeah somehow the words sql gets me going

Re: renaming SchemaRDD - DataFrame

2015-02-10 Thread Koert Kuipers
so i understand the success of spark.sql. besides the fact that anything with the words SQL in its name will have thousands of developers running towards it because of the familiarity, there is also a genuine need for a generic RDD that holds record-like objects, with field names and runtime

Spark Summit East - March 18-19 - NYC

2015-02-10 Thread Scott walent
The inaugural Spark Summit East, an event to bring the Apache Spark community together, will be in New York City on March 18, 2015. We are excited about the growth of Spark and to bring the event to the east coast. At Spark Summit East you can look forward to hearing from Matei Zaharia,

RE: Using CUDA within Spark / boosting linear algebra

2015-02-10 Thread Ulanov, Alexander
Thanks, Evan! It seems that ticket was marked as a duplicate, though the original one discusses a slightly different topic. I was able to link netlib with MKL from BIDMat binaries. Indeed, MKL is statically linked inside a 60MB library. |A*B size | BIDMat MKL | Breeze+Netlib-MKL from BIDMat|