Unsubscribe

2014-10-27 Thread Ian Ferreira
unsubscribe

Is Hadoop MR now comparable with Spark?

2014-06-02 Thread Ian Ferreira
http://hortonworks.com/blog/ddm/#.U4yn3gJgfts.twitter

RE: Announcing Spark 1.0.0

2014-05-30 Thread Ian Ferreira
Congrats Sent from my Windows Phone From: Dean Wampler deanwamp...@gmail.com Sent: 5/30/2014 6:53 AM To: user@spark.apache.org Subject: Re: Announcing Spark 1.0.0 Congratulations!! On Fri, May 30, 2014 at 5:12 AM, Patrick

Re: Debugging Spark AWS S3

2014-05-16 Thread Ian Ferreira
Did you check the executor stderr logs? On 5/16/14, 2:37 PM, Robert James srobertja...@gmail.com wrote: I have Spark code which runs beautifully when MASTER=local. When I run it with MASTER set to a spark ec2 cluster, the workers seem to run, but the results, which are supposed to be put to AWS

Re: Easy one

2014-05-07 Thread Ian Ferreira
in spark-env.sh on the workers as export SPARK_WORKER_MEMORY=4g On Tue, May 6, 2014 at 5:29 PM, Ian Ferreira ianferre...@hotmail.com wrote: Hi there, Why can't I seem to kick the executor memory higher? See below from EC2 deployment using m1.large And in the spark-env.sh export

Easy one

2014-05-06 Thread Ian Ferreira
Hi there, Why can't I seem to kick the executor memory higher? See below from EC2 deployment using m1.large. In spark-env.sh: export SPARK_MEM=6154m. And in the Spark context: sconf.setExecutorEnv("spark.executor.memory", "4g") Cheers - Ian
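A minimal sketch, assuming a Scala driver, of how the executor memory property is usually set. `SparkConf.setExecutorEnv` sets environment variables for executor processes (it stores them under `spark.executorEnv.*` keys), so it does not change `spark.executor.memory`; that property is set with `SparkConf.set` before the `SparkContext` is created. The application name and the `local[2]` master are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ExecutorMemoryExample {
  // Builds a configuration that asks for 4 GB per executor.
  // The app name and "local[2]" master are placeholders.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("executor-memory-example")
      .setMaster("local[2]")
      // spark.executor.memory is a Spark property, so it goes through set(...).
      .set("spark.executor.memory", "4g")

  def main(args: Array[String]): Unit = {
    // By contrast, setExecutorEnv("spark.executor.memory", "4g") would only add
    // the key "spark.executorEnv.spark.executor.memory", i.e. an environment
    // variable for executor processes, leaving the memory setting unchanged.
    val sc = new SparkContext(buildConf())
    try {
      println(sc.getConf.get("spark.executor.memory"))
    } finally {
      sc.stop()
    }
  }
}
```

On a standalone cluster, as far as we understand the deploy mode, a worker only launches an executor when it has that much memory free, and the total a worker can hand out is capped by SPARK_WORKER_MEMORY, which is why the reply points at spark-env.sh on the workers.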

Re: Can't be built on MAC

2014-05-01 Thread Ian Ferreira
Hi Zhige, I had the same issue and reverted to using JDK 1.7.0_55 From: Zhige Xin xinzhi...@gmail.com Reply-To: user@spark.apache.org Date: Thursday, May 1, 2014 at 12:32 PM To: user@spark.apache.org Subject: Can't be built on MAC Hi dear all, When I tried to build Spark 0.9.1 on my Mac OS X

Setting the Scala version in the EC2 script?

2014-05-01 Thread Ian Ferreira
Is this possible? It is very annoying to have such a great script but still have to manually update things afterwards.

Getting the following error using EC2 deployment

2014-05-01 Thread Ian Ferreira
I have a custom app that was compiled with Scala 2.10.3, which I believe is what the latest spark-ec2 script installs. However, running it on the master yields this cryptic error, which according to the web implies incompatible JAR versions: Exception in thread "main" java.lang.NoClassDefFoundError:

Running parallel jobs in the same driver with Futures?

2014-04-28 Thread Ian Ferreira
I recall asking about this, and I think Matei suggested it was, but is the scheduler thread-safe? I am running MLlib libraries as futures in the same driver, using the same dataset as input, and get this error: 14/04/28 08:29:48 ERROR TaskSchedulerImpl: Exception in statusUpdate
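Spark's job-scheduling documentation states that, inside one SparkContext, jobs submitted from separate threads can run simultaneously and that the scheduler is thread-safe for this use. A minimal sketch of that pattern with Scala Futures, assuming the global ExecutionContext and a local master as placeholders; it does not diagnose the TaskSchedulerImpl error quoted above, which may be specific to the Spark version in use.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

object ParallelJobs {
  // Runs two independent actions on one cached RDD from separate threads.
  // Each Future calls an action, so each submits its own Spark job.
  def run(sc: SparkContext): (Long, Int) = {
    val data = sc.parallelize(1 to 1000).cache()
    val countF = Future { data.count() }
    val sumF = Future { data.reduce(_ + _) }
    // Both Futures are already running; the for-comprehension only combines results.
    val both = for (c <- countF; s <- sumF) yield (c, s)
    // Block the driver until both jobs have finished.
    Await.result(both, Duration.Inf)
  }

  def main(args: Array[String]): Unit = {
    // "local[4]" is a placeholder master URL.
    val conf = new SparkConf().setAppName("parallel-jobs").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try println(run(sc))
    finally sc.stop()
  }
}
```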

Failed to run count?

2014-04-23 Thread Ian Ferreira
I am getting this cryptic error running LinearRegressionWithSGD. Data sample: LabeledPoint(39.0, [144.0, 1521.0, 20736.0, 59319.0, 2985984.0]) 14/04/23 15:15:34 INFO SparkContext: Starting job: first at GeneralizedLinearAlgorithm.scala:121 14/04/23 15:15:34 INFO DAGScheduler: Got job 2 (first at

Adding to an RDD

2014-04-21 Thread Ian Ferreira
Feels like a silly question, but what if I wanted to apply a map to each element in an RDD and, instead of replacing it, add new columns containing the manipulated values? I.e. res0: Array[String] = Array(1 2, 1 3, 1 4, 2 1, 3 1, 4 1) Becomes res0: Array[String] = Array(1 2 2 4, 1 3 1 6,
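Because `map` always returns a new RDD, "adding columns" amounts to mapping each row to itself followed by the derived values. A minimal sketch; the derived columns used here (the sum and product of the two fields) are hypothetical, since the actual transformation in the question is cut off.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object AppendColumns {
  // Hypothetical derivation: append the sum and product of the two fields.
  def withDerived(line: String): String = {
    val fields = line.trim.split("\\s+").map(_.toInt)
    val (a, b) = (fields(0), fields(1))
    s"$line ${a + b} ${a * b}"
  }

  // map never mutates the input RDD; it returns a new RDD whose rows keep
  // the original fields first, followed by the derived ones.
  def appendColumns(rdd: RDD[String]): RDD[String] =
    rdd.map(line => withDerived(line))

  def main(args: Array[String]): Unit = {
    // "local[2]" is a placeholder master URL.
    val sc = new SparkContext(new SparkConf().setAppName("append-columns").setMaster("local[2]"))
    try {
      val input = sc.parallelize(Seq("1 2", "1 3", "1 4", "2 1", "3 1", "4 1"))
      appendColumns(input).collect().foreach(println)
    } finally sc.stop()
  }
}
```

Keeping rows as tuples such as RDD[(Int, Int, Int, Int)] instead of space-separated strings would avoid re-parsing each row in later steps.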

Combining RDD's columns

2014-04-18 Thread Ian Ferreira
This may seem contrived, but suppose I wanted to create a collection of single-column RDDs that contain calculated values, which I want to cache to avoid recalculation, i.e. rdd1 = {Names}, rdd2 = {Star Sign}, rdd3 = {Age}. Then I want to create a new virtual RDD that is a collection of these
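One hedged option is `RDD.zip`, which pairs elements positionally. It assumes both RDDs have the same number of partitions and the same number of elements in each partition, which holds when every column is produced by `map` over the same parent RDD. The `people` data and column names below are hypothetical. If the columns come from unrelated sources, keying each one with `zipWithIndex` and joining on the index is an alternative, though the join does not preserve row order.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CombineColumns {
  // Combines three column RDDs row by row. RDD.zip requires the same number of
  // partitions and the same number of elements per partition, which holds
  // when every column is a map over the same parent RDD.
  def combine(names: RDD[String], signs: RDD[String], ages: RDD[Int]): RDD[(String, String, Int)] =
    names.zip(signs).zip(ages).map { case ((n, s), a) => (n, s, a) }

  def main(args: Array[String]): Unit = {
    // "local[2]" is a placeholder master URL.
    val sc = new SparkContext(new SparkConf().setAppName("combine-columns").setMaster("local[2]"))
    try {
      // Hypothetical source rows: (name, star sign, age).
      val people = sc.parallelize(Seq(("Ann", "Leo", 34), ("Bo", "Aries", 27)), 2)
      // cache() keeps each column in memory after it is first computed.
      val names = people.map(_._1).cache()
      val signs = people.map(_._2).cache()
      val ages = people.map(_._3).cache()
      combine(names, signs, ages).collect().foreach(println)
    } finally sc.stop()
  }
}
```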

Re: Scala vs Python performance differences

2014-04-15 Thread Ian Ferreira
This would be super useful. Thanks. On 4/15/14, 1:30 AM, Jeremy Freeman freeman.jer...@gmail.com wrote: Hi Andrew, I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on ML algorithms, as I'm particularly curious about the relative performance of MLlib in Scala vs the Python

Multi-tenant?

2014-04-15 Thread Ian Ferreira
What is the support for multi-tenancy in Spark? I assume more than one driver can share the same cluster, but can a driver run two jobs in parallel?

RE: Multi-tenant?

2014-04-15 Thread Ian Ferreira
://spark.apache.org/docs/latest/job-scheduling.html, which includes scheduling concurrent jobs within the same driver. Matei On Apr 15, 2014, at 4:08 PM, Ian Ferreira ianferre...@hotmail.com wrote: What is the support for multi-tenancy in Spark. I assume more than one driver can share the same cluster
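The job-scheduling page linked in the reply describes how jobs inside one SparkContext are scheduled: FIFO by default, or round-robin across pools when `spark.scheduler.mode` is `FAIR`, with jobs from a thread assigned to a pool via `sc.setLocalProperty("spark.scheduler.pool", ...)`. A minimal sketch; the pool name `adhoc` and the local master are hypothetical placeholders, and a pool that is not defined in an allocation file is created with default settings.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FairPools {
  // FAIR mode lets concurrent jobs share executors instead of queueing FIFO.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("fair-pools") // placeholder
      .setMaster("local[4]")    // placeholder
      .set("spark.scheduler.mode", "FAIR")

  // Runs `body` with jobs submitted from this thread tagged to the given pool,
  // then clears the property (null removes it) so later jobs use the default pool.
  def inPool[T](sc: SparkContext, pool: String)(body: => T): T = {
    sc.setLocalProperty("spark.scheduler.pool", pool)
    try body
    finally sc.setLocalProperty("spark.scheduler.pool", null)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      // "adhoc" is a hypothetical pool name.
      val n = inPool(sc, "adhoc") { sc.parallelize(1 to 100).count() }
      println(n)
    } finally sc.stop()
  }
}
```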

Re: Spark resilience

2014-04-14 Thread Ian Ferreira
resources, but does not affect currently-running jobs. Workers can fail and will simply cause jobs to lose their current Executors. New Workers can be added at any point. On Mon, Apr 14, 2014 at 11:00 AM, Ian Ferreira ianferre...@hotmail.com wrote: Folks, I was wondering what the failure support

Pyspark with Cython

2014-04-14 Thread Ian Ferreira
Has anyone used Cython closures with Spark? We have a large investment in Python code that we don't want to port to Scala. Curious about any performance issues with the interop between the Scala engine and the Cython closures. I believe it is sockets on the driver and pipe on the executors?

Re: Error when run Spark on mesos

2014-04-02 Thread Ian Ferreira
I think this is related to a known issue (regression) in 0.9.0. Try using an explicit IP address rather than the loopback address. Sent from a mobile device On Apr 2, 2014, at 8:53 PM, panfei cnwe...@gmail.com wrote: any advice ? 2014-04-03 11:35 GMT+08:00 felix cnwe...@gmail.com: I deployed mesos and test