Re: Benchmarking col vs row similarities

2015-04-10 Thread Debasish Das
I will increase memory for the job... that will also fix it, right? On Apr 10, 2015 12:43 PM, Reza Zadeh r...@databricks.com wrote: You should pull in this PR: https://github.com/apache/spark/pull/5364 It should resolve that. It is in master. Best, Reza On Fri, Apr 10, 2015 at 8:32 AM,

Re: Benchmarking col vs row similarities

2015-04-10 Thread Burak Yavuz
Depends... The heartbeat timeout you received happens due to GC pressure (probably due to a Full GC). If you increase the memory too much, the GCs may be less frequent, but the Full GCs may take longer. Try increasing the following confs: spark.executor.heartbeatInterval
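A minimal sketch of raising that setting through SparkConf (values are illustrative only; in this era of Spark the interval is given in milliseconds):

    import org.apache.spark.SparkConf

    // Illustrative value: a longer heartbeat interval makes long GC pauses
    // less likely to be mistaken for a dead executor.
    val conf = new SparkConf()
      .set("spark.executor.heartbeatInterval", "30000") // default is 10000 ms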

Re: Benchmarking col vs row similarities

2015-04-10 Thread Reza Zadeh
You should pull in this PR: https://github.com/apache/spark/pull/5364 It should resolve that. It is in master. Best, Reza On Fri, Apr 10, 2015 at 8:32 AM, Debasish Das debasish.da...@gmail.com wrote: Hi, I am benchmarking row vs col similarity flow on 60M x 10M matrices... Details are in

Getting OutOfMemory errors on Spark

2015-04-10 Thread Anshul Singhle
Hi, I'm reading data stored in S3 and aggregating and storing it in Cassandra using a Spark job. When I run the job with approximately 3 million records (about 3-4 GB of data) stored in text files, I get the following error: (11529/14925) 15/04/10 19:32:43 INFO TaskSetManager: Starting task 11609.0 in

The $ notation for DataFrame Column

2015-04-10 Thread Justin Yip
Hello, The DataFrame documentation always uses $columnX to annotate a column, but I cannot find much information about it. Maybe I have missed something. Can anyone point me to the doc about the $, if there is any? Thanks. Justin
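For reference, in the Scala API the $ syntax comes from an implicit conversion on the SQLContext; a small sketch:

    // $"col" builds a Column from a string literal; it becomes available
    // after this import (sqlContext is your SQLContext instance):
    import sqlContext.implicits._

    df.select($"name", $"age" + 1) // equivalent to df("name") and df("age") + 1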

Re: ClassCastException when calling updateStateByKey

2015-04-10 Thread Pradeep Rai
Hi Marcelo, I am not including Spark's classes. When I used the userClasspathFirst flag, I started getting those errors. Been there, done that. Removing Guava classes was one of the first things I tried. I saw your replies to a similar problem from Sept.

How to use the --files arg

2015-04-10 Thread Udit Mehta
Hi, Suppose I have a command and I pass the --files arg as below: bin/spark-submit --class com.test.HelloWorld --master yarn-cluster --num-executors 8 --driver-memory 512m --executor-memory 2048m --executor-cores 4 --queue public --files $HOME/myfile.txt --name test_1
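One common way to read such a file on the executors is SparkFiles; a sketch, assuming the myfile.txt name from the command above:

    import org.apache.spark.SparkFiles

    // Files shipped with --files are placed in each executor's working
    // directory; SparkFiles.get resolves the local path by file name.
    val path = SparkFiles.get("myfile.txt")
    val lines = scala.io.Source.fromFile(path).getLines().toList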

RE: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

2015-04-10 Thread Wang, Ningjun (LNG-NPV)
Does anybody have an answer for this? Thanks Ningjun From: Wang, Ningjun (LNG-NPV) Sent: Thursday, April 02, 2015 12:14 PM To: user@spark.apache.org Subject: Is the disk space in SPARK_LOCAL_DIRS cleaned up? I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are shuffled, Spark

Re: coalesce(*, false) problem

2015-04-10 Thread Tathagata Das
Coalesce tries to reduce the data to a smaller number of partitions without moving the data around (as much as possible). Since most of the received data is on a few machines (those running receivers), coalesce just makes bigger merged partitions on those. Without coalesce Machine
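A small sketch of the distinction (illustrative variable names):

    // With shuffle = false (the default), coalesce merges partitions in
    // place, so the data stays on the receiver machines and can remain
    // skewed; shuffle = true redistributes it across the cluster.
    val merged   = rdd.coalesce(4)
    val balanced = rdd.coalesce(4, shuffle = true) // same as repartition(4)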

Re: Streaming anomaly detection using ARIMA

2015-04-10 Thread Corey Nolet
Sean, I do agree about the inside-out parallelization, but my curiosity is mostly about what kind of performance I can expect by piping out to R. I'm playing with Twitter's new Anomaly Detection library, btw; this could be a solution if I can get the calls to R to stand up to the massive
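If piping is the route, RDD.pipe is the usual mechanism; a minimal sketch (the R script name is hypothetical):

    // Each element of a partition is written to the external process's
    // stdin, one per line; the process's stdout lines form the result RDD.
    val scored = rdd.map(_.toString).pipe("Rscript score_anomalies.R")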

DataFrame column name restriction

2015-04-10 Thread Justin Yip
Hello, Are there any restrictions on column names? I tried to use ., but sqlContext.sql cannot find the column. I would guess that . is tricky as it affects accessing a StructType, but are there any more restrictions on column names? scala> case class A(a: Int) defined class A scala>
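The workaround usually suggested is backtick quoting; a sketch, assuming your Spark version's SQL parser supports backticks (table and column names here are hypothetical):

    // Backticks make sqlContext.sql treat `a.b` as a single column name
    // rather than as struct field access a.b:
    sqlContext.sql("SELECT `a.b` FROM mytable")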

foreach going into an infinite loop

2015-04-10 Thread Jeetendra Gangele
Hi All, I am running the code below. Before calling foreach I did 3 transformations using mapToPair. In my application there are 16 executors, but no executor is running anything. rddWithscore.foreach(new VoidFunction<Tuple2<VendorRecord, Map<Integer, Double>>>() { @Override public void call(Tuple2<VendorRecord,