Re: Benchmarking col vs row similarities

2015-04-10 Thread Debasish Das
I will increase memory for the job... that will also fix it, right? On Apr 10, 2015 12:43 PM, Reza Zadeh r...@databricks.com wrote: You should pull in this PR: https://github.com/apache/spark/pull/5364 It should resolve that. It is in master. Best, Reza On Fri, Apr 10, 2015 at 8:32 AM,

Re: Benchmarking col vs row similarities

2015-04-10 Thread Burak Yavuz
Depends... The heartbeat timeout you received happens due to GC pressure (probably due to a Full GC). If you increase the memory too much, the GCs may be less frequent, but the Full GCs may take longer. Try increasing the following confs: spark.executor.heartbeatInterval
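A minimal sketch of raising that setting through SparkConf (values are illustrative only; in this era of Spark the interval is given in milliseconds):

    import org.apache.spark.SparkConf

    // Illustrative value: a longer heartbeat interval makes long GC pauses
    // less likely to be mistaken for a dead executor.
    val conf = new SparkConf()
      .set("spark.executor.heartbeatInterval", "30000") // default is 10000 ms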

Re: Benchmarking col vs row similarities

2015-04-10 Thread Reza Zadeh
You should pull in this PR: https://github.com/apache/spark/pull/5364 It should resolve that. It is in master. Best, Reza On Fri, Apr 10, 2015 at 8:32 AM, Debasish Das debasish.da...@gmail.com wrote: Hi, I am benchmarking row vs col similarity flow on 60M x 10M matrices... Details are in

Getting OutOfMemory errors on Spark

2015-04-10 Thread Anshul Singhle
Hi, I'm reading data stored in S3 and aggregating and storing it in Cassandra using a Spark job. When I run the job with approximately 3 million records (about 3-4 GB of data) stored in text files, I get the following error: (11529/14925) 15/04/10 19:32:43 INFO TaskSetManager: Starting task 11609.0 in

The $ notation for DataFrame Column

2015-04-10 Thread Justin Yip
Hello, The DataFrame documentation always uses $columnX to annotate a column, but I cannot find much information about it. Maybe I have missed something. Can anyone point me to the doc about the $, if there is any? Thanks. Justin
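For reference, in the Scala API the $ syntax comes from an implicit conversion on the SQLContext; a small sketch:

    // $"col" builds a Column from a string literal; it becomes available
    // after this import (sqlContext is your SQLContext instance):
    import sqlContext.implicits._

    df.select($"name", $"age" + 1) // equivalent to df("name") and df("age") + 1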

Re: ClassCastException when calling updateStateByKey

2015-04-10 Thread Pradeep Rai
Hi Marcelo, I am not including Spark's classes. When I used the userClasspathFirst flag, I started getting those errors. Been there, done that. Removing Guava classes was one of the first things I tried. I saw your replies to a similar problem from Sept.

How to use the --files arg

2015-04-10 Thread Udit Mehta
Hi, Suppose I have a command and I pass the --files arg as below: bin/spark-submit --class com.test.HelloWorld --master yarn-cluster --num-executors 8 --driver-memory 512m --executor-memory 2048m --executor-cores 4 --queue public --files $HOME/myfile.txt --name test_1
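One common way to read such a file on the executors is SparkFiles; a sketch, assuming the myfile.txt name from the command above:

    import org.apache.spark.SparkFiles

    // Files shipped with --files are placed in each executor's working
    // directory; SparkFiles.get resolves the local path by file name.
    val path = SparkFiles.get("myfile.txt")
    val lines = scala.io.Source.fromFile(path).getLines().toList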

RE: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

2015-04-10 Thread Wang, Ningjun (LNG-NPV)
Does anybody have an answer for this? Thanks Ningjun From: Wang, Ningjun (LNG-NPV) Sent: Thursday, April 02, 2015 12:14 PM To: user@spark.apache.org Subject: Is the disk space in SPARK_LOCAL_DIRS cleaned up? I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are shuffled, Spark

Re: coalesce(*, false) problem

2015-04-10 Thread Tathagata Das
Coalesce tries to reduce the data to a smaller number of partitions without moving the data around (as much as possible). Since most of the received data is on a few machines (those running receivers), coalesce just makes bigger merged partitions on those. Without coalesce Machine
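A small sketch of the distinction (illustrative variable names):

    // With shuffle = false (the default), coalesce merges partitions in
    // place, so the data stays on the receiver machines and can remain
    // skewed; shuffle = true redistributes it across the cluster.
    val merged   = rdd.coalesce(4)
    val balanced = rdd.coalesce(4, shuffle = true) // same as repartition(4)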

Re: Streaming anomaly detection using ARIMA

2015-04-10 Thread Corey Nolet
Sean, I do agree about the inside-out parallelization, but my curiosity is mostly about what kind of performance I can expect by piping out to R. I'm playing with Twitter's new Anomaly Detection library, btw; this could be a solution if I can get the calls to R to stand up to the massive
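If piping is the route, RDD.pipe is the usual mechanism; a minimal sketch (the R script name is hypothetical):

    // Each element of a partition is written to the external process's
    // stdin, one per line; the process's stdout lines form the result RDD.
    val scored = rdd.map(_.toString).pipe("Rscript score_anomalies.R")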

DataFrame column name restriction

2015-04-10 Thread Justin Yip
Hello, Are there any restrictions on column names? I tried to use ., but sqlContext.sql cannot find the column. I would guess that . is tricky as it affects accessing a StructType, but are there any more restrictions on column names? scala> case class A(a: Int) defined class A scala>
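The workaround usually suggested is backtick quoting; a sketch, assuming your Spark version's SQL parser supports backticks (table and column names here are hypothetical):

    // Backticks make sqlContext.sql treat `a.b` as a single column name
    // rather than as struct field access a.b:
    sqlContext.sql("SELECT `a.b` FROM mytable")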

foreach going into an infinite loop

2015-04-10 Thread Jeetendra Gangele
Hi All, I am running the code below. Before calling foreach I did 3 transformations using mapToPair. In my application there are 16 executors, but no executor is running anything. rddWithscore.foreach(new VoidFunction<Tuple2<VendorRecord, Map<Integer, Double>>>() { @Override public void call(Tuple2<VendorRecord,