Stage is slow when I have a for loop inside (Java)

2015-05-25 Thread allanjie
Hi all, I have only one stage, which is mapToPair, and inside its function I have a for loop that runs about 133,433 times. It becomes slow; when I replace 133433 with just 133, it runs very fast. But I think this is a simple operation even in plain Java. You can look at the
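The snippet above is truncated, but the poster's core claim, that roughly 133,433 iterations of simple work are cheap in plain Java, is easy to check outside Spark. A minimal sketch (the loop body, a square-root accumulation, is an assumption for illustration, not the poster's actual code):

```java
public class LoopCheck {
    public static void main(String[] args) {
        int n = 133433;
        double sum = 0.0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sum += Math.sqrt(i);  // stand-in for per-iteration work
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sum=" + sum + ", elapsedMs=" + elapsedMs);
    }
}
```

If the same loop is fast standalone but slow inside mapToPair, the cost is usually something the loop touches per record (object allocation, a lookup into a large broadcast variable), multiplied across all input records, rather than the loop count itself.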

Spark dramatically slow when I add saveAsTextFile

2015-05-24 Thread allanjie
*Problem Description*: The program runs on a standalone Spark cluster (1 master, 6 workers, each with 8 GB RAM and 2 cores). Input: a 468 MB file with 133,433 records stored in HDFS. Output: a file of only about 2 MB, also stored in HDFS. The program has two map operations and one reduceByKey operation. Finally I
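The shape of the pipeline described above (map, map, reduceByKey) can be mirrored in plain Java to make the data flow concrete; the record format (a key and a numeric value per line) is an assumption for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java mirror of the Spark shape: map -> map -> reduceByKey.
public class PipelineShape {
    public static Map<String, Double> run(List<String> records) {
        return records.stream()
                .map(String::trim)            // first map: normalize the line
                .map(line -> line.split(","))  // second map: parse the fields
                .collect(Collectors.toMap(
                        parts -> parts[0],                    // key
                        parts -> Double.parseDouble(parts[1]),
                        Double::sum));                        // reduceByKey: sum per key
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a,1.0", "b,2.0", "a,3.0");
        System.out.println(run(input));
    }
}
```

In Spark itself, the slowdown reported in this thread is usually an illusion of attribution: transformations are lazy, and saveAsTextFile is the first action, so the whole pipeline's cost (reading 468 MB, both maps, the shuffle for reduceByKey) is charged to the "save" step.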

Re: java program got Stuck at broadcasting

2015-05-21 Thread allanjie
Sure, the code is very simple; I think you can understand it from the main function. public class Test1 { public static double[][] createBroadcastPoints(String localPointPath, int row, int col) throws IOException{ BufferedReader br = RAWF.reader(localPointPath);

Java program gets stuck at broadcasting

2015-05-20 Thread allanjie
Hi All, The variable I need to broadcast is only 468 MB. When broadcasting, it just “stops” here: 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.id is deprecated.

Java program got stuck at broadcasting

2015-05-20 Thread allanjie
The variable I need to broadcast is only 468 MB. When broadcasting, it just “stops” here: 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.id is deprecated.
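The "stop" reported in these two threads is often not a hang: the Configuration.deprecation lines are harmless Hadoop warnings, and the job may simply be spending minutes serializing and shipping the 468 MB broadcast to every worker. A conservative memory sketch for spark-defaults.conf (the sizes are starting-point assumptions, not measured settings):

```
# Driver must serialize the full 468 MB array on top of its own heap.
spark.driver.memory    4g
# Each executor holds a complete copy of the broadcast alongside task data.
spark.executor.memory  6g
```

On Spark 1.1 and later the default TorrentBroadcast spreads the transfer across workers, but both driver and executor heaps still need headroom well beyond the broadcast's own size.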

Re: save column values of DataFrame to text file

2015-05-20 Thread allanjie
Sorry, but how does that work? Can you give more detail about the problem? On 20 May 2015 at 21:32, oubrik [via Apache Spark User List] ml-node+s1001560n2295...@n3.nabble.com wrote: hi, try like this DataFrame df = sqlContext.load(com.databricks.spark.csv, options); df.select(year,
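The thread's goal, writing one column of a loaded CSV back out as plain text, can be mirrored in plain Java to show what the select step does; the column name and file layout here are assumptions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Selects one named column from CSV lines (header on the first line),
// mirroring df.select("year") followed by a text-file save.
public class SelectColumn {
    public static List<String> select(List<String> csvLines, String column) {
        List<String> header = Arrays.asList(csvLines.get(0).split(","));
        int idx = header.indexOf(column);
        List<String> out = new ArrayList<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            out.add(line.split(",")[idx]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("year,title", "1999,Matrix", "2010,Inception");
        System.out.println(select(lines, "year"));  // → [1999, 2010]
    }
}
```

In Spark 1.x the equivalent is roughly df.select("year") followed by converting to an RDD of strings and calling saveAsTextFile; the exact conversion call (toJavaRDD plus a Row-to-String map) should be checked against the Spark version in use.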