Hi all,
I have only one stage, a mapToPair, and inside the function I have a
for loop that runs about 133433 times.
It becomes slow, but when I replace 133433 with just 133 it runs very
fast.
But I think this is a simple operation, even in plain Java.
You can look at the
*Problem Description*:
The program runs on a standalone Spark cluster (1 master, 6 workers with
8 GB RAM and 2 cores).
Input: a 468 MB file with 133433 records, stored in HDFS.
Output: just a 2 MB file, which will be stored in HDFS.
The program has two map operations and one reduceByKey operation.
Finally I
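To put the numbers in this thread in perspective, here is a rough sketch in plain Java. It assumes the for loop inside the mapToPair function runs in full for each of the 133433 input records, which the posts suggest but do not state outright. The class and method names (LoopCost, totalIterations) are made up for illustration.

```java
class LoopCost {
    // Total inner-loop iterations when a loop of `iterationsPerRecord`
    // steps runs once for each of `records` input records.
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    static long totalIterations(long records, long iterationsPerRecord) {
        return Math.multiplyExact(records, iterationsPerRecord);
    }

    public static void main(String[] args) {
        long records = 133433L; // record count given in the problem description

        // Loop bound 133433: about 1.78e10 iterations in total.
        System.out.println(totalIterations(records, 133433L)); // 17804365489

        // Loop bound 133: about 1.77e7 iterations in total.
        System.out.println(totalIterations(records, 133L));    // 17746589
    }
}
```

Under that assumption the full loop does roughly 1000 times more work than the 133-iteration version, so a large gap in run time would be expected even without any Spark overhead.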
Sure, the code is very simple. I think you can understand it from the main
function.
public class Test1 {
    public static double[][] createBroadcastPoints(String localPointPath,
            int row, int col) throws IOException {
        BufferedReader br = RAWF.reader(localPointPath);
Hi All,
The variable I need to broadcast is just 468 MB.
When broadcasting, it just “stops” here:
15/05/20 11:36:14 INFO Configuration.deprecation: mapred.tip.id is
deprecated. Instead, use mapreduce.task.id
15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.id is
deprecated.
Sorry, but how does that work?
Can you give more detail about the problem?
On 20 May 2015 at 21:32, oubrik [via Apache Spark User List]
ml-node+s1001560n2295...@n3.nabble.com wrote:
hi,
try it like this:
DataFrame df = sqlContext.load("com.databricks.spark.csv", options);
df.select("year",