I am seeing this same issue with Spark 1.0.1 (tried with file:// for a local file):
scala> val lines = sc.textFile("file:///home/monir/.bashrc")
lines: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
scala> val linecount = lines.count
All of the worker nodes have the file in question in their local file systems.
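As a plain-Scala sketch (not Spark API) of why a file:// URL can trip people up: sc.textFile("file:///...") resolves the path on each worker's own filesystem, not just on the driver, so the file must exist at the same path on every node. The object and helper below (`LocalFileCount`, `countLines`) are hypothetical names that only mimic that per-node check:

```scala
import java.io.File
import scala.io.Source

object LocalFileCount {
  // Hypothetical helper mimicking the per-node path resolution that
  // sc.textFile("file:///...") performs on every worker.
  def countLines(path: String): Option[Long] = {
    val f = new File(path)
    if (!f.exists()) None          // a worker missing the file would fail here
    else {
      val src = Source.fromFile(f)
      try Some(src.getLines().size.toLong)
      finally src.close()
    }
  }
}
```

Running this check on each worker host (e.g. over ssh) is one way to confirm the path really is present everywhere before blaming Spark config.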
Still fairly new to Spark so bear with me if this is easily tunable by some
config params.
Bests,
-Monir
-----Original Message-----
From: Mozumder, Monir
Sent: Thursday, September 11, 2014 12:15 PM
To: user@spark.apache.org
I have this 2-node cluster setup, where each node has 4 cores:

MASTER
(Worker-on-master) (Worker-on-node1)
(slaves file: master, node1)
SPARK_WORKER_INSTANCES=1
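For reference, a minimal sketch of the standalone-mode config files that would produce the layout above. The hostnames `master` and `node1` come from this thread; the SPARK_WORKER_CORES line is an assumption added to match the 4-cores-per-node description:

```shell
# conf/slaves — one worker host per line; start-all.sh launches a Worker on each
master
node1

# conf/spark-env.sh — assumed settings for this 2-node, 4-core layout
SPARK_WORKER_INSTANCES=1   # one Worker JVM per host (as in this setup)
SPARK_WORKER_CORES=4       # cores each Worker offers to applications
```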
I am trying to understand Spark's parallelize behavior. The SparkPi example has this code:
val slices =
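To make the slicing concrete, here is a hedged plain-Scala sketch of the SparkPi-style Monte Carlo computation. In the real example, Spark would spread the sampled range across `slices` partitions (and hence across the cluster's 8 cores); this serial version only mirrors the arithmetic, and the names `PiSketch` / `estimatePi` are illustrative, not from the Spark source:

```scala
import scala.util.Random

// Illustrative sketch of SparkPi-style Monte Carlo estimation of Pi.
// In Spark, spark.parallelize(1 until n, slices).map(...).reduce(_ + _)
// would split the sample range into `slices` tasks; here we run serially.
object PiSketch {
  def estimatePi(n: Int, seed: Long = 42L): Double = {
    val rng = new Random(seed)
    val inside = (1 to n).count { _ =>
      val x = rng.nextDouble() * 2 - 1   // random point in the 2x2 square
      val y = rng.nextDouble() * 2 - 1
      x * x + y * y < 1                  // inside the unit circle?
    }
    4.0 * inside / n
  }

  def main(args: Array[String]): Unit = {
    val slices = if (args.nonEmpty) args(0).toInt else 2  // mirrors SparkPi's argument
    val n = 100000 * slices
    println(s"Pi is roughly ${estimatePi(n)}")
  }
}
```

The `slices` argument is what controls the number of partitions (and so the number of tasks Spark can run in parallel); with 2 nodes x 4 cores, values below 8 would leave cores idle.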
An on-list follow-up: http://prof.ict.ac.cn/BigDataBench/#Benchmarks looks promising, as it has Spark as one of the platforms used.
Bests,
-Monir
From: Mozumder, Monir
Sent: Monday, August 11, 2014 7:18 PM
To: user@spark.apache.org
Subject: Benchmark on physical Spark cluster
I am trying