Using mapPartitions and passing the big index object as a parameter to it was
not the best option, given the size of the big object and my RAM. The
workers died before starting the actual computation.
Anyway, creating a singleton object worked for me:
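For anyone reaching this thread later, here is a minimal sketch of that singleton approach (all names hypothetical): a `lazy val` inside an `object` is initialized at most once per JVM, so each executor builds the index locally instead of receiving it inside the task closure.

```scala
// Hypothetical sketch of the singleton pattern: the lazy val is initialized
// on first access, at most once per JVM, so each Spark executor loads the
// index locally instead of having it shipped from the driver.
object BigIndex {
  lazy val index: Map[String, Int] = loadIndex()

  // Stand-in for reading the real index from a local file or store.
  private def loadIndex(): Map[String, Int] =
    Map("foo" -> 1, "bar" -> 2)
}

// Inside a task you just call BigIndex.index; the first task on each
// executor pays the load cost, later tasks on that executor reuse it.
```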
Hi,
I have a simple accumulator that needs to be passed to a foo() function
inside a map job:
val myCounter = sc.accumulator(0)
val myRDD = sc.textFile(inputpath) // :spark.RDD[String]
myRDD.flatMap(line => foo(line))
def foo(line: String) = {
myCounter += 1 // <- this is the line that throws
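One common cause of failures here (a sketch, not a definitive diagnosis): if foo is a method of an enclosing class or object, referencing myCounter from inside it forces Spark to serialize that whole enclosing instance with the task closure. Passing the accumulator as an explicit parameter avoids that:

```scala
import org.apache.spark.Accumulator

// Pass the accumulator in explicitly so the task closure captures only the
// accumulator itself, not the (possibly non-serializable) enclosing object.
def foo(line: String, counter: Accumulator[Int]): Seq[String] = {
  counter += 1
  Seq(line)
}

myRDD.flatMap(line => foo(line, myCounter))
```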
Hi all,
I tried to use accumulators without any success so far.
My code is simple:
val sc = new SparkContext(conf)
val accum = sc.accumulator(0)
val partialStats = sc.textFile(f.getAbsolutePath())
.map(line => { val key = line.split("\t").head; (key, line) })
There is definitely a bug in the Accumulators code.
More specifically, the following code works as expected:
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("EL LBP SPARK")
val sc = new SparkContext(conf)
val accum = sc.accumulator(0)
Sorry, I forgot to say that this gives the above error just when run on a
cluster, not in local mode.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Bug-in-Accumulators-tp17263p17277.html
Sent from the Apache Spark User List mailing list archive at
Hi Akhil,
Please see this related message.
http://apache-spark-user-list.1001560.n3.nabble.com/Bug-in-Accumulators-td17263.html
I am curious if this works for you also.
I am also using Spark 1.1.0, and I ran it on a cluster of nodes (it works if I
run it in local mode!).
If I put the accumulator inside the for loop, everything works fine. I
guess the bug is that an accumulator can be applied to just one RDD.
Still another undocumented 'feature' of Spark.
One month later, the same problem. I think that someone (e.g. the inventors of
Spark) should show us a complete example of how to use accumulators. I can
start by saying that we need to see an example of the following form:
val accum = sc.accumulator(0)
sc.parallelize(Array(1, 2, 3, 4)).map(x =>
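For what it's worth, here is a complete, runnable example along the lines of the one in the Spark 1.x programming guide (the local master is chosen only to make it self-contained; adjust for your cluster):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AccumulatorExample").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val accum = sc.accumulator(0)
    // Update the accumulator inside an action (foreach), not a transformation:
    // updates made in map() only happen when the RDD is actually computed, and
    // can be applied more than once if a stage is retried.
    sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)

    // Read the value only on the driver; tasks themselves cannot read it.
    println(accum.value) // 1 + 2 + 3 + 4 = 10
    sc.stop()
  }
}
```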
Hi Sowen,
You're right, that example works, but here is an example that does not work
for me:
object Main {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName(name)
val sc = new SparkContext(conf)
val accum = sc.accumulator(0)
for (i <- 1 to 10) {
val
Here is the first error I get at the executors:
15/01/26 17:27:04 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception
in thread Thread[handle-message-executor-16,5,main]
java.lang.StackOverflowError
at
In case someone has the same problem:
The singleton hack works for me only sometimes in Spark 1.2.0; that is,
sometimes I get a NullPointerException. Anyway, if you really
need to work with big indexes and you want to have the smallest amount of
communication between master and
The singleton hack works very differently in Spark 1.2.0 (it does not work if
the program has multiple map-reduce jobs). I guess there should be official
documentation on how to have each machine/node do an init step locally before
executing any other instructions (e.g.
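A per-executor init step can be sketched roughly like this (assuming Spark 1.x; all names hypothetical). The lazy singleton is initialized at most once per JVM, and mapPartitions lets you grab it once per partition rather than once per record:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical sketch: every partition that runs in the same executor JVM
// reuses the same lazily-initialized instance.
object ExpensiveResource {
  lazy val instance: ExpensiveResource = new ExpensiveResource
}
class ExpensiveResource {
  def process(record: String): String = record // stand-in for real work
}

def withInit(rdd: RDD[String]): RDD[String] =
  rdd.mapPartitions { iter =>
    // Init happens here, on the worker, before any record is processed.
    val resource = ExpensiveResource.instance
    iter.map(resource.process)
  }
```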
Hi,
Please help me with this problem. I would really appreciate your help!
I am using spark 1.2.0. I have a map-reduce job written in spark in the
following way:
val sumW = splittedTrainingDataRDD.map(localTrainingData => LocalSGD(w,
localTrainingData, numeratorCtEta, numitorCtEta, regularizer,
Hi,
I am running a program that executes map-reduce jobs in a loop. The first
time the loop runs, everything is ok. After that, it starts giving the
following error, first it gives it for one task, then for more tasks and
eventually the entire program fails:
15/01/26 01:41:25 WARN
I am trying to use memcached with Spark, but without success. Any sample code
that works with Spark would be highly appreciated. :)
I tried using reduceByKey, without success.
I also tried this: rdd.persist(StorageLevel.MEMORY_AND_DISK).flatMap(...).reduceByKey.
However, I got the same error as before, namely the error described here:
Solved: it is SPARK_PID_DIR from spark-env.sh. Changing this directory from /tmp
to something different actually changed the error that I got, now showing where
the actual error was coming from (a null pointer in my program). The first
error was not helpful at all though.
Hi all,
I'm using spark 1.3.1 and ran the following code:
sc.textFile(path)
.map(line => (getEntId(line), line))
.persist(StorageLevel.MEMORY_AND_DISK)
.groupByKey
.flatMap(x => func(x))
.reduceByKey((a, b) => (a + b).toShort)
I get the following error in
Hi,
Is there any way to force the output RDD of a flatMap op to be stored in
both memory and disk as it is computed ? My RAM would not be able to fit the
entire output of flatMap, so it really needs to start using disk after the
RAM gets full. I didn't find any way to force this.
Also, what
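As far as I can tell, StorageLevel.MEMORY_AND_DISK is meant for exactly this: as the RDD is materialized, partitions are kept in RAM while they fit and spilled to disk otherwise. A sketch (lines and func are hypothetical stand-ins for your data and flatMap function):

```scala
import org.apache.spark.storage.StorageLevel

// MEMORY_AND_DISK keeps partitions in memory while they fit and spills the
// rest to disk; for a disk-only variant use StorageLevel.DISK_ONLY instead.
val expanded = lines
  .flatMap(x => func(x))
  .persist(StorageLevel.MEMORY_AND_DISK)

expanded.count() // the first action materializes (and caches) the RDD
```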
Dear all,
Does anyone know how I can force Spark to use only the disk when doing a
simple flatMap(..).groupByKey.reduce(_ + _)? Thank you!
Hi,
In my app, I have a Params scala object that keeps all the specific
(hyper)parameters of my program. This object is read in each worker. I would
like to be able to pass specific values of the Params' fields in the command
line. One way would be to simply update all the fields of the Params
Hi,
I've seen in a few cases that when calling a reduce operation, it is
executed sequentially rather than in parallel.
For example, I have the following code that performs a simple word counting
on very big data using hashmaps (instead of (word,1) pairs that would
overflow the memory at
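The hashmap-based word count described above can be sketched like this (assuming lines: RDD[String]; names hypothetical). Note that plain reduce() merges the per-partition maps sequentially on the driver, which may be why the final step looks sequential; treeReduce pushes more of that merging onto the executors:

```scala
import scala.collection.mutable
import org.apache.spark.rdd.RDD

// One HashMap per partition instead of one (word, 1) pair per token, then
// merge the per-partition maps. For many partitions, consider
// .treeReduce(...) so the merge itself is partially done on executors.
def hashMapWordCount(lines: RDD[String]): mutable.HashMap[String, Long] =
  lines
    .mapPartitions { it =>
      val m = mutable.HashMap.empty[String, Long]
      for (line <- it; w <- line.split("\\s+") if w.nonEmpty)
        m(w) = m.getOrElse(w, 0L) + 1L
      Iterator(m)
    }
    .reduce { (a, b) =>
      for ((k, v) <- b) a(k) = a.getOrElse(k, 0L) + v
      a
    }
```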