I think
https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
might shed some light on the behaviour you’re seeing.
Mark
From: canan chen [mailto:ccn...@gmail.com]
Sent: June-17-15 5:57 AM
To: spark users
Subject: Intermedate stage will be cached automatically ?
Here's
Hi there,
I am looking to use Mockito to mock out some functionality while unit testing a
Spark application.
I currently have code that happily runs on a cluster, but fails when I try to
run unit tests against it, throwing a SparkException:
org.apache.spark.SparkException: Job aborted due to
I would like to work with RDD pairs of Tuple2<byte[], obj>, but byte[]s with
the same contents are considered as different values because their reference
values are different.
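
To illustrate the behaviour described above, here is a minimal plain-Java sketch that does not use Spark at all. It shows that two byte[]s with the same contents are unequal under equals(), and then sketches one possible workaround: a small wrapper class with content-based equals() and hashCode(). The names BytesKeyDemo and BytesKey are hypothetical and are not taken from this thread or from the Spark API.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class BytesKeyDemo {
    /** Hypothetical wrapper that gives a byte[] content-based equality. */
    static final class BytesKey implements Serializable {
        private static final long serialVersionUID = 1L;
        private final byte[] bytes;

        BytesKey(byte[] bytes) {
            // Defensive copy so later changes to the caller's array cannot alter the key.
            this.bytes = Arrays.copyOf(bytes, bytes.length);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof BytesKey
                    && Arrays.equals(bytes, ((BytesKey) other).bytes);
        }

        @Override
        public int hashCode() {
            // Must agree with equals(): equal contents give equal hash codes.
            return Arrays.hashCode(bytes);
        }
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};

        // Arrays inherit Object.equals, so this compares references, not contents.
        System.out.println("a.equals(b): " + a.equals(b));
        // Arrays.equals compares the contents element by element.
        System.out.println("Arrays.equals(a, b): " + Arrays.equals(a, b));

        // With the wrapper, both arrays map to the same key.
        Map<BytesKey, Integer> counts = new HashMap<>();
        counts.merge(new BytesKey(a), 1, Integer::sum);
        counts.merge(new BytesKey(b), 1, Integer::sum);
        System.out.println("distinct keys: " + counts.size());
    }
}
```

A key type like this could in principle stand in for the raw byte[] in a Tuple2, since grouping by key generally relies on equals() and hashCode(). That part is an assumption here; the sketch only exercises a java.util.HashMap.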
I didn't see any way to pass in a custom comparer. I could convert the byte[] into
a String with an explicit charset, but
Makes sense – I suspect what you suggested should work.
However, I think the overhead between this and using `String` would be similar
enough to warrant just using `String`.
Mark
From: Sonal Goyal [mailto:sonalgoy...@gmail.com]
Sent: June-11-15 12:58 PM
To: Mark Tse
Cc: user@spark.apache.org