Hi, I'm trying to improve the performance of some code I'm running, but I have
noticed that the distribution of my RDD across executors isn't exactly even
(see pic below). I'm using YARN and kicking it off with 11 executors. I'm not
sure how to get a more even spread, or whether this is normal. Thanks.
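
One way to see how uneven the spread really is would be to count the records in each partition and then compare against a repartitioned copy. The following is only a minimal sketch under stated assumptions: it runs in local mode rather than on YARN, the data is made up, and `PartitionSpread` and `partitionSizes` are hypothetical names, not anything from the original job.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionSpread {
  // Hypothetical helper: count the records held by each partition.
  // Only the (index, count) pairs are sent back to the driver.
  def partitionSizes[T](rdd: RDD[T]): Array[(Int, Int)] =
    rdd.mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size))).collect()

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 4 threads, purely for illustration.
    val conf = new SparkConf().setAppName("partition-spread").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      // Made-up data: filtering after parallelize leaves the partitions uneven.
      val skewed = sc.parallelize(1 to 1000, numSlices = 4).filter(_ < 300)
      println("before: " + partitionSizes(skewed).mkString(", "))

      // repartition shuffles the records into the requested number of partitions.
      val rebalanced = skewed.repartition(11)
      println("after:  " + partitionSizes(rebalanced).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Note that `repartition` triggers a full shuffle, so whether it pays off depends on how much work follows it. Even partition sizes also do not guarantee that the partitions end up evenly placed across executors.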

Hi, I'm having some problems with a piece of code I inherited:
The error messages I get are:
The code runs if I exclude the following line:
Any help appreciated.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/serialization-error-tp25131.html

Hi, I'm currently using GraphX for some analysis and have run into a bit of a
hurdle. If I use my test dataset of 20 nodes and about 30 links, it runs really
quickly. I have two other datasets that I use: one of 10 million links and one of
20 million. When I create my graphs it seems to work okay and I can get

Hi all, I'm currently reading a CSV file which has the following format:
(String, Double, Double, Double, Double, Double)
and I can map this with no problems using:
val dataRDD = sc.textFile("file.csv").
map(_.split(",")).
map(a => (Array(a(0)), Array(a(1).toDouble, a(2).toDouble), a(3),
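
For rows of the (String, Double, Double, Double, Double, Double) shape described above, the per-line parsing can be pulled out into a small function and tried without a cluster. This is a minimal sketch, not the original mapping: `CsvRow` and `parse` are hypothetical names, the sample line is made up, and it assumes plain comma-separated fields with no quoting.

```scala
object CsvRow {
  // Hypothetical helper: split one CSV line into its String key and numeric columns.
  // Assumes no quoted fields or embedded commas.
  def parse(line: String): (String, Array[Double]) = {
    val cols = line.split(",").map(_.trim)
    // First column is the key; every remaining column is parsed as a Double.
    (cols.head, cols.tail.map(_.toDouble))
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample row, for illustration only.
    val (key, values) = parse("nodeA, 1.0, 2.5, 3.0, 4.0, 5.5")
    println(s"$key -> ${values.mkString(", ")}")
  }
}
```

With an RDD of lines, the same function could be applied as `sc.textFile("file.csv").map(CsvRow.parse)`. A malformed number would throw a `NumberFormatException` inside the task.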

Excellent, thanks Ankur, this looks like what I'm looking for. Only one problem: the
line
val dists = initDists.pregel[DistanceMap](Map())(vprog, sendMsg, mergeMsg)
produces an error:
Job aborted: Task 268.0:5 had a not serializable result:
java.io.NotSerializableException:
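
When a task result is reported as not serializable, one way to narrow things down is to push candidate values through plain Java serialization locally. Spark's default serializer is Java serialization, so a value that fails here would be expected to fail in a task result too. This is only a sketch: `SerializationCheck` and `firstNonSerializable` are hypothetical names, the two sample values are made up, and it does not identify the class that failed in the job above.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical helper: returns None if `value` can be Java-serialized,
  // otherwise the exception message (the name of the offending class).
  def firstNonSerializable(value: AnyRef): Option[String] = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      None
    } catch {
      case e: NotSerializableException => Some(e.getMessage)
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // A small immutable Map of boxed primitives serializes without complaint.
    println(firstNonSerializable(Map(1L -> 0)))
    // A bare java.lang.Object does not implement Serializable.
    println(firstNonSerializable(new Object))
  }
}
```

Running the check on the values returned by `vprog`, `sendMsg` and `mergeMsg` for a small input might show which one produces the offending type.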