RDD distribution

2016-02-10 Thread daze5112
Hi, I'm trying to improve the performance of some code I'm running, but I've noticed that the distribution of my RDD across executors isn't exactly even (see pic below). I'm using YARN and kicking it off with 11 executors. I'm not sure how to get a more even spread, or whether this is normal. Thanks

serialization error

2015-10-19 Thread daze5112
Hi, I'm having some problems with a piece of code I inherited. The error messages I get are: The code runs if I exclude the following line: Any help appreciated. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/serialization-error-tp25131.html

graphx running time

2015-04-06 Thread daze5112
Hi, I'm currently using GraphX for some analysis and have run into a bit of a hurdle. If I use my test dataset of 20 nodes and about 30 links, it runs really quickly. I have two other datasets I use, one of 10 million links and one of 20 million. When I create my graphs it seems to work okay and I can get

reading a csv dynamically

2015-01-21 Thread daze5112
Hi all, I'm currently reading a CSV file which has the following format: (String, Double, Double, Double, Double, Double) and can map this with no problems using: val dataRDD = sc.textFile("file.csv").map(_.split(",")).map(a => (Array(a(0)), Array(a(1).toDouble, a(2).toDouble), a(3),
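
The preview above is cut off, so what "dynamically" means here is a guess. As a minimal sketch, assuming the goal is to avoid hard-coding the number of Double columns, the mapping could treat the first field as a label and convert every remaining field. The helper name parseLine and the sample rows are hypothetical and not from the original post.

```scala
// Hypothetical helper (not from the original post): parses one CSV line whose
// first field is a String label and whose remaining fields are all numeric,
// however many of them there are.
def parseLine(line: String): (String, Array[Double]) = {
  val fields = line.split(",").map(_.trim)
  // head is the label; tail holds the numeric columns
  (fields.head, fields.tail.map(_.toDouble))
}

// Made-up sample rows with different column counts, for illustration only.
val sample = Seq("a,1.0,2.0,3.0,4.0,5.0", "b,6.5,7.5")
val parsed = sample.map(parseLine)
```

With Spark, the same function could presumably be passed to the existing pipeline, e.g. sc.textFile("file.csv").map(parseLine). Note that toDouble throws a NumberFormatException on a non-numeric field, so real data with headers or blanks would need extra handling.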

Re: counting degrees graphx

2014-05-26 Thread daze5112
Excellent, thanks Ankur, looks like what I'm looking for. Only one problem: the line val dists = initDists.pregel[DistanceMap](Map())(vprog, sendMsg, mergeMsg) produces an error: Job aborted: Task 268.0:5 had a not serializable result: java.io.NotSerializableException: