Hello, I am currently using MLlib ALS to process a large volume of data: about 1.2 billion Rating(userId, productId, rate) triples. The dataset was separated into 4000 partitions for parallelized computation on our YARN clusters.
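For context, the training step looks roughly like this (a minimal sketch: the input path, line format, and the rank/iterations/lambda values are illustrative placeholders, not my exact settings):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // Illustrative setup -- the path, separator, and hyperparameter
    // values below are placeholders:
    val ratings = sc.textFile("hdfs://.../ratings")
      .map { line =>
        val f = line.split(',')
        Rating(f(0).toInt, f(1).toInt, f(2).toDouble)
      }
      .repartition(4000) // the 4000 partitions mentioned above

    // ALS.train(ratings, rank, iterations, lambda)
    val model = ALS.train(ratings, 10, 10, 0.01)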
I encountered the error "Error communicating with MapOutputTracker" when computing the prediction rates [model.predict(usersProducts)] after the iterations:

    val predictions = model.predict(usersProducts).map {
      case Rating(user, product, rate) => ((user, product), rate)
    }

I tried to separate the iteration process from the calculation of the prediction rates by first storing the two feature matrices in the file system and then loading them back for prediction. This time the error occurred at the stage of loading userFeatures. Each line of userfData has the form userId:[0.3,0.5,0.002,...] and is parsed like this:

    val userfTuple = userfData.map { line =>
      val arr = line.split(splitmark_1)          // separates the userId from the feature list
      val featureArr = arr(1).split(splitmark_2) // splits the individual feature values
      (arr(0), featureArr)
    }

Here is part of the log:

--------------------------------------------------------------------
14-05-21 14:37:17 WARN [Result resolver thread-0] TaskSetManager: Loss was due to org.apache.spark.SparkException
org.apache.spark.SparkException: Error communicating with MapOutputTracker
        at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:79)
        at org.apache.spark.MapOutputTracker.getServerStatuses(MapOutputTracker.scala:126)
        at org.apache.spark.BlockStoreShuffleFetcher.fetch(BlockStoreShuffleFetcher.scala:43)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:61)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:244)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:235)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:90)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:89)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:727)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:723)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:220)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:184)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
--------------------------------------------------------------------

I have tried several methods to work around this problem: one was to decrease the number of partitions (from 4000 to 3000), and another was to increase the memory of the masters. Both worked, but it is still vital to track down the underlying cause, right?

Could anyone help me check this problem? Thanks a lot.

Sue Cai
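P.S. For completeness, here is roughly how I persist the two factor matrices before the separate prediction job reads them back. A minimal sketch: the output paths are placeholders, and it assumes splitmark_1 is ":" and splitmark_2 is ",", to match the userId:[0.3,0.5,...] line format shown above.

    // Sketch of persisting the factor matrices; paths are placeholders
    // and the ":"/"," separators mirror the line format described above:
    model.userFeatures
      .map { case (id, feats) => id + ":[" + feats.mkString(",") + "]" }
      .saveAsTextFile("hdfs://.../userFeatures")

    model.productFeatures
      .map { case (id, feats) => id + ":[" + feats.mkString(",") + "]" }
      .saveAsTextFile("hdfs://.../productFeatures")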
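P.P.S. One unconfirmed guess at the underlying cause: with 4000 map-side and 4000 reduce-side partitions, the serialized map output statuses that the MapOutputTracker returns to executors may be exceeding the Akka frame size or ask timeout, which would be consistent with both workarounds helping. If that is the case, raising these settings might also work around it (the options exist in Spark, but the values here are guesses, not tuned recommendations):

    import org.apache.spark.SparkConf

    // Illustrative workaround settings -- values are guesses, not tuned:
    val conf = new SparkConf()
      .set("spark.akka.frameSize", "64")    // MB; default is 10
      .set("spark.akka.askTimeout", "120")  // seconds; MapOutputTracker ask timeout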