Hi, I have a problem that is easy to solve in plain Scala code, but I cannot work out how to take the top N from an RDD and get the result back as an RDD.
There are 10000 StudentScore records. The task is to group them by age and keep the top 10 age groups by total score, then within each of those age groups to group by num and keep the top 10 num groups, so the result is 100 groups. The Scala code is below, but how can I do this with an RDD? *RDD.take returns an Array, not another RDD.*

Example Scala code:

import scala.util.Random

case class StudentScore(age: Int, num: Int, score: Int, name: Int)

val scores = for {
  i <- 1 to 10000
} yield {
  StudentScore(Random.nextInt(100), Random.nextInt(100), Random.nextInt(), Random.nextInt())
}

// Group by the given key, total the scores in each group,
// and keep the 10 groups with the highest totals.
def takeTop(scores: Seq[StudentScore], byKey: StudentScore => Int): Seq[(Int, Seq[StudentScore])] = {
  // Go through .values.toSeq so that two groups with the same total
  // do not overwrite each other as keys of a Map.
  val groupedScore = scores.groupBy(byKey).values.toSeq
    .map(group => (group.foldLeft(0)((acc, v) => acc + v.score), group))
  // sortBy is ascending by default, so reverse the ordering to get the top 10.
  groupedScore.sortBy(_._1)(Ordering[Int].reverse).take(10)
}

val topScores = for {
  (_, ageScores) <- takeTop(scores, _.age)
  (_, numScores) <- takeTop(ageScores, _.num)
} yield {
  numScores
}

topScores.size

--
~Yours respectfully, Xuefeng Wu/吴雪峰