Hi, I am trying to rewrite my program to use DataFrames. I see that I can call mapPartitions and foreachPartition on a DataFrame, but can I also call partitionBy or otherwise set a partitioner? If not, is there some other way to make my data land in the right partition for the *Partition operations to use? (I see that partitionBy is only available on pair RDDs, which might have something to do with it.)
I am using the Spark master branch. The error:

[error] /home/th/spark-1.5.0/spark/IBM_ARL_teraSort_v4-01/src/main/scala/IBM_ARL_teraSort.scala:107: value partitionBy is not a member of org.apache.spark.sql.DataFrame

Thanks,
Tom Hubregtsen

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/PartitionBy-Partitioner-for-dataFrames-tp23420.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.