Ah I got it - Seq[(Int, Float)] is actually represented as Seq[Row] (seq of struct type) internally.
So a further extraction is required, e.g. row => row.getSeq[Row](1).map { r => r.getInt(0) } On Wed, 6 Apr 2016 at 13:35 Nick Pentreath <nick.pentre...@gmail.com> wrote: > Hi there, > > In writing some tests for a PR I'm working on, with a more complex array > type in a DF, I ran into this issue (running off latest master). > > Any thoughts? > > *// create DF with a column of Array[(Int, Double)]* > val df = sc.parallelize(Seq( > (0, Array((1, 6.0), (1, 4.0))), > (1, Array((1, 3.0), (2, 1.0))), > (2, Array((3, 3.0), (4, 6.0)))) > ).toDF("id", "predictions") > > *// extract the field from the Row, and use map to extract first element > of tuple* > *// the type of RDD appears correct* > scala> df.rdd.map { row => row.getSeq[(Int, Double)](1).map(_._1) } > res14: org.apache.spark.rdd.RDD[Seq[Int]] = MapPartitionsRDD[32] at map at > <console>:27 > > *// however, calling collect on the same expression throws > ClassCastException* > scala> df.rdd.map { row => row.getSeq[(Int, Double)](1).map(_._1) }.collect > 16/04/06 13:02:49 ERROR Executor: Exception in task 5.0 in stage 10.0 (TID > 74) > java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be > cast to scala.Tuple2 > at > $line54.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1$$anonfun$apply$1.apply(<console>:27) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > $line54.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:27) > at > $line54.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:27) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:370) > at scala.collection.Iterator$class.foreach(Iterator.scala:742) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) > at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308) > at scala.collection.AbstractIterator.to(Iterator.scala:1194) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1194) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:880) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:880) > > *// can collect the extracted field* > *// again, return type appears correct* > scala> df.rdd.map { row => row.getSeq[(Int, Double)](1) }.collect > res23: Array[Seq[(Int, Double)]] = Array(WrappedArray([1,6.0], [1,4.0]), > WrappedArray([1,3.0], [2,1.0]), WrappedArray([3,3.0], [4,6.0])) > > *// trying to apply map to extract first element of tuple fails* > scala> df.rdd.map { row => row.getSeq[(Int, Double)](1) > }.collect.map(_.map(_._1)) > java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be > cast to scala.Tuple2 > at $anonfun$2$$anonfun$apply$1.apply(<console>:27) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at $anonfun$2.apply(<console>:27) > at $anonfun$2.apply(<console>:27) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) >