[ https://issues.apache.org/jira/browse/SPARK-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025363#comment-14025363 ]
Kan Zhang commented on SPARK-2079:
----------------------------------

PR: https://github.com/apache/spark/pull/1023

> Skip unnecessary wrapping in List when serializing SchemaRDD to Python
> ----------------------------------------------------------------------
>
>                 Key: SPARK-2079
>                 URL: https://issues.apache.org/jira/browse/SPARK-2079
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 1.0.0
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>
> Finishing the TODO:
> {code}
> private[sql] def javaToPython: JavaRDD[Array[Byte]] = {
>   val fieldNames: Seq[String] = this.queryExecution.analyzed.output.map(_.name)
>   this.mapPartitions { iter =>
>     val pickle = new Pickler
>     iter.map { row =>
>       val map: JMap[String, Any] = new java.util.HashMap
>       // TODO: We place the map in an ArrayList so that the object is pickled to a List[Dict].
>       // Ideally we should be able to pickle an object directly into a Python collection so we
>       // don't have to create an ArrayList every time.
>       val arr: java.util.ArrayList[Any] = new java.util.ArrayList
>       row.zip(fieldNames).foreach { case (obj, name) =>
>         map.put(name, obj)
>       }
>       arr.add(map)
>       pickle.dumps(arr)
>     }
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
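To make the wrapping cost concrete, here is a minimal sketch using CPython's standard {{pickle}} module. (On the JVM side Spark uses the Pyrolite Pickler, so this only mirrors the shape of the serialized output, not Spark's actual serializer; the row values are made up for illustration.)

{code}
import pickle

# One row of a SchemaRDD, as a field-name -> value dict
row = {"name": "Alice", "age": 1}

# Current behavior: each row's dict is wrapped in a one-element list
# before pickling, so Python unpickles a List[Dict].
wrapped = pickle.dumps([row])

# Proposed behavior: pickle the dict directly, skipping the wrapper.
direct = pickle.dumps(row)

assert pickle.loads(wrapped) == [row]   # extra list layer to strip on the Python side
assert pickle.loads(direct) == row      # the dict itself
assert len(direct) < len(wrapped)       # the wrapper also costs a few bytes per row
{code}

Besides the smaller payload, skipping the wrapper avoids allocating a throwaway ArrayList per row on the JVM side.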