Shubhanshu Mishra created SPARK-12916:
-----------------------------------------
             Summary: Support Row.fromSeq and Row.toSeq methods in pyspark
                 Key: SPARK-12916
                 URL: https://issues.apache.org/jira/browse/SPARK-12916
             Project: Spark
          Issue Type: Improvement
          Components: PySpark, SQL
            Reporter: Shubhanshu Mishra
            Priority: Minor


PySpark should also have access to Row functions such as fromSeq and toSeq, which are exposed in the Scala API:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Row

This would be useful when constructing custom columns from functions called on DataFrames. A good example (in Scala) is in the following SO thread:
http://stackoverflow.com/questions/32196207/derive-multiple-columns-from-a-single-column-in-a-spark-dataframe

{code:scala}
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

// Derive two new values from the existing columns x, y and z
def foobarFunc(x: Long, y: Double, z: String): Seq[Any] =
  Seq(x * y, z.head.toInt * y)

// Extend the original schema with the two new columns
val schema = StructType(df.schema.fields ++
  Array(StructField("foo", DoubleType), StructField("bar", DoubleType)))

// Append the derived values to each row via toSeq / fromSeq
val rows = df.rdd.map(r => Row.fromSeq(
  r.toSeq ++
  foobarFunc(r.getAs[Long]("x"), r.getAs[Double]("y"), r.getAs[String]("z"))))

val df2 = sqlContext.createDataFrame(rows, schema)

df2.show
// +---+----+---+----+-----+
// |  x|   y|  z| foo|  bar|
// +---+----+---+----+-----+
// |  1| 3.0|  a| 3.0|291.0|
// |  2|-1.0|  b|-2.0|-98.0|
// |  3| 0.0|  c| 0.0|  0.0|
// +---+----+---+----+-----+
{code}

I am ready to work on this feature.
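For illustration, here is a rough Python sketch of the row-level pattern the Scala example relies on: flatten a row to a sequence, append derived values, and rebuild a row. It does not use Spark. A plain named tuple stands in for pyspark.sql.Row, and the helper names row_to_seq, row_from_seq and add_foo_bar are hypothetical, not part of any existing PySpark API.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row: a plain named tuple.
InputRow = namedtuple("InputRow", ["x", "y", "z"])


def row_to_seq(row):
    """Rough analogue of Scala's Row.toSeq: the row's values, in field order."""
    return list(row)


def row_from_seq(fields, values):
    """Rough analogue of Scala's Row.fromSeq.

    Assumption: unlike the Scala method, this takes the field names as well,
    because there is no separate schema object in this sketch.
    """
    return namedtuple("OutputRow", fields)(*values)


def foobar_func(x, y, z):
    """Python port of foobarFunc from the Scala example."""
    return [x * y, ord(z[0]) * y]


def add_foo_bar(row):
    """Append the derived 'foo' and 'bar' values to an existing row."""
    values = row_to_seq(row) + foobar_func(row.x, row.y, row.z)
    return row_from_seq(list(row._fields) + ["foo", "bar"], values)


rows = [InputRow(1, 3.0, "a"), InputRow(2, -1.0, "b"), InputRow(3, 0.0, "c")]
result = [add_foo_bar(r) for r in rows]
```

With the three input rows from the Scala output table, result[0] holds the values (1, 3.0, 'a', 3.0, 291.0). In PySpark itself, Row is a tuple subclass, so tuple(row) and Row(*names)(*values) should already cover much of this; dedicated fromSeq and toSeq methods would mainly make the parity with the Scala API explicit.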