[jira] [Created] (SPARK-12916) Support Row.fromSeq and Row.toSeq methods in pyspark

Shubhanshu Mishra (JIRA) Tue, 19 Jan 2016 20:34:05 -0800

Shubhanshu Mishra created SPARK-12916:
-----------------------------------------


             Summary: Support Row.fromSeq and Row.toSeq methods in pyspark
                 Key: SPARK-12916
                 URL: https://issues.apache.org/jira/browse/SPARK-12916
             Project: Spark
          Issue Type: Improvement
          Components: PySpark, SQL
            Reporter: Shubhanshu Mishra
            Priority: Minor


PySpark should also have access to Row functions such as fromSeq and toSeq, 
which are exposed in the Scala API. 
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Row

This would be useful when constructing custom columns from functions applied 
to DataFrame rows. A good example (in Scala) is in the following SO thread: 

http://stackoverflow.com/questions/32196207/derive-multiple-columns-from-a-single-column-in-a-spark-dataframe

{code:scala}
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

def foobarFunc(x: Long, y: Double, z: String): Seq[Any] = 
  Seq(x * y, z.head.toInt * y)

val schema = StructType(df.schema.fields ++
  Array(StructField("foo", DoubleType), StructField("bar", DoubleType)))

val rows = df.rdd.map(r => Row.fromSeq(
  r.toSeq ++
  foobarFunc(r.getAs[Long]("x"), r.getAs[Double]("y"), r.getAs[String]("z"))))

val df2 = sqlContext.createDataFrame(rows, schema)

df2.show
// +---+----+---+----+-----+
// |  x|   y|  z| foo|  bar|
// +---+----+---+----+-----+
// |  1| 3.0|  a| 3.0|291.0|
// |  2|-1.0|  b|-2.0|-98.0|
// |  3| 0.0|  c| 0.0|  0.0|
// +---+----+---+----+-----+
{code}
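
For comparison, here is a minimal sketch of what Python equivalents of toSeq and fromSeq might look like. It is only an illustration, not a proposed implementation: row_to_seq and row_from_seq are hypothetical helper names, and a namedtuple stands in for pyspark.sql.Row so that the snippet runs without a Spark installation. Since pyspark.sql.Row is itself a tuple subclass, a similar approach may carry over, but that is an assumption to verify against the actual class. The input values are taken from the x, y, z columns of the table above.

```python
from collections import namedtuple


def row_to_seq(row):
    """Hypothetical toSeq: return the row's values as a list, in field order."""
    return list(row)


def row_from_seq(fields, values):
    """Hypothetical fromSeq: build a named row from field names and values."""
    if len(fields) != len(values):
        raise ValueError("expected %d values, got %d" % (len(fields), len(values)))
    # A namedtuple stands in for pyspark.sql.Row in this sketch.
    return namedtuple("Row", fields)(*values)


def foobar_func(x, y, z):
    # Mirrors foobarFunc in the Scala snippet: z.head.toInt is the
    # code point of the first character of z.
    return [x * y, ord(z[0]) * y]


InputRow = namedtuple("Row", ["x", "y", "z"])
rows = [InputRow(1, 3.0, "a"), InputRow(2, -1.0, "b"), InputRow(3, 0.0, "c")]

# Append the derived "foo" and "bar" values to each input row.
out_fields = list(InputRow._fields) + ["foo", "bar"]
extended = [
    row_from_seq(out_fields, row_to_seq(r) + foobar_func(r.x, r.y, r.z))
    for r in rows
]

for r in extended:
    print(r)
```

With helpers like these, the row-extension pattern from the Scala example reduces to concatenating two sequences and rebuilding the row, which is the convenience this issue asks for.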

I am ready to work on this feature. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
