Ruslan Dautkhanov created SPARK-23074:
-----------------------------------------
             Summary: Dataframe-ified zipwithindex
                 Key: SPARK-23074
                 URL: https://issues.apache.org/jira/browse/SPARK-23074
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 2.3.0, 2.4.0
            Reporter: Ruslan Dautkhanov

It would be great to have a DataFrame-friendly equivalent of rdd.zipWithIndex():

{code:java}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{LongType, StructField, StructType}
import org.apache.spark.sql.Row

// Adds a sequential index column to a DataFrame via RDD.zipWithIndex.
//   offset  - value assigned to the first row (zipWithIndex itself starts at 0)
//   colName - name of the new index column
//   inFront - true to prepend the index column, false to append it
def dfZipWithIndex(
  df: DataFrame,
  offset: Int = 1,
  colName: String = "id",
  inFront: Boolean = true
) : DataFrame = {
  df.sqlContext.createDataFrame(
    // Rebuild each row with the shifted index placed before or after the original values
    df.rdd.zipWithIndex.map(ln =>
      Row.fromSeq(
        (if (inFront) Seq(ln._2 + offset) else Seq())
          ++ ln._1.toSeq ++
        (if (inFront) Seq() else Seq(ln._2 + offset))
      )
    ),
    // Extend the original schema with a non-nullable LongType column in the same position
    StructType(
      (if (inFront) Array(StructField(colName, LongType, false)) else Array[StructField]())
        ++ df.schema.fields ++
      (if (inFront) Array[StructField]() else Array(StructField(colName, LongType, false)))
    )
  )
}
{code}

credits: [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex]
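
To make the intended row layout concrete, here is a minimal sketch that models the same transformation on plain Scala collections, so it runs without Spark. The names `ZipWithIndexSketch` and `zipRowsWithIndex` and the `Row = Seq[Any]` alias are hypothetical and used only for illustration; they are not part of the proposal or of the Spark API. The sketch only mirrors how `offset` and `inFront` affect each row; it does not touch schemas or partitioning.

```scala
// Plain-Scala model of the proposed helper's row layout; no Spark required.
// All names here are hypothetical and exist only for this illustration.
object ZipWithIndexSketch {
  // Stand-in for a Spark Row: an ordered sequence of column values.
  type Row = Seq[Any]

  def zipRowsWithIndex(
      rows: Seq[Row],
      offset: Long = 1L,
      inFront: Boolean = true
  ): Seq[Row] =
    rows.zipWithIndex.map { case (row, idx) =>
      // zipWithIndex is 0-based; offset shifts the first id.
      val id: Long = idx + offset
      // Prepend or append the id, matching the inFront flag.
      if (inFront) id +: row else row :+ id
    }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(Seq("a", 10), Seq("b", 20), Seq("c", 30))
    // Defaults: ids start at 1 and are placed in front.
    zipRowsWithIndex(rows).foreach(println)
    // Zero-based ids appended as the last column.
    zipRowsWithIndex(rows, offset = 0L, inFront = false).foreach(println)
  }
}
```

With the defaults, each row gains a leading id starting at 1. Passing `offset = 0L, inFront = false` keeps the 0-based index of `zipWithIndex` and moves it to the last position.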