Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21944#discussion_r207248446

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1367,6 +1367,22 @@ class Dataset[T] private[sql](
         }: _*)
       }

    +  /**
    +   * Casts all the values of the current Dataset following the types of a specific StructType.
    +   * This method works also with nested structTypes.
    +   *
    +   * @group typedrel
    +   * @since 2.4.0
    +   */
    +  def castBySchema(schema: StructType): DataFrame = {
    +    assert(schema.fields.map(_.name).toList.sameElements(this.schema.fields.map(_.name).toList),
    +      "schema should have the same fields as the original schema")
    +
    +    selectExpr(schema.map(
    --- End diff --

    There are many good one-liner tricks, and I would just leave those on the mailing list or somewhere similar. We shouldn't add an API only because it _might be_ useful to some users. I would consider adding this if it were requested multiple times, if it were not a one-liner change, and if there were no easy workaround for it. Otherwise, every system will end up with an API to send an email.
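
For illustration only, the kind of one-liner workaround the comment alludes to might look like the sketch below. It is not code from the pull request or from the reviewer. The object name `CastBySchemaSketch`, the helper `castBySchema(df, target)`, and the sample data are hypothetical; the sketch assumes Spark SQL is on the classpath and that the target schema uses the same top-level field names as the DataFrame. Field names containing dots would need escaping, and casting nested structs relies on Spark's struct-to-struct cast, which is worth verifying against the Spark version in use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CastBySchemaSketch {
  // Hypothetical helper (not part of Spark): cast every top-level column
  // to the type that the target schema declares for the same field name.
  def castBySchema(df: DataFrame, target: StructType): DataFrame =
    df.select(target.fields.map(f => col(f.name).cast(f.dataType).as(f.name)): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cast-by-schema-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data (assumed): both columns start out as strings.
    val df = Seq(("1", "a"), ("2", "b")).toDF("id", "name")

    // Target schema (assumed): same field names, "id" becomes an integer.
    val target = StructType(Seq(
      StructField("id", IntegerType),
      StructField("name", StringType)))

    castBySchema(df, target).printSchema()
    spark.stop()
  }
}
```

Because the whole operation reduces to a single `select` over the target schema's fields, it can live in user code or a shared utility without requiring a new method on `Dataset`.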