cloud-fan commented on a change in pull request #26969: [SPARK-30319][SQL] Add a stricter version of `as[T]` URL: https://github.com/apache/spark/pull/26969#discussion_r367908939
########## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ##########

@@ -495,6 +495,25 @@ class Dataset[T] private[sql](
     select(newCols : _*)
   }
 
+  /**
+   * Returns a new Dataset where each record has been mapped onto the specified type.
+   * This only supports `U` being a class. Fields of the class will be mapped to columns of the
+   * same name (case sensitivity is determined by `spark.sql.caseSensitive`).
+   *
+   * If the schema of the Dataset does not match the desired `U` type, you can use `select`
+   * along with `alias` or `as` to rearrange or rename as required.
+   *
+   * This method eagerly projects away any columns that are not present in the specified class.
+   * It further guarantees that the order of columns, as well as their data types, matches `U`.
+   *
+   * @group basic
+   * @since 3.0.0
+   */
+  def toDS[U : Encoder]: Dataset[U] = {
+    val columns = implicitly[Encoder[U]].schema.fields.map(f => col(f.name).cast(f.dataType))

Review comment:
   I'm OK with keeping the scope small and only supporting common cases like case classes. But we must make sure the implementation is defensive and gives a clearly better error message for unsupported cases. Right now it just throws an attribute-not-found exception if the encoder is `Encoders.javaSerialization`.
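A minimal sketch of the kind of defensive check the reviewer is asking for. This is not code from the PR: the validation logic, the error message wording, and the final `select(...).as[U]` shape are all assumptions, written against the `Dataset` API as it exists in Spark 3.x (`columns`, `schema`, `select`, `as`, `org.apache.spark.sql.AnalysisException`).

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.functions.col

def toDS[U : Encoder]: Dataset[U] = {
  val encoderSchema = implicitly[Encoder[U]].schema

  // Encoders such as Encoders.javaSerialization do not map class fields to
  // named columns; their schema is a single opaque binary field. Checking the
  // requested field names against this Dataset's columns up front lets us fail
  // with an explicit message instead of the confusing "attribute not found"
  // error the current implementation produces.
  // NOTE: for simplicity this comparison is case-sensitive; a real
  // implementation would honor spark.sql.caseSensitive.
  val available = columns.toSet
  val missing = encoderSchema.fieldNames.filterNot(available.contains)
  if (missing.nonEmpty) {
    throw new AnalysisException(
      s"Cannot convert to the requested type: field(s) ${missing.mkString(", ")} " +
      s"required by the encoder were not found among columns " +
      s"${available.mkString(", ")}. Only encoders that map class fields to " +
      "named columns (e.g. case classes) are supported.")
  }

  // Safe to project now: every target field exists, so this both reorders
  // the columns and casts them to the data types the encoder expects.
  val projected = encoderSchema.fields.map(f => col(f.name).cast(f.dataType))
  select(projected: _*).as[U]
}
```

With this check, `spark.range(1).toDS[String]` (where the `Encoders.javaSerialization`-style schema asks for a column the Dataset does not have) would fail with a message naming the missing field and listing the available columns, rather than surfacing an internal resolution error.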