Enrico Minack created SPARK-30319:
-------------------------------------

             Summary: Adds a stricter version of as[T]
                 Key: SPARK-30319
                 URL: https://issues.apache.org/jira/browse/SPARK-30319
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Enrico Minack
             Fix For: 3.0.0

The behaviour of {{as[T]}} is not intuitive when you read code like {{df.as[T].write.csv("data.csv")}}. What gets written depends on the actual schema of {{df}}, whereas the result of {{def as[T](): Dataset[T]}} should be agnostic to the schema of {{df}}. {{as[T]}} does not do what such code suggests, and the expected behaviour is not provided elsewhere:

* Extra columns that are not part of the type {{T}} are not dropped.
* The order of columns is not aligned with the schema of {{T}}.
* Columns are not cast to the types of {{T}}'s fields. They have to be cast explicitly.

A method that enforces the schema of {{T}} on a given Dataset would be very convenient. It would let you state and guarantee the above assumptions about your data using the native Spark Dataset API. Such a method plays a more explicit and enforcing role than {{as[T]}} with respect to columns, column order and column types.

Possible names for a stricter version of {{as[T]}}:

* {{as[T](strict = true)}}
* {{toDS[T]}} (as in {{toDF}})
* {{selectAs[T]}} (as this is merely selecting the columns of schema {{T}})

The name {{toDS[T]}} is chosen here.
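
To make the three points above concrete, here is a minimal sketch of how such a method could behave when written as a user-side helper. It is not the implementation proposed for Spark. The case class {{Person}}, the object {{StrictAsSketch}} and the helper signature are assumptions made up for illustration. Only {{Encoder.schema}}, {{Column.cast}}, {{Dataset.select}} and {{Dataset.as}} are existing Spark API. The sketch handles top-level columns with plain names only.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoder, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical target type, used only for illustration.
case class Person(id: Long, name: String)

object StrictAsSketch {

  // Hypothetical helper, NOT part of the Spark API: projects df onto the
  // schema of T before calling as[T]. Selecting by the encoder's fields
  // drops extra columns, fixes the column order and casts each column
  // to the type of the corresponding field of T.
  def toDS[T](df: DataFrame)(implicit enc: Encoder[T]): Dataset[T] = {
    val projected = enc.schema.fields.map { f =>
      col(f.name).cast(f.dataType).as(f.name)
    }
    df.select(projected: _*).as[T]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("strict-as-sketch")
      .getOrCreate()
    import spark.implicits._

    // Columns are in a different order than Person, id is an Int rather
    // than a Long, and there is one extra column.
    val df = Seq(("Alice", 1, "extra")).toDF("name", "id", "comment")

    // Plain as[T]: schema of df, as described in this issue.
    df.as[Person].printSchema()

    // Strict variant: schema of Person.
    toDS[Person](df).printSchema()

    spark.stop()
  }
}
```

With this helper, {{StrictAsSketch.toDS[Person](df).write.csv("data.csv")}} would write only the columns of {{Person}}, in the order and with the types declared by {{Person}}, regardless of the schema of {{df}}.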