[ https://issues.apache.org/jira/browse/SPARK-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-15856:
-------------------------------
    Description:
In Spark 2.0, after unifying Datasets and DataFrames, we made two API breaking changes:

# {{DataFrameReader.text()}} now returns {{Dataset\[String\]}} instead of {{DataFrame}}
# {{SQLContext.range()}} now returns {{Dataset\[java.lang.Long\]}} instead of {{DataFrame}}

However, these two changes introduced several inconsistencies and problems:

# {{spark.read.text()}} silently discards partition columns when reading a partitioned table in text format, since {{Dataset\[String\]}} contains only a single field. Users have to use {{spark.read.format("text").load()}} to work around this, which is confusing and error-prone.
# All data source shortcut methods in {{DataFrameReader}} return {{DataFrame}} (aka {{Dataset\[Row\]}}) except for {{DataFrameReader.text()}}.
# When applying typed operations over Datasets returned by {{spark.range()}}, unexpected schema changes may happen. Please refer to SPARK-15632 for more details.

For these reasons, we decided to revert these two changes.

  was:
In Spark 2.0, after unifying Datasets and DataFrames, we made two API breaking changes:

# {{DataFrameReader.text()}} now returns {{Dataset\[String\]}} instead of {{DataFrame}}
# {{SQLContext.range()}} now returns {{Dataset\[java.lang.Long\]}} instead of {{DataFrame}}

However, these two changes introduced several inconsistencies and problems:

# {{spark.read.text()}} silently discards partition columns when reading a partitioned table in text format, since {{Dataset\[String\]}} contains only a single field. Users have to use {{spark.read.format("text").load()}} to work around this, which is confusing and error-prone.
# All data source shortcut methods in {{DataFrameReader}} returns a {{DataFrame}} (aka {{Dataset\[Row\]}} except for {{DataFrameReader.text()}}.
# When applying typed operations over Datasets returned by {{spark.range()}}, unexpected schema changes may happen. Please refer to SPARK-15632 for more details.

For these reasons, we decided to revert these two changes.

> Revert API breaking changes made in DataFrameReader.text and SQLContext.range
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-15856
>                 URL: https://issues.apache.org/jira/browse/SPARK-15856
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Cheng Lian
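
For illustration only, here is a minimal sketch of the workaround mentioned in the description: loading text data through the generic {{format("text").load()}} path, which returns a {{DataFrame}} and so has room for partition columns. It is not taken from the issue. The temporary directory, the {{part}} partition column, the sample rows, and the {{TextReaderSketch}} object name are all hypothetical, and the sketch assumes a Spark 2.x or later dependency on the classpath. It also calls {{toDF()}} on the result of {{spark.range()}}, which should yield an untyped {{DataFrame}} whichever return type a given Spark version uses.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch; names and paths are illustrative, not from SPARK-15856.
object TextReaderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("spark-15856-sketch")
      .getOrCreate()
    import spark.implicits._

    // Build a small partitioned text table: <dir>/part=a/..., <dir>/part=b/...
    // The directory must not exist yet, so resolve a fresh child path.
    val dir = Files.createTempDirectory("spark-15856").resolve("table").toString
    Seq(("hello", "a"), ("world", "b"))
      .toDF("value", "part")
      .write
      .partitionBy("part")
      .text(dir)

    // The generic loader returns a DataFrame, so the schema can carry the
    // partition column in addition to the single text field.
    val viaLoad: DataFrame = spark.read.format("text").load(dir)
    viaLoad.printSchema()

    // toDF() gives an untyped DataFrame regardless of what range() returns.
    val ids: DataFrame = spark.range(0, 3).toDF()
    ids.printSchema()

    spark.stop()
  }
}
```

Printing the schema of {{viaLoad}} is a quick way to check whether the {{part}} column survived the read on the Spark version in use.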