[ https://issues.apache.org/jira/browse/SPARK-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326065#comment-15326065 ]
Reynold Xin commented on SPARK-15856:
-------------------------------------

Note that we have decided to only revert the SQLContext.range API in this ticket.

> Revert API breaking changes made in SQLContext.range
> ----------------------------------------------------
>
>                 Key: SPARK-15856
>                 URL: https://issues.apache.org/jira/browse/SPARK-15856
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Cheng Lian
>            Assignee: Wenchen Fan
>             Fix For: 2.0.0
>
>
> In Spark 2.0, after unifying Datasets and DataFrames, we made two API
> breaking changes:
> # {{DataFrameReader.text()}} now returns {{Dataset\[String\]}} instead of
> {{DataFrame}}
> # {{SQLContext.range()}} now returns {{Dataset\[java.lang.Long\]}} instead of
> {{DataFrame}}
> However, these two changes introduced several inconsistencies and problems:
> # {{spark.read.text()}} silently discards partition columns when reading a
> partitioned table in text format, since {{Dataset\[String\]}} only contains a
> single field. Users have to use {{spark.read.format("text").load()}} to
> work around this, which is confusing and error-prone.
> # All data source shortcut methods in {{DataFrameReader}} return {{DataFrame}}
> (a.k.a. {{Dataset\[Row\]}}) except for {{DataFrameReader.text()}}.
> # When applying typed operations to Datasets returned by {{spark.range()}},
> unexpected schema changes may happen. Please refer to SPARK-15632 for more
> details.
> Due to these reasons, we decided to revert these two changes.
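
A minimal Scala sketch of the inconsistencies described above, assuming a Spark 2.0 preview-style build in which {{DataFrameReader.text()}} and {{range()}} still return the typed Datasets; the {{/tmp/logs}} path and the {{date}} partition column are hypothetical and only serve to illustrate the points in the description.

{code:scala}
// Sketch only: assumes a pre-revert Spark 2.0 build; path and partition column are made up.
import org.apache.spark.sql.SparkSession

object Spark15856Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-15856 sketch")
      .getOrCreate()
    import spark.implicits._

    // 1. Reading a partitioned text table, e.g. /tmp/logs/date=2016-06-10/part-00000:
    //    the typed shortcut yields Dataset[String] with a single `value` field,
    //    so the `date` partition column is silently dropped ...
    spark.read.text("/tmp/logs").printSchema()
    //    ... while the generic loader keeps it, which is the confusing workaround
    //    mentioned in the description.
    spark.read.format("text").load("/tmp/logs").printSchema()

    // 2. Every other shortcut (csv, json, parquet, ...) returns DataFrame,
    //    i.e. Dataset[Row], so text() was the one inconsistent method.

    // 3. Before the revert, range() returns Dataset[java.lang.Long]; applying a
    //    typed operation such as map can change the schema in surprising ways
    //    (see SPARK-15632 for details).
    val ids = spark.range(10)
    ids.printSchema()
    ids.map(_ + 1).printSchema()

    spark.stop()
  }
}
{code}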