[ https://issues.apache.org/jira/browse/SPARK-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246222#comment-15246222 ]
Reynold Xin commented on SPARK-14463:
-------------------------------------

This is just a problem with the text method because it returns Dataset[String]. I think we can disable partitioning in this case. If they want to load a two-level folder, they can use a glob

{code}
read.text("/path/to/data/*/*")
{code}

If users want to use partitioning, they can still use

{code}
format("text").load("...")
{code}

which returns a DataFrame rather than a Dataset[String].

> read.text broken for partitioned tables
> ---------------------------------------
>
>                 Key: SPARK-14463
>                 URL: https://issues.apache.org/jira/browse/SPARK-14463
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Michael Armbrust
>            Priority: Critical
>
> Strongly typing the return values of {{read.text}} as {{Dataset\[String]}} breaks when trying to load a partitioned table (or any table where the path looks partitioned).
> {code}
> Seq((1, "test"))
>   .toDF("a", "b")
>   .write
>   .format("text")
>   .partitionBy("a")
>   .save("/home/michael/text-part-bug")
> sqlContext.read.text("/home/michael/text-part-bug")
> {code}
> {code}
> org.apache.spark.sql.AnalysisException: Try to map struct<value:string,a:int> to Tuple1, but failed as the number of fields does not line up.
>  - Input schema: struct<value:string,a:int>
>  - Target schema: struct<value:string>;
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.org$apache$spark$sql$catalyst$encoders$ExpressionEncoder$$fail$1(ExpressionEncoder.scala:265)
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.validate(ExpressionEncoder.scala:279)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:197)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:168)
>   at org.apache.spark.sql.Dataset$.apply(Dataset.scala:57)
>   at org.apache.spark.sql.Dataset.as(Dataset.scala:357)
>   at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:450)
> {code}
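
For illustration, the two workarounds from the comment (a glob path, or {{format("text").load}}) might look roughly like the sketch below. It is not taken from the ticket: it assumes a Spark 2.x or later {{SparkSession}} rather than the {{sqlContext}} used in the report, writes to a temporary directory instead of /home/michael/text-part-bug, and the object name {{TextPartitionSketch}} is made up. The {{select("value").as[String]}} step is one possible way to get back to a Dataset[String] and is likewise an assumption, not something proposed in the thread.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical object name; a sketch only, not code from the ticket.
object TextPartitionSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x+ entry point, local mode.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("text-partition-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumption: a scratch directory stands in for the path used in the report.
    val path = Files.createTempDirectory("text-part-bug").resolve("data").toString

    // Same write as in the report: produces a directory layout like <path>/a=1/part-...
    Seq((1, "test"))
      .toDF("a", "b")
      .write
      .format("text")
      .partitionBy("a")
      .save(path)

    // Workaround 1: keep partition discovery and stay with an untyped DataFrame.
    // The schema should then carry the partition column `a` next to `value`.
    val partitioned: DataFrame = spark.read.format("text").load(path)
    partitioned.printSchema()

    // One possible way to recover Dataset[String]: drop the partition column first,
    // so the single-field String encoder lines up with the input schema.
    val lines: Dataset[String] = partitioned.select("value").as[String]
    lines.show()

    // Workaround 2: glob down to the files, so the a=... directory is not
    // treated as a partition column.
    val globbed: DataFrame = spark.read.format("text").load(path + "/*/*")
    globbed.printSchema()

    spark.stop()
  }
}
```

The point of the sketch is only the schema difference: the encoder failure in the report comes from the extra partition column, so either avoiding partition discovery (the glob) or projecting down to {{value}} before converting should sidestep it.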