[ https://issues.apache.org/jira/browse/SPARK-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li updated SPARK-15808:
----------------------------
    Description: 
Example 1: Parquet -> ORC
{noformat}
createDF(0, 9).write.format("parquet").saveAsTable("appendParquetToOrc")
createDF(10, 19).write.mode(SaveMode.Append).format("orc").saveAsTable("appendParquetToOrc")
{noformat}
Error:
{noformat}
Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.RuntimeException: file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzw0000gn/T/warehouse-bc8fedf2-aa6a-4002-a18b-524c6ac859d4/appendorctoparquet/part-r-00000-c0e3f365-1d46-4df5-a82c-b47d7af9feb9.snappy.orc is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [79, 82, 67, 23]
{noformat}
Example 2: JSON -> Parquet
{noformat}
createDF(0, 9).write.format("json").saveAsTable("appendJsonToCSV")
createDF(10, 19).write.mode(SaveMode.Append).format("parquet").saveAsTable("appendJsonToCSV")
{noformat}
No exception, but the results are wrong: the appended rows 10 to 19 are missing, and the output contains four all-null rows:
{noformat}
+----+----+
|  c1|  c2|
+----+----+
|null|null|
|null|null|
|null|null|
|null|null|
|   0|str0|
|   1|str1|
|   2|str2|
|   3|str3|
|   4|str4|
|   5|str5|
|   6|str6|
|   7|str7|
|   8|str8|
|   9|str9|
+----+----+
{noformat}
Example 3: JSON -> Text
{noformat}
createDF(0, 9).write.format("json").saveAsTable("appendJsonToText")
createDF(10, 19).write.mode(SaveMode.Append).format("text").saveAsTable("appendJsonToText")
{noformat}
Error:
{noformat}
Text data source supports only a single column, and you have 2 columns.
{noformat}

> Wrong Results or Strange Errors In Append-mode DataFrame Writing Due to Mismatched File Formats
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15808
>                 URL: https://issues.apache.org/jira/browse/SPARK-15808
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Xiao Li

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
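In the Example 1 error, the Parquet reader rejects the file because its last four bytes do not match the Parquet magic number. The sketch below is a hypothetical standalone Scala helper (`MagicBytes` and `render` are illustrative names, not part of Spark) that decodes the two byte arrays quoted in that error as ASCII. The expected tail decodes to `PAR1`, the Parquet magic; the found tail decodes to `ORC` followed by the byte 23. Assuming the standard ORC layout, where the PostScript ends with the `ORC` magic and the file's final byte stores the PostScript length, this is consistent with the reader having opened an ORC file.

```scala
// Hypothetical helper: decodes the byte values printed in the Example 1 error.
object MagicBytes {
  // Tail bytes the Parquet reader expected, copied from the error message.
  val expected: Array[Int] = Array(80, 65, 82, 49)
  // Tail bytes actually found in the file, copied from the error message.
  val found: Array[Int] = Array(79, 82, 67, 23)

  // Show printable ASCII bytes as characters and any other byte as <decimal value>.
  def render(bytes: Array[Int]): String =
    bytes.map(b => if (b >= 32 && b < 127) b.toChar.toString else s"<$b>").mkString

  def main(args: Array[String]): Unit = {
    println(s"expected: ${render(expected)}") // prints "expected: PAR1"
    println(s"found:    ${render(found)}")    // prints "found:    ORC<23>"
  }
}
```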