[ https://issues.apache.org/jira/browse/SPARK-18407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-18407:
------------------------------------

    Assignee: Apache Spark

> Inferred partition columns cause assertion error
> ------------------------------------------------
>
>                 Key: SPARK-18407
>                 URL: https://issues.apache.org/jira/browse/SPARK-18407
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.0.2
>            Reporter: Michael Armbrust
>            Assignee: Apache Spark
>            Priority: Critical
>
> [This assertion|https://github.com/apache/spark/blob/16eaad9daed0b633e6a714b5704509aa7107d6e5/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L408]
> fails when you run a stream against JSON data that is stored in partitioned
> folders, if you manually specify the schema and that schema omits the
> partition columns.
> My hunch is that we are inferring those columns even though the schema is
> passed in manually, and appending them to the end of the output.
> While we are fixing this bug, it would be nice to improve the assertion
> message. Truncating is not terribly useful since, at least in my case, it
> truncated the most interesting part. I changed it to this while debugging:
> {code}
> s"""
>    |Batch does not have expected schema
>    |Expected: ${output.mkString(",")}
>    |Actual: ${newPlan.output.mkString(",")}
>    |
>    |== Original ==
>    |$logicalPlan
>    |
>    |== Batch ==
>    |$newPlan
>  """.stripMargin
> {code}
> I also tried specifying the partition columns in the schema, and now it
> appears that they are filled with corrupted data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
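As a side note on the proposed message: Scala's `stripMargin` strips everything up to and including the leading `|` on each line of the interpolated string, which keeps the multi-line output aligned. A minimal, self-contained sketch (plain Scala, no Spark; the attribute and plan strings are hypothetical placeholders, not real Catalyst objects) of building such a mismatch message:

{code}
object AssertionMessageDemo {
  // Builds the richer mismatch message proposed above; inputs are
  // plain strings here rather than real Catalyst attributes/plans.
  def mismatchMessage(expected: Seq[String], actual: Seq[String],
                      logicalPlan: String, newPlan: String): String =
    s"""
       |Batch does not have expected schema
       |Expected: ${expected.mkString(",")}
       |Actual: ${actual.mkString(",")}
       |
       |== Original ==
       |$logicalPlan
       |
       |== Batch ==
       |$newPlan
       """.stripMargin

  def main(args: Array[String]): Unit =
    println(mismatchMessage(Seq("value#0"), Seq("value#0", "year#1"),
      "Relation[value#0] json", "Relation[value#0,year#1] json"))
}
{code}

Printing the full expected and actual attribute lists plus both plans makes the schema drift (here, an extra inferred partition column) immediately visible instead of being truncated away.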