[jira] [Commented] (SPARK-18407) Inferred partition columns cause assertion error
    [ https://issues.apache.org/jira/browse/SPARK-18407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696337#comment-15696337 ]

Burak Yavuz commented on SPARK-18407:
-------------------------------------

This is also resolved as part of https://issues.apache.org/jira/browse/SPARK-18510

> Inferred partition columns cause assertion error
> ------------------------------------------------
>
>                 Key: SPARK-18407
>                 URL: https://issues.apache.org/jira/browse/SPARK-18407
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.0.2
>            Reporter: Michael Armbrust
>            Priority: Critical
>
> [This assertion|https://github.com/apache/spark/blob/16eaad9daed0b633e6a714b5704509aa7107d6e5/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L408]
> fails when you run a stream against json data that is stored in partitioned
> folders, if you manually specify the schema and that schema omits the
> partitioned columns.
> My hunch is that we are inferring those columns even though the schema is
> being passed in manually and adding them to the end.
> While we are fixing this bug, it would be nice to make the assertion better.
> Truncating is not terribly useful as, at least in my case, it truncated the
> most interesting part. I changed it to this while debugging:
> {code}
> s"""
>    |Batch does not have expected schema
>    |Expected: ${output.mkString(",")}
>    |Actual: ${newPlan.output.mkString(",")}
>    |
>    |== Original ==
>    |$logicalPlan
>    |
>    |== Batch ==
>    |$newPlan
>  """.stripMargin
> {code}
> I also tried specifying the partition columns in the schema and now it
> appears that they are filled with corrupted data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
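The debugging message quoted in the description can be tried outside of Spark. The following is only a sketch of that idea, not the code in StreamExecution: the object name `SchemaMismatchMessage`, the `build` method, and the use of plain strings in place of Spark's attributes and plans are all assumptions made for illustration.

```scala
// Hypothetical stand-alone helper; names and String-based parameters are
// illustrative assumptions, not Spark's API.
object SchemaMismatchMessage {

  /** Builds the full, untruncated mismatch message from the issue description. */
  def build(
      expected: Seq[String],   // stand-in for `output`
      actual: Seq[String],     // stand-in for `newPlan.output`
      originalPlan: String,    // stand-in for `logicalPlan`
      batchPlan: String        // stand-in for `newPlan`
  ): String =
    s"""
       |Batch does not have expected schema
       |Expected: ${expected.mkString(",")}
       |Actual: ${actual.mkString(",")}
       |
       |== Original ==
       |$originalPlan
       |
       |== Batch ==
       |$batchPlan
     """.stripMargin
}

// Usage with made-up attribute names: the extra trailing column is visible
// in the message instead of being cut off.
val msg = SchemaMismatchMessage.build(
  expected = Seq("value#0"),
  actual = Seq("value#0", "part#1"),
  originalPlan = "<original plan>",
  batchPlan = "<batch plan>"
)
println(msg)
```

One caveat of this form: `stripMargin` runs after interpolation, so any plan line that itself begins with whitespace followed by `|` would lose that prefix.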
[jira] [Commented] (SPARK-18407) Inferred partition columns cause assertion error
    [ https://issues.apache.org/jira/browse/SPARK-18407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15680352#comment-15680352 ]

Apache Spark commented on SPARK-18407:
--------------------------------------

User 'brkyvz' has created a pull request for this issue:
https://github.com/apache/spark/pull/15942
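For anyone wanting to exercise the pull request against the scenario in the issue description (JSON in partitioned folders, read as a stream with a user-supplied schema that omits the partition column), a rough reproduction sketch might look like the following. It is a reading of the description, not the reporter's code: the column names `value` and `part`, the temp directory, the `local[2]` master, and the memory sink named `repro` are all assumptions chosen for illustration. The exact failure mode on an affected version is not confirmed here, so no expected output is shown.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical reproduction of the setup described in SPARK-18407.
object PartitionedJsonStreamRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SPARK-18407-repro")
      .getOrCreate()
    import spark.implicits._

    try {
      // Write JSON into partitioned folders: <dir>/part=a/..., <dir>/part=b/...
      val dir = Files.createTempDirectory("spark-18407").resolve("data").toString
      Seq(("x", "a"), ("y", "b"))
        .toDF("value", "part")
        .write
        .partitionBy("part")
        .json(dir)

      // Manually specified schema that deliberately omits the partition column.
      val schema = new StructType().add("value", StringType)

      val query = spark.readStream
        .schema(schema)
        .json(dir)
        .writeStream
        .format("memory")
        .queryName("repro")
        .start()

      try {
        // Blocks until the available files are processed; a failure inside
        // the stream would be expected to surface here as an exception.
        query.processAllAvailable()
        spark.table("repro").show()
      } finally {
        query.stop()
      }
    } finally {
      spark.stop()
    }
  }
}
```

Running this needs Spark SQL on the classpath. Adding `part` to the schema would cover the second case from the description, where the partition columns were specified explicitly.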