[
https://issues.apache.org/jira/browse/SPARK-18407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-18407:
------------------------------------
Assignee: Apache Spark
> Inferred partition columns cause assertion error
> ------------------------------------------------
>
> Key: SPARK-18407
> URL: https://issues.apache.org/jira/browse/SPARK-18407
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.0.2
> Reporter: Michael Armbrust
> Assignee: Apache Spark
> Priority: Critical
>
> [This
> assertion|https://github.com/apache/spark/blob/16eaad9daed0b633e6a714b5704509aa7107d6e5/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L408]
> fails when you run a stream against JSON data that is stored in partitioned
> folders, if you manually specify the schema and that schema omits the
> partition columns.
> My hunch is that we are inferring those columns even though the schema is
> being passed in manually, and then appending them to the end of the schema.
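> A rough reproduction sketch (the directory layout, column names, and schema below are hypothetical, not taken from the report; {{spark}} is assumed to be an existing SparkSession):
> {code}
> import org.apache.spark.sql.types.{StringType, StructType}
>
> // JSON files laid out in partitioned folders, e.g. /data/events/date=2016-11-10/part-00000.json
> // The user-supplied schema deliberately omits the partition column "date".
> val userSchema = new StructType().add("value", StringType)
>
> val events = spark.readStream
>   .schema(userSchema)      // schema specified manually
>   .json("/data/events")    // "date" is still inferred from the folder names
>
> // Starting any query over this stream hits the assertion in StreamExecution.
> val query = events.writeStream
>   .format("memory")
>   .queryName("events")
>   .start()
> {code}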
> While we are fixing this bug, it would also be nice to improve the assertion
> message: truncating it is not terribly useful since, at least in my case, it
> truncated the most interesting part. I changed it to this while debugging:
> {code}
> s"""
> |Batch does not have expected schema
> |Expected: ${output.mkString(",")}
> |Actual: ${newPlan.output.mkString(",")}
> |
> |== Original ==
> |$logicalPlan
> |
> |== Batch ==
> |$newPlan
> """.stripMargin
> {code}
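> For context, the message above is meant to replace the truncated one passed to the existing size check in StreamExecution.runBatch; roughly (paraphrasing, not the exact source, and {{detailedMessage}} standing in for the multi-line string above):
> {code}
> // Paraphrased sketch of the existing check with the longer message swapped in.
> assert(output.size == newPlan.output.size, detailedMessage)
> {code}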
> I also tried specifying the partition columns in the schema, and in that case
> they appear to be filled with corrupted data.
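> For reference, "specifying the partition columns in the schema" would look something like this variant of the sketch above (column names still hypothetical):
> {code}
> // Same layout as before, but the user schema also declares the partition column.
> val schemaWithPartition = new StructType()
>   .add("value", StringType)
>   .add("date", StringType)
>
> val eventsWithDate = spark.readStream
>   .schema(schemaWithPartition)
>   .json("/data/events")
> {code}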