[ https://issues.apache.org/jira/browse/SPARK-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273837#comment-15273837 ]
Cheng Lian commented on SPARK-15112:
------------------------------------

Actually, there's another issue that contributes to this bug. The problem lies in the {{EmbedSerializerInFilter}} optimization rule. In short, this rule optimizes plan fragments like

{noformat}
SerializeFromObject      <--.
  Filter                    | column order may differ
    DeserializeToObject     |
      <child-plan>       <--'
{noformat}

into

{noformat}
Filter
  <child-plan>
{noformat}

by embedding the deserializer expression into the {{Filter}} condition expression. Namely, when filtering an input row, always deserialize the input row into a Scala object, then use that object as the argument to invoke the user-provided Scala predicate function.

The problem here is that the output column order of {{SerializeFromObject}} may differ from the column order of the child plan (as explained in my comment above). Thus the simplified plan fragment may produce wrong results because the column order isn't adjusted accordingly. To fix this issue, we should add a {{Project}} on top of the resulting {{Filter}} plan when necessary to adjust the output column order.

> Dataset filter returns garbage
> ------------------------------
>
>                 Key: SPARK-15112
>                 URL: https://issues.apache.org/jira/browse/SPARK-15112
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Cheng Lian
>            Priority: Blocker
>         Attachments: demo 1 dataset - Databricks.htm
>
> See the following notebook:
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6122906529858466/2727501386611535/5382278320999420/latest.html
> I think it happens only when using JSON. I'm also going to attach it to the ticket.
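The column-order hazard described above can be sketched outside Spark with a toy model. Everything below is a hypothetical simplification for illustration only: none of these names (`deserialize`, `serialize`, `buggyOptimized`, etc.) are Spark internals, and the real rule operates on Catalyst expressions, not Scala collections.

```scala
// Toy model of the EmbedSerializerInFilter pitfall (illustrative only).
object EmbedSerializerDemo {
  type Row = Seq[Any]
  final case class Person(name: String, age: Int)

  // The child plan produces rows in (age, name) column order.
  val childRows: Seq[Row] = Seq(Seq(25, "alice"), Seq(17, "bob"))

  // Deserializer: child row -> Scala object.
  def deserialize(r: Row): Person =
    Person(r(1).asInstanceOf[String], r(0).asInstanceOf[Int])

  // Serializer emits columns in (name, age) order -- it differs from
  // the child plan's order, which is the crux of the bug.
  def serialize(p: Person): Row = Seq(p.name, p.age)

  // User-provided typed predicate.
  val predicate: Person => Boolean = _.age >= 18

  // Unoptimized plan: deserialize, filter on the object, re-serialize.
  // Output column order: (name, age).
  def unoptimized: Seq[Row] =
    childRows.map(deserialize).filter(predicate).map(serialize)

  // Buggy "optimized" plan: the deserializer is embedded in the filter
  // condition, but the surviving rows keep the child plan's (age, name)
  // order -- downstream consumers reading (name, age) see garbage.
  def buggyOptimized: Seq[Row] =
    childRows.filter(r => predicate(deserialize(r)))

  // Fix: a Project on top of the Filter restores the serializer's order.
  def fixedOptimized: Seq[Row] =
    buggyOptimized.map(r => Seq(r(1), r(0)))
}
```

Here `unoptimized` and `fixedOptimized` agree, while `buggyOptimized` returns the same surviving rows but with columns transposed, which is the kind of silent corruption the ticket reports.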