[ https://issues.apache.org/jira/browse/SPARK-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345977#comment-15345977 ]
Dongjoon Hyun commented on SPARK-16164:
---------------------------------------

It's my pleasure. :)

> Filter pushdown should keep the ordering in the logical plan
> ------------------------------------------------------------
>
>                 Key: SPARK-16164
>                 URL: https://issues.apache.org/jira/browse/SPARK-16164
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>
> [~cmccubbin] reported a bug when he used StringIndexer in an ML pipeline with
> additional filters. It seems that during filter pushdown, we changed the
> ordering in the logical plan. I'm not sure whether we should treat this as a
> bug.
> {code}
> // Run in spark-shell; the implicits provide toDF and the 'idx column syntax.
> import org.apache.spark.ml.feature.StringIndexer
> import spark.implicits._
>
> val df1 = (0 until 3).map(_.toString).toDF
> val indexer = new StringIndexer()
>   .setInputCol("value")
>   .setOutputCol("idx")
>   .setHandleInvalid("skip") // drop rows whose label was not seen during fit
>   .fit(df1)
> val df2 = (0 until 5).map(_.toString).toDF // "3" and "4" are unseen labels
> val predictions = indexer.transform(df2)
> predictions.show() // this is okay
> predictions.where('idx > 2).show() // this will throw an exception
> {code}
> Please see the notebook at
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1233855/2159162931615821/588180/latest.html
> for error messages.
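To see why the ordering matters, here is a minimal Spark-free sketch. It assumes that `setHandleInvalid("skip")` behaves like a filter that drops unseen labels, placed before an indexing function that fails on unseen labels. All names (`PredicateOrderingSketch`, `labelToIndex`, `isKnown`, `index`) are hypothetical stand-ins, not Spark APIs, and the exception type is chosen for illustration only.

```scala
// Hypothetical illustration only: plain Scala collections stand in for Spark.
object PredicateOrderingSketch {
  // Stand-in for the fitted model: only the labels seen in df1 are known.
  val labelToIndex: Map[String, Double] = Map("0" -> 0.0, "1" -> 1.0, "2" -> 2.0)

  // Stand-in for the "skip" filter: keep only labels seen during fit.
  def isKnown(label: String): Boolean = labelToIndex.contains(label)

  // Stand-in for the indexing step: fails on a label it has never seen.
  def index(label: String): Double =
    labelToIndex.getOrElse(label, throw new NoSuchElementException(s"Unseen label: $label"))

  // The user's filter from the report: 'idx > 2
  def userPredicate(label: String): Boolean = index(label) > 2

  // Ordering as written in the logical plan: skip filter first, user filter second.
  // && short-circuits, so index() never sees "3" or "4".
  def keepOrder(rows: Seq[String]): Seq[String] =
    rows.filter(r => isKnown(r) && userPredicate(r))

  // Ordering swapped, as the report suspects happens during pushdown:
  // index() runs on every row, including the unseen labels.
  def swapped(rows: Seq[String]): Seq[String] =
    rows.filter(r => userPredicate(r) && isKnown(r))
}
```

With rows `(0 until 5).map(_.toString)`, `keepOrder` drops "3" and "4" before `index` runs and returns an empty result (no seen label has an index above 2). `swapped` calls `index("3")` first and throws, which mirrors the failure in the report.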