Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22573#discussion_r221476951

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---

```
@@ -437,53 +436,65 @@ object DataSourceStrategy {
    * @return a `Some[Filter]` if the input [[Expression]] is convertible, otherwise a `None`.
    */
   protected[sql] def translateFilter(predicate: Expression): Option[Filter] = {
+    // Recursively try to find an attribute name from the top level that can be pushed down.
+    def attrName(e: Expression): Option[String] = e match {
+      // In Spark and many data sources such as Parquet, dots are used as a column path delimiter;
+      // thus, we don't translate such expressions.
+      case a: Attribute if !a.name.contains(".") =>
+        Some(a.name)
```

--- End diff ---

Yes, @dbtsai. This PR introduces a regression, at least for ORC. The transcript below shows the ORC result in Spark 2.3.2 (803 ms); with this change, ORC will slow down at least 5x, just as Parquet already does (5573 ms).

> I know ORC doesn't work for now. We can have another followup PR to address this.

```scala
scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")

scala> df.write.mode("overwrite").orc("/tmp/orc")

scala> df.write.mode("overwrite").parquet("/tmp/parquet")

scala> spark.sql("set spark.sql.orc.impl=native")

scala> spark.sql("set spark.sql.orc.filterPushdown=true")

scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` = 50000").count)
Time taken: 803 ms

scala> spark.time(spark.read.parquet("/tmp/parquet").where("`col.with.dot` = 50000").count)
Time taken: 5573 ms

scala> spark.version
res6: String = 2.3.2
```
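For readers following along: a minimal sketch of the ambiguity the `attrName` dot-check guards against, using only the public `org.apache.spark.sql.sources` filter API (the column names and values here are illustrative, not from the PR). A source-level `Filter` carries the column as a plain string, so a filter on a top-level column literally named `col.with.dot` is indistinguishable from one on the nested field `dot` inside struct `with` inside column `col`:

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter}

// A filter on a top-level column whose name happens to contain dots...
val onDottedColumn: Filter = EqualTo("col.with.dot", 50000L)

// ...and a filter on the nested field `dot` of struct `with` of column `col`.
val onNestedField: Filter = EqualTo("col.with.dot", 50000L)

// Both produce the same Filter value, so a data source has no way to
// tell them apart once the predicate has been translated.
assert(onDottedColumn == onNestedField)
```

That ambiguity is why skipping translation for dotted names is the safe choice, but as the benchmark above shows, doing it unconditionally also drops ORC pushdown where it previously worked.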