[ https://issues.apache.org/jira/browse/SPARK-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16633612#comment-16633612 ]
Apache Spark commented on SPARK-25579:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/22597

> Use quoted attribute names if needed in pushed ORC predicates
> -------------------------------------------------------------
>
>                 Key: SPARK-25579
>                 URL: https://issues.apache.org/jira/browse/SPARK-25579
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Critical
>
> This issue aims to fix an ORC performance regression in the Spark 2.4.0 RCs relative to Spark 2.3.2: for column names containing `.`, the pushed predicates are ignored.
>
> *Test Data*
> {code:java}
> scala> val df = spark.range(Int.MaxValue).sample(0.2).toDF("col.with.dot")
> scala> df.write.mode("overwrite").orc("/tmp/orc")
> {code}
>
> *Spark 2.3.2*
> {code:java}
> scala> spark.sql("set spark.sql.orc.impl=native")
> scala> spark.sql("set spark.sql.orc.filterPushdown=true")
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1486 ms
>
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 163 ms
> {code}
>
> *Spark 2.4.0 RC2*
> {code:java}
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 4087 ms
>
> scala> spark.time(spark.read.orc("/tmp/orc").where("`col.with.dot` < 10").show)
> +------------+
> |col.with.dot|
> +------------+
> |           1|
> |           8|
> +------------+
> Time taken: 1998 ms
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
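As a side note on the issue title: the idea of "quoting attribute names if needed" can be sketched as a small standalone helper. This is an illustration only, not the code from the linked pull request — the function name `quote_if_needed` and the identifier pattern are assumptions for the sketch; the point is that a column name like `col.with.dot` must be backtick-quoted before it is embedded in a pushed-down predicate, or the `.` is misread as a field separator and the predicate is dropped.

```python
import re

# Plain identifiers (letters, digits, underscore, not starting with a digit)
# need no quoting; anything else gets wrapped in backticks.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def quote_if_needed(name: str) -> str:
    """Backtick-quote a column name unless it is already a plain identifier.

    Hypothetical helper mirroring the fix's idea; embedded backticks are
    escaped by doubling, following Spark SQL's identifier convention.
    """
    if IDENTIFIER.match(name):
        return name
    return "`" + name.replace("`", "``") + "`"
```

With this, `quote_if_needed("id")` stays `id`, while `quote_if_needed("col.with.dot")` becomes `` `col.with.dot` ``, which a predicate builder can safely splice into an expression such as `` `col.with.dot` < 10 ``.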