[ https://issues.apache.org/jira/browse/SPARK-18642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733992#comment-15733992 ]
Dongjoon Hyun commented on SPARK-18642:
---------------------------------------

I see. If so, I'll record that, too.

> Spark SQL: Catalyst is scanning undesired columns
> -------------------------------------------------
>
>                 Key: SPARK-18642
>                 URL: https://issues.apache.org/jira/browse/SPARK-18642
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 1.6.3
>         Environment: Ubuntu 14.04
>                      Spark: Local Mode
>            Reporter: Mohit
>              Labels: performance
>             Fix For: 2.0.0
>
>
> When doing a left-join between two tables, say A and B, Catalyst has
> information about the projection required for table B. Only the required
> columns should be scanned.
> Code snippet below explains the scenario:
> scala> val dfA = sqlContext.read.parquet("/home/mohit/ruleA")
> dfA: org.apache.spark.sql.DataFrame = [aid: int, aVal: string]
> scala> val dfB = sqlContext.read.parquet("/home/mohit/ruleB")
> dfB: org.apache.spark.sql.DataFrame = [bid: int, bVal: string]
> scala> dfA.registerTempTable("A")
> scala> dfB.registerTempTable("B")
> scala> sqlContext.sql("select A.aid, B.bid from A left join B on A.aid=B.bid where B.bid<2").explain
> == Physical Plan ==
> Project [aid#15,bid#17]
> +- Filter (bid#17 < 2)
>    +- BroadcastHashOuterJoin [aid#15], [bid#17], LeftOuter, None
>       :- Scan ParquetRelation[aid#15,aVal#16] InputPaths: file:/home/mohit/ruleA
>       +- Scan ParquetRelation[bid#17,bVal#18] InputPaths: file:/home/mohit/ruleB
> This is a watered-down example from a production issue which has a huge
> performance impact.
> External reference:
> http://stackoverflow.com/questions/40783675/spark-sql-catalyst-is-scanning-undesired-columns

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)