[ https://issues.apache.org/jira/browse/SPARK-12998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118486#comment-15118486 ]
Apache Spark commented on SPARK-12998:
--------------------------------------

User 'rajeshbalamohan' has created a pull request for this issue:
https://github.com/apache/spark/pull/10938

> Enable OrcRelation when connecting via spark thrift server
> ----------------------------------------------------------
>
>                 Key: SPARK-12998
>                 URL: https://issues.apache.org/jira/browse/SPARK-12998
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Rajesh Balamohan
>
> When a user connects via the Spark Thrift server to execute SQL, predicate
> push-down (PPD) is not enabled for ORC tables. The query ends up creating a
> MetastoreRelation, which does not support ORC PPD. The purpose of this JIRA is
> to convert MetastoreRelation to OrcRelation in HiveMetastoreCatalog, so that
> users can benefit from PPD even when connecting to the Spark Thrift server.
> {noformat}
> For example, for "explain select count(1) from tpch_flat_orc_1000.lineitem where
> l_shipdate = '1990-04-18'", the current plan is:
> +-----------------------------------------------------------------------------------------------------------------+--+
> |                                                      plan                                                       |
> +-----------------------------------------------------------------------------------------------------------------+--+
> | == Physical Plan ==                                                                                             |
> | TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#17L])                 |
> | +- Exchange SinglePartition, None                                                                               |
> |    +- WholeStageCodegen                                                                                         |
> |       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#20L]) |
> |       :     +- Project                                                                                          |
> |       :        +- Filter (l_shipdate#11 = 1990-04-18)                                                           |
> |       :           +- INPUT                                                                                      |
> |       +- HiveTableScan [l_shipdate#11], MetastoreRelation tpch_1000, lineitem, None                             |
> +-----------------------------------------------------------------------------------------------------------------+--+
> It would be good to change it to OrcRelation to do PPD with ORC, which
> reduces the runtime by a large margin.
>
> +--------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> |                                                                          plan                                                                          |
> +--------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> | == Physical Plan ==                                                                                                                                    |
> | TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#70L])                                                        |
> | +- Exchange SinglePartition, None                                                                                                                      |
> |    +- WholeStageCodegen                                                                                                                                |
> |       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#106L])                                       |
> |       :     +- Project                                                                                                                                 |
> |       :        +- Filter (_col10#64 = 1990-04-18)                                                                                                      |
> |       :           +- INPUT                                                                                                                             |
> |       +- Scan OrcRelation[_col10#64] InputPaths: hdfs://nn:8020/apps/hive/warehouse/tpch_1000.db/lineitem, PushedFilters: [EqualTo(_col10,1990-04-18)] |
> +--------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> {noformat}
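
For readers wondering why the PushedFilters entry in the second plan matters: ORC files store min/max statistics per column for each stripe, so a reader that receives a pushed-down predicate can skip stripes whose value range cannot match. The Python sketch below illustrates that idea only. It is not Spark or ORC code, and StripeStats, stripes_matching_equal_to, and the statistics values are hypothetical names and made-up data chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeStats:
    """Min/max statistics for one column in one stripe (hypothetical model)."""
    stripe_id: int
    min_value: str
    max_value: str


def stripes_matching_equal_to(stats, value):
    """Return ids of stripes whose [min, max] range could contain `value`.

    Stripes outside the range cannot satisfy `column = value`, so a reader
    may skip them without decoding any rows.
    """
    return [s.stripe_id for s in stats if s.min_value <= value <= s.max_value]


# Made-up statistics for an l_shipdate-like column. ISO dates compare
# correctly as plain strings.
shipdate_stats = [
    StripeStats(0, "1990-01-01", "1990-03-31"),
    StripeStats(1, "1990-04-01", "1990-06-30"),
    StripeStats(2, "1990-07-01", "1990-09-30"),
]

# Equivalent in spirit to the pushed filter EqualTo(_col10, 1990-04-18).
selected = stripes_matching_equal_to(shipdate_stats, "1990-04-18")
print(selected)
```

With these sample statistics only one of the three stripes has to be read. A scan that receives no pushed filter, as in the first plan, has to read every stripe and apply the Filter afterwards.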