[jira] [Commented] (SPARK-12998) Enable OrcRelation when connecting via spark thrift server

Apache Spark (JIRA) Tue, 26 Jan 2016 18:25:05 -0800

    [ https://issues.apache.org/jira/browse/SPARK-12998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118486#comment-15118486 ]


Apache Spark commented on SPARK-12998:
--------------------------------------

User 'rajeshbalamohan' has created a pull request for this issue:
https://github.com/apache/spark/pull/10938

> Enable OrcRelation when connecting via spark thrift server
> ----------------------------------------------------------
>
>                 Key: SPARK-12998
>                 URL: https://issues.apache.org/jira/browse/SPARK-12998
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Rajesh Balamohan
>
> When a user connects via the Spark Thrift server to execute SQL, predicate
> pushdown (PPD) is not enabled for ORC tables: the query ends up creating a
> MetastoreRelation, which does not support ORC PPD. The purpose of this JIRA is
> to convert MetastoreRelation to OrcRelation in HiveMetastoreCatalog, so that
> users benefit from PPD even when connecting to the Spark Thrift server.
> {noformat}
> For example, for "explain select count(1) from tpch_flat_orc_1000.lineitem
> where l_shipdate = '1990-04-18'", the current plan is:
> == Physical Plan ==
> TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#17L])
> +- Exchange SinglePartition, None
>    +- WholeStageCodegen
>       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#20L])
>       :     +- Project
>       :        +- Filter (l_shipdate#11 = 1990-04-18)
>       :           +- INPUT
>       +- HiveTableScan [l_shipdate#11], MetastoreRelation tpch_1000, lineitem, None
> It would be good to change it to OrcRelation so that PPD is applied to ORC,
> which reduces the runtime by a large margin. With the change, the plan becomes:
>  
> == Physical Plan ==
> TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#70L])
> +- Exchange SinglePartition, None
>    +- WholeStageCodegen
>       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#106L])
>       :     +- Project
>       :        +- Filter (_col10#64 = 1990-04-18)
>       :           +- INPUT
>       +- Scan OrcRelation[_col10#64] InputPaths: hdfs://nn:8020/apps/hive/warehouse/tpch_1000.db/lineitem, PushedFilters: [EqualTo(_col10,1990-04-18)]
> {noformat}
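To make the proposed conversion concrete, here is a toy Scala model of it. Every name in it (MetastoreScan, OrcScan, Filter, EqualTo, ColumnStats, convert, mightMatch) is hypothetical: it only loosely mirrors Spark's MetastoreRelation, OrcRelation and PushedFilters and is not the code in the pull request. It assumes the only predicate is column = literal, as in the example query, and models ORC's per-stripe min/max statistics as string ranges (ISO dates compare correctly as strings).

```scala
// Toy model only: these are hypothetical classes, NOT Spark's real ones.
object OrcConversionSketch {

  // Simplified predicate: only "column = literal", as in the example query.
  final case class EqualTo(column: String, value: String)

  sealed trait Plan
  // Stand-in for a table scan resolved through the Hive metastore.
  final case class MetastoreScan(db: String, table: String, format: String, location: String) extends Plan
  // Stand-in for a native ORC scan that can hand filters to the ORC reader.
  final case class OrcScan(location: String, pushedFilters: Seq[EqualTo]) extends Plan
  final case class Filter(condition: EqualTo, child: Plan) extends Plan

  // Replace ORC-backed metastore scans with ORC scans; when a Filter sits
  // directly above one, also push its predicate into the ORC scan.
  // The Filter itself is kept, because pushdown only skips whole units of data.
  def convert(plan: Plan): Plan = plan match {
    case Filter(cond, MetastoreScan(_, _, "orc", loc)) =>
      Filter(cond, OrcScan(loc, Seq(cond)))
    case MetastoreScan(_, _, "orc", loc) =>
      OrcScan(loc, Nil)
    case Filter(cond, child) =>
      Filter(cond, convert(child))
    case other => other // non-ORC tables are left as metastore scans
  }

  // Hypothetical min/max statistics of one column within an ORC stripe.
  final case class ColumnStats(min: String, max: String)

  // The reader can skip a stripe when the literal lies outside [min, max].
  def mightMatch(stats: ColumnStats, p: EqualTo): Boolean =
    stats.min.compareTo(p.value) <= 0 && p.value.compareTo(stats.max) <= 0

  def main(args: Array[String]): Unit = {
    val pred = EqualTo("l_shipdate", "1990-04-18")
    val scan = MetastoreScan("tpch_1000", "lineitem", "orc",
      "hdfs://nn:8020/apps/hive/warehouse/tpch_1000.db/lineitem")
    println(convert(Filter(pred, scan)))
  }
}
```

The Filter node stays above the ORC scan, matching the second plan above: pushed filters let the reader skip whole stripes or row groups, so the rows that are read still have to be filtered.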



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

