[ 
https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang updated HIVE-17486:
------------------------------
    Attachment: explain.28.scan.share.false
                explain.28.scan.share.true

I set the flag {{hive.spark.optimize.shared.work}} to enable the 
SharedWorkOptimizer in Hive on Spark. The attached explain.28.scan.share.true 
is the explain output with the flag enabled, and explain.28.scan.share.false 
is the explain output with the flag disabled, both for 
[DS/query28.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query28.sql]

> Enable SharedWorkOptimizer in tez on HOS
> ----------------------------------------
>
>                 Key: HIVE-17486
>                 URL: https://issues.apache.org/jira/browse/HIVE-17486
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang
>            Assignee: liyunzhang
>         Attachments: explain.28.scan.share.false, explain.28.scan.share.true, 
> scanshare.after.svg, scanshare.before.svg
>
>
> HIVE-16602 implemented shared scans with Tez.
> Given a query plan, the goal is to identify scans on input tables that can be 
> merged so the data is read only once. The optimization is carried out at the 
> physical level. Hive on Spark caches the result of a Spark work if that work 
> is used by more than one child Spark work. After SharedWorkOptimizer is 
> enabled in the physical plan in HoS, identical table scans are merged into 
> one table scan. The result of that table scan is then used by more than one 
> child Spark work, so the cache mechanism avoids repeating the same computation.
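
The merging step in the description above can be pictured with a small toy model. This is only a sketch under stated assumptions, not Hive's SharedWorkOptimizer: the {{TableScan}} class, the table {{t}}, its columns, and the branch names are all hypothetical, and two scans are treated as identical simply when their table and column list match.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TableScan:
    """Toy stand-in for a table scan operator (hypothetical, not a Hive class)."""
    table: str
    columns: tuple


def merge_identical_scans(branches):
    """Map each distinct scan to the names of the branches that consume it."""
    consumers = defaultdict(list)
    for branch_name, scan in branches:
        consumers[scan].append(branch_name)
    return dict(consumers)


def count_table_reads(branches, share_scans):
    """Count table reads with and without merging identical scans."""
    if not share_scans:
        # Without sharing, every branch performs its own scan.
        return len(branches)
    # With sharing, each distinct scan is read once and reused by its consumers.
    return len(merge_identical_scans(branches))


# Hypothetical plan: three branches, two of which scan the same columns of t.
branches = [
    ("branch_1", TableScan("t", ("a", "b"))),
    ("branch_2", TableScan("t", ("a", "b"))),
    ("branch_3", TableScan("t", ("c",))),
]
```

In this model, {{count_table_reads(branches, share_scans=False)}} counts one read per branch, while {{share_scans=True}} counts one read per distinct scan, with the merged scan feeding both {{branch_1}} and {{branch_2}}.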



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
