Rajesh Balamohan created HIVE-26997:
---------------------------------------

             Summary: Iceberg: Vectorization gets disabled at runtime in 
merge-into statements
                 Key: HIVE-26997
                 URL: https://issues.apache.org/jira/browse/HIVE-26997
             Project: Hive
          Issue Type: Improvement
          Components: Iceberg integration
            Reporter: Rajesh Balamohan
         Attachments: explain_merge_into.txt

*Query:*

In the following query, think of the "ssv" table as containing trickle-feed data; "store_sales_delete_1" is the destination table.

 
{noformat}
MERGE INTO tpcds_1000_iceberg_mor_v4.store_sales_delete_1 t
USING tpcds_1000_update.ssv s
  ON (t.ss_item_sk = s.ss_item_sk
      AND t.ss_customer_sk = s.ss_customer_sk
      AND t.ss_sold_date_sk = "2451181"
      AND ((Floor((s.ss_item_sk) / 1000) * 1000) BETWEEN 1000 AND 2000)
      AND s.ss_ext_discount_amt < 0.0)
WHEN MATCHED AND t.ss_ext_discount_amt IS NULL THEN
  UPDATE SET ss_ext_discount_amt = 0.0
WHEN NOT MATCHED THEN
INSERT (ss_sold_time_sk,
        ss_item_sk,
        ss_customer_sk,
        ss_cdemo_sk,
        ss_hdemo_sk,
        ss_addr_sk,
        ss_store_sk,
        ss_promo_sk,
        ss_ticket_number,
        ss_quantity,
        ss_wholesale_cost,
        ss_list_price,
        ss_sales_price,
        ss_ext_discount_amt,
        ss_ext_sales_price,
        ss_ext_wholesale_cost,
        ss_ext_list_price,
        ss_ext_tax,
        ss_coupon_amt,
        ss_net_paid,
        ss_net_paid_inc_tax,
        ss_net_profit,
        ss_sold_date_sk)
VALUES (s.ss_sold_time_sk,
        s.ss_item_sk,
        s.ss_customer_sk,
        s.ss_cdemo_sk,
        s.ss_hdemo_sk,
        s.ss_addr_sk,
        s.ss_store_sk,
        s.ss_promo_sk,
        s.ss_ticket_number,
        s.ss_quantity,
        s.ss_wholesale_cost,
        s.ss_list_price,
        s.ss_sales_price,
        s.ss_ext_discount_amt,
        s.ss_ext_sales_price,
        s.ss_ext_wholesale_cost,
        s.ss_ext_list_price,
        s.ss_ext_tax,
        s.ss_coupon_amt,
        s.ss_net_paid,
        s.ss_net_paid_inc_tax,
        s.ss_net_profit,
        "2451181")

 {noformat}
 

 

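For context, here is a minimal sketch of the kind of table setup assumed above (the names, the reduced column set, and the table properties are illustrative and not copied from the actual environment): the destination is an Iceberg v2 table configured for merge-on-read row-level changes, and "ssv" is an ordinary staging table that receives the trickle feed.

{noformat}
-- Hypothetical repro sketch only; the real tables use the full TPC-DS store_sales schema.
CREATE EXTERNAL TABLE store_sales_delete_1 (
  ss_sold_time_sk     INT,
  ss_item_sk          INT,
  ss_customer_sk      INT,
  ss_ext_discount_amt DECIMAL(7,2),
  ss_sold_date_sk     INT)
STORED BY ICEBERG
STORED AS ORC
TBLPROPERTIES ('format-version'='2',
               'write.delete.mode'='merge-on-read',
               'write.update.mode'='merge-on-read',
               'write.merge.mode'='merge-on-read');

-- Plain staging table holding the trickle feed.
CREATE EXTERNAL TABLE ssv (
  ss_sold_time_sk     INT,
  ss_item_sk          INT,
  ss_customer_sk      INT,
  ss_ext_discount_amt DECIMAL(7,2),
  ss_sold_date_sk     INT)
STORED AS ORC;
{noformat}
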
*Issue:*
1. Map phase is not getting vectorized due to the PARTITION__SPEC__ID virtual column:

{noformat}
Map notVectorizedReason: Select expression for SELECT operator: Virtual column PARTITION__SPEC__ID is not supported
{noformat}
 

2. "Reducer 2" stage isn't vectorized. 
{noformat}
Reduce notVectorizedReason: exception: java.lang.RuntimeException: Full Outer Small Table Key Mapping duplicate column 0 in ordered column map {0=(value column: 30, type info: int), 1=(value column: 31, type info: int)} when adding value column 53, type into int
stack trace:
org.apache.hadoop.hive.ql.exec.vector.VectorColumnOrderedMap.add(VectorColumnOrderedMap.java:102),
org.apache.hadoop.hive.ql.exec.vector.VectorColumnSourceMapping.add(VectorColumnSourceMapping.java:41),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.canSpecializeMapJoin(Vectorizer.java:3865),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5246),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:988),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:874),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:841),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:251),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeReduceOperators(Vectorizer.java:2298),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeReduceOperators(Vectorizer.java:2246),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeReduceWork(Vectorizer.java:2224),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertReduceWork(Vectorizer.java:2206),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:1038),
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111),
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180),
...
{noformat}
 

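For reference, a sketch of how these notVectorizedReason entries can be surfaced without executing the statement; the trimmed-down MERGE below is only to show the syntax, and the attached plan corresponds to the full query above:

{noformat}
-- Make sure vectorization is requested on both the map and the reduce side.
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;

-- EXPLAIN VECTORIZATION DETAIL prints the per-vertex vectorization status,
-- including the "notVectorizedReason" lines quoted in points 1 and 2,
-- without actually running the MERGE.
EXPLAIN VECTORIZATION DETAIL
MERGE INTO tpcds_1000_iceberg_mor_v4.store_sales_delete_1 t
USING tpcds_1000_update.ssv s
  ON (t.ss_item_sk = s.ss_item_sk AND t.ss_customer_sk = s.ss_customer_sk)
WHEN MATCHED AND t.ss_ext_discount_amt IS NULL THEN
  UPDATE SET ss_ext_discount_amt = 0.0;
{noformat}

Using EXPLAIN VECTORIZATION ONLY DETAIL keeps the output limited to the vectorization sections.
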
I have attached the explain plan (explain_merge_into.txt), which contains the full details.


