Rajesh Balamohan created HIVE-26997:
---------------------------------------

             Summary: Iceberg: Vectorization gets disabled at runtime in merge-into statements
                 Key: HIVE-26997
                 URL: https://issues.apache.org/jira/browse/HIVE-26997
             Project: Hive
          Issue Type: Improvement
          Components: Iceberg integration
            Reporter: Rajesh Balamohan
         Attachments: explain_merge_into.txt

*Query:*

In the following query, think of the "ssv" table as containing trickle-feed data. "store_sales_delete_1" is the destination table.

{noformat}
MERGE INTO tpcds_1000_iceberg_mor_v4.store_sales_delete_1 t
USING tpcds_1000_update.ssv s
ON (t.ss_item_sk = s.ss_item_sk
    AND t.ss_customer_sk=s.ss_customer_sk
    AND t.ss_sold_date_sk = "2451181"
    AND ((Floor((s.ss_item_sk) / 1000) * 1000) BETWEEN 1000 AND 2000)
    AND s.ss_ext_discount_amt < 0.0)
WHEN matched AND t.ss_ext_discount_amt IS NULL THEN
  UPDATE SET ss_ext_discount_amt = 0.0
WHEN NOT matched THEN
  INSERT (ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk,
          ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, ss_quantity,
          ss_wholesale_cost, ss_list_price, ss_sales_price, ss_ext_discount_amt,
          ss_ext_sales_price, ss_ext_wholesale_cost, ss_ext_list_price, ss_ext_tax,
          ss_coupon_amt, ss_net_paid, ss_net_paid_inc_tax, ss_net_profit,
          ss_sold_date_sk)
  VALUES (s.ss_sold_time_sk, s.ss_item_sk, s.ss_customer_sk, s.ss_cdemo_sk, s.ss_hdemo_sk,
          s.ss_addr_sk, s.ss_store_sk, s.ss_promo_sk, s.ss_ticket_number, s.ss_quantity,
          s.ss_wholesale_cost, s.ss_list_price, s.ss_sales_price, s.ss_ext_discount_amt,
          s.ss_ext_sales_price, s.ss_ext_wholesale_cost, s.ss_ext_list_price, s.ss_ext_tax,
          s.ss_coupon_amt, s.ss_net_paid, s.ss_net_paid_inc_tax, s.ss_net_profit,
          "2451181")
{noformat}

*Issue:*

1. The map phase is not vectorized because of the "PARTITION__SPEC__ID" virtual column.
{noformat}
Map notVectorizedReason: Select expression for SELECT operator: Virtual column PARTITION__SPEC__ID is not supported
{noformat}

2. The "Reducer 2" stage isn't vectorized.
{noformat}
Reduce notVectorizedReason: exception: java.lang.RuntimeException: Full Outer Small Table Key Mapping duplicate column 0 in ordered column map {0=(value column: 30, type info: int), 1=(value column: 31, type info: int)} when adding value column 53, type into int
stack trace:
org.apache.hadoop.hive.ql.exec.vector.VectorColumnOrderedMap.add(VectorColumnOrderedMap.java:102),
org.apache.hadoop.hive.ql.exec.vector.VectorColumnSourceMapping.add(VectorColumnSourceMapping.java:41),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.canSpecializeMapJoin(Vectorizer.java:3865),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5246),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:988),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:874),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:841),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2400(Vectorizer.java:251),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeReduceOperators(Vectorizer.java:2298),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeReduceOperators(Vectorizer.java:2246),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeReduceWork(Vectorizer.java:2224),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertReduceWork(Vectorizer.java:2206),
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:1038),
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111),
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180),
...
{noformat}

I have attached the explain plan for this query, which has more details.
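
For readers unfamiliar with the reducer-side error, the exception text reads as if an ordered map already holds entries for ordered columns 0 and 1 (value columns 30 and 31) and then receives a second entry for ordered column 0 (value column 53). The sketch below reproduces only that pattern. It is a simplified illustration, not Hive's code: the class name OrderedColumnMapSketch, its fields, and the use of IllegalStateException are all hypothetical, and it says nothing about why the planner produces the duplicate in the first place.

```java
import java.util.TreeMap;

/**
 * Hypothetical, simplified stand-in for an ordered column map that
 * refuses a second mapping for the same ordered column.
 * This is NOT Hive's VectorColumnOrderedMap; it only mirrors the
 * behaviour suggested by the exception message in this report.
 */
final class OrderedColumnMapSketch {
    private final String name;
    // ordered column index -> value column index, kept sorted by key
    private final TreeMap<Integer, Integer> orderedToValue = new TreeMap<>();

    OrderedColumnMapSketch(String name) {
        this.name = name;
    }

    /** Adds a mapping, failing if the ordered column is already present. */
    void add(int orderedColumn, int valueColumn) {
        if (orderedToValue.containsKey(orderedColumn)) {
            throw new IllegalStateException(
                name + " duplicate column " + orderedColumn
                    + " in ordered column map " + orderedToValue
                    + " when adding value column " + valueColumn);
        }
        orderedToValue.put(orderedColumn, valueColumn);
    }

    int size() {
        return orderedToValue.size();
    }

    public static void main(String[] args) {
        OrderedColumnMapSketch keys =
            new OrderedColumnMapSketch("Full Outer Small Table Key Mapping");
        keys.add(0, 30); // first join key
        keys.add(1, 31); // second join key
        try {
            keys.add(0, 53); // same ordered column again, as in the report
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the report, the exception is caught by the Vectorizer and surfaces as the notVectorizedReason for "Reducer 2", so the query still runs, only without vectorization in that stage.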

--
This message was sent by Atlassian Jira
(v8.20.10#820010)