[jira] [Work logged] (HIVE-25902) Vectorized reading of Parquet tables via Iceberg

ASF GitHub Bot (Jira) Thu, 27 Jan 2022 03:46:07 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-25902?focusedWorklogId=716363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716363
 ]


ASF GitHub Bot logged work on HIVE-25902:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jan/22 11:45
            Start Date: 27/Jan/22 11:45
    Worklog Time Spent: 10m 
      Work Description: marton-bod commented on a change in pull request #2976:
URL: https://github.com/apache/hive/pull/2976#discussion_r793525273



##########
File path: iceberg/iceberg-handler/src/test/queries/positive/vectorized_iceberg_read_mixed.q
##########
@@ -0,0 +1,86 @@
+set hive.vectorized.execution.enabled=true;
+
+drop table if exists tbl_ice_mixed;
+create external table tbl_ice_mixed(a int, b string) stored by iceberg stored as orc;
+insert into table tbl_ice_mixed values (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'five'), (111, 'one'), (22, 'two'), (11, 'one'), (44444, 'four'), (44, 'four');
+alter table tbl_ice_mixed set tblproperties ('write.format.default'='parquet');
+insert into table tbl_ice_mixed values (10, 'ten'), (20, 'twenty'), (30, 'thirty'), (40, 'fourty'), (50, 'fifty'), (1110, 'ten'), (220, 'twenty'),  (44445, 'four'),  (10, 'one');
+
+analyze table tbl_ice_mixed compute statistics for columns;
+
+explain select b, max(a) from tbl_ice_mixed group by b;
+select b, max(a) from tbl_ice_mixed group by b;
+
+create external table tbl_ice_mixed_all_types (
+    t_float FLOAT,
+    t_double DOUBLE,
+    t_boolean BOOLEAN,
+    t_int INT,
+    t_bigint BIGINT,
+    t_binary BINARY,
+    t_string STRING,
+    t_timestamp TIMESTAMP,
+    t_date DATE,
+    t_decimal DECIMAL(4,2)
+    ) stored by iceberg stored as orc;
+
+insert into tbl_ice_mixed_all_types values (1.1, 1.2, false, 4, 567890123456789, '6', "col7", cast('2012-10-03 19:58:08' as timestamp), date('1234-09-09'), cast('10.01' as decimal(4,2)));
+alter table tbl_ice_mixed_all_types set tblproperties ('write.format.default'='parquet');
+insert into tbl_ice_mixed_all_types values (5.1, 6.2, true, 40, 567890123456780, '8', "col07", cast('2012-10-03 19:58:09' as timestamp), date('1234-09-10'), cast('10.02' as decimal(4,2)));
+
+explain select max(t_float), t_double, t_boolean, t_int, t_bigint, t_binary, t_string, t_timestamp, t_date, t_decimal from tbl_ice_mixed_all_types
+    group by t_double, t_boolean, t_int, t_bigint, t_binary, t_string, t_timestamp, t_date, t_decimal;
+select max(t_float), t_double, t_boolean, t_int, t_bigint, t_binary, t_string, t_timestamp, t_date, t_decimal from tbl_ice_mixed_all_types
+        group by t_double, t_boolean, t_int, t_bigint, t_binary, t_string, t_timestamp, t_date, t_decimal;
+
+create external table tbl_ice_mixed_parted (
+    a int,
+    b string
+    ) partitioned by (p1 string, p2 string)
+    stored by iceberg stored as orc location 'file:/tmp/tbl_ice_mixed_parted';
+
+insert into tbl_ice_mixed_parted values
+                                      (1, 'aa', 'Europe', 'Hungary'),
+                                      (1, 'bb', 'Europe', 'Hungary'),
+                                      (2, 'aa', 'America', 'USA'),
+                                      (2, 'bb', 'America', 'Canada');
+
+alter table tbl_ice_mixed_parted set tblproperties ('write.format.default'='parquet');
+
+insert into tbl_ice_mixed_parted values
+                                     (1, 'a', 'Europe', 'Hungary'),
+                                     (10, 'bbb', 'Europe', 'Hungary'),
+                                     (20, 'aaa', 'America', 'USA'),
+                                     (20, 'bbb', 'America', 'Mexico');
+
+-- query with projection of partition columns' subset
+select p1, a, min(b) from tbl_ice_mixed_parted group by p1, a;
+
+-- required for reordering between different types
+set hive.metastore.disallow.incompatible.col.type.changes=false;

Review comment:
       Do we still need this? I thought the Iceberg SerDe had already been added to an
exception list on the HMS side.
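
One way to check the question above is to run a column reorder with the setting switched back on. The snippet below is only a sketch: the `alter table ... change column` statement is a hypothetical stand-in for whatever reordering the rest of the test file performs (those lines are not part of the quoted diff), and no particular outcome is assumed.

```sql
-- Hypothetical check, not part of the PR: re-enable the incompatible-type guard
set hive.metastore.disallow.incompatible.col.type.changes=true;

-- Assumed reordering statement: move int column `a` behind string column `b`
-- on the table created earlier in this test file
alter table tbl_ice_mixed change column a a int after b;

-- Confirm the resulting column order
describe tbl_ice_mixed;
```

If the `alter table` statement goes through with the guard enabled, that would suggest the `set ...=false` line in the test is no longer needed; if the metastore rejects it, the line should stay.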




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 716363)
    Remaining Estimate: 0h
            Time Spent: 10m

> Vectorized reading of Parquet tables via Iceberg
> ------------------------------------------------
>
>                 Key: HIVE-25902
>                 URL: https://issues.apache.org/jira/browse/HIVE-25902
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ádám Szita
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Related Iceberg PR: https://github.com/apache/iceberg/pull/3980



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
