[GitHub] [hive] marton-bod commented on a change in pull request #2976: HIVE-25902 - adding support for vectorized Parquet reads

GitBox Thu, 27 Jan 2022 03:47:35 -0800


marton-bod commented on a change in pull request #2976:
URL: https://github.com/apache/hive/pull/2976#discussion_r793526510




##########
File path: iceberg/iceberg-handler/src/test/queries/positive/vectorized_iceberg_read_parquet.q
##########
@@ -0,0 +1,69 @@
+set hive.vectorized.execution.enabled=true;
+
+drop table if exists tbl_ice_parquet;
+create external table tbl_ice_parquet(a int, b string) stored by iceberg stored as parquet;
+insert into table tbl_ice_parquet values (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'five'), (111, 'one'), (22, 'two'), (11, 'one'), (44444, 'four'), (44, 'four');
+analyze table tbl_ice_parquet compute statistics for columns;
+
+explain select b, max(a) from tbl_ice_parquet group by b;
+select b, max(a) from tbl_ice_parquet group by b;
+
+create external table tbl_ice_parquet_all_types (
+    t_float FLOAT,
+    t_double DOUBLE,
+    t_boolean BOOLEAN,
+    t_int INT,
+    t_bigint BIGINT,
+    t_binary BINARY,
+    t_string STRING,
+    t_timestamp TIMESTAMP,
+    t_date DATE,
+    t_decimal DECIMAL(4,2)
+    ) stored by iceberg stored as parquet;
+
+insert into tbl_ice_parquet_all_types values (1.1, 1.2, false, 4, 567890123456789, '6', "col7", cast('2012-10-03 19:58:08' as timestamp), date('1234-09-09'), cast('10.01' as decimal(4,2)));
+
+explain select max(t_float), t_double, t_boolean, t_int, t_bigint, t_binary, t_string, t_timestamp, t_date, t_decimal from tbl_ice_parquet_all_types
+    group by t_double, t_boolean, t_int, t_bigint, t_binary, t_string, t_timestamp, t_date, t_decimal;
+select max(t_float), t_double, t_boolean, t_int, t_bigint, t_binary, t_string, t_timestamp, t_date, t_decimal from tbl_ice_parquet_all_types
+        group by t_double, t_boolean, t_int, t_bigint, t_binary, t_string, t_timestamp, t_date, t_decimal;
+
+create external table tbl_ice_parquet_parted (
+    a int,
+    b string
+    ) partitioned by (p1 string, p2 string)
+    stored by iceberg stored as parquet location 'file:/tmp/tbl_ice_parquet_parted';
+
+insert into tbl_ice_parquet_parted values
+                                      (1, 'aa', 'Europe', 'Hungary'),
+                                      (1, 'bb', 'Europe', 'Hungary'),
+                                      (2, 'aa', 'America', 'USA'),
+                                      (2, 'bb', 'America', 'Canada');
+-- query with projection of partition columns' subset
+select p1, a, min(b) from tbl_ice_parquet_parted group by p1, a;
+
+-- required for reordering between different types
+set hive.metastore.disallow.incompatible.col.type.changes=false;
+
+-- move partition columns
+alter table tbl_ice_parquet_parted change column p1 p1 string after a;
+
+-- should yield the same result as previously
+select p1, a, min(b) from tbl_ice_parquet_parted group by p1, a;
+
+-- move non-partition columns
+alter table tbl_ice_parquet_parted change column a a int after b;

Review comment:
       Is it worth testing column renames with vectorization as well?
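
       For instance, something along these lines could follow the reorder checks. This is only a rough, untested sketch: the new name `a_renamed` is made up for illustration, and it assumes the Iceberg handler accepts a rename through `change column` the same way it accepts the reorders in this file.

```sql
-- rename a non-partition column (a_renamed is a hypothetical name)
alter table tbl_ice_parquet_parted change column a a_renamed int;

-- should still aggregate the same data as the earlier queries, now under the new column name
select p1, a_renamed, min(b) from tbl_ice_parquet_parted group by p1, a_renamed;
```

       Renaming one of the partition columns (p1 or p2) might deserve a separate case, since those take a different path in the vectorized reader.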




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


