[ https://issues.apache.org/jira/browse/HAWQ-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887672#comment-15887672 ]
Lili Ma commented on HAWQ-1366:
-------------------------------

Hive stored the title column with dictionary encoding. Since HAWQ doesn't support dictionary encoding, the output is a little weird: the title values come back empty. In the short term, HAWQ should throw an error for this case. In the long term, HAWQ should support reading and writing Parquet 2.0 data.

> HAWQ should throw error if finding dictionary encoding type for Parquet
> -----------------------------------------------------------------------
>
>                 Key: HAWQ-1366
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1366
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Storage
>            Reporter: Lili Ma
>            Assignee: Ed Espino
>             Fix For: 2.2.0.0-incubating
>
>
> Since HAWQ is based on Parquet format version 1.0, which does not support
> dictionary pages, and hawq register may register Parquet format version 2.0
> data into HAWQ, we should throw an error when we find an unsupported page
> type for a column.
> Reproduce Steps:
> 1. In Hive, create a table and insert 5 records:
> {code}
> hive> create table tt (i int,
>     > fname varchar(100),
>     > title varchar(100),
>     > salary double
>     > )
>     > STORED AS PARQUET;
> OK
> Time taken: 0.029 seconds
> hive> insert into tt values (5, 'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW', 'Sales', 80282.54),
>     > (7, 'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE', 'Engineer', 10206.65),
>     > (4, 'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ', 'Director', 63691.23),
>     > (9, 'CTDCDYRURBZMBLNWHQNOQCYFFVULOP', 'Engineer', 63867.44),
>     > (10, 'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK', 'Sales', 97720.08);
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the
> future versions. Consider using a different execution engine (i.e. spark,
> tez) or using Hive 1.X releases.
> Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Job running in-process (local Hadoop)
> 2017-02-28 17:39:58,713 Stage-1 map = 100%, reduce = 0%
> Ended Job = job_local2046305831_0004
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to directory hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-10000
> Loading data to table default.tt
> MapReduce Jobs Launched:
> Stage-Stage-1: HDFS Read: 3945 HDFS Write: 4226 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 1.975 seconds
> hive> select * from tt;
> OK
> 5 OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW Sales 80282.54
> 7 UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE Engineer 10206.65
> 4 PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ Director 63691.23
> 9 CTDCDYRURBZMBLNWHQNOQCYFFVULOP Engineer 63867.44
> 10 WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK Sales 97720.08
> Time taken: 0.056 seconds, Fetched: 5 row(s)
> {code}
> 2. Create the table in HAWQ:
> {code}
> CREATE TABLE public.tt
> (i int,
> fname varchar(100),
> title varchar(100),
> salary float8)
> WITH (appendonly=true,orientation=parquet);
> {code}
> 3. Run hawq register:
> {code}
> malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f hdfs://localhost:8020/user/hive/warehouse/tt tt
> 20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try to connect database localhost:5432 postgres
> 20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New file(s) to be registered: ['hdfs://localhost:8020/user/hive/warehouse/tt/000000_0']
> hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/000000_0 hdfs://localhost:8020/hawq_default/16385/16387/49281/1"
> 20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq Register Succeed.
> {code}
> 4. Select from the table in HAWQ (the title column comes back empty):
> {code}
> postgres=# select * from tt;
>  i  |             fname              | title |  salary
> ----+--------------------------------+-------+----------
>   5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW |       | 80282.54
>   7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE |       | 10206.65
>   4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ |       | 63691.23
>   9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP |       | 63867.44
>  10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK |       | 97720.08
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
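The short-term fix requested in this issue, failing with an error on dictionary-encoded Parquet data instead of silently returning empty title values, can be sketched roughly as follows. This is not HAWQ's code: HAWQ's reader is C, where the error would presumably be raised with ereport(ERROR, ...), and Python is used here only to keep the decision logic short. The enum values follow the PageType and Encoding enums in the Apache Parquet format's parquet.thrift; the names check_column_chunk, check_page and UnsupportedParquetData are hypothetical, and the assumption that the reader can only decode version-1 data pages with PLAIN-encoded values is ours, not stated in the issue.

```python
from enum import IntEnum


class PageType(IntEnum):
    """Page types, numbered as in the PageType enum of parquet.thrift."""
    DATA_PAGE = 0
    INDEX_PAGE = 1
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3


class Encoding(IntEnum):
    """Subset of the Encoding enum from parquet.thrift."""
    PLAIN = 0
    PLAIN_DICTIONARY = 2
    RLE = 3
    BIT_PACKED = 4
    DELTA_BINARY_PACKED = 5
    RLE_DICTIONARY = 8


# Encodings whose values are indices into a dictionary page.
DICTIONARY_ENCODINGS = {Encoding.PLAIN_DICTIONARY, Encoding.RLE_DICTIONARY}


class UnsupportedParquetData(Exception):
    """Hypothetical error type standing in for ereport(ERROR, ...) in HAWQ's C backend."""


def check_column_chunk(column, encodings, dictionary_page_offset=None):
    """Footer-level check (e.g. at registration time). ColumnMetaData lists
    every encoding used in the chunk and, if a dictionary page exists, its offset."""
    if dictionary_page_offset is not None or DICTIONARY_ENCODINGS & set(encodings):
        raise UnsupportedParquetData(
            f'column "{column}": dictionary encoding is not supported')


def check_page(column, page_type, value_encoding):
    """Page-level check at scan time. Assumption: the reader only decodes
    version-1 data pages whose values are PLAIN-encoded."""
    if page_type == PageType.DICTIONARY_PAGE or value_encoding in DICTIONARY_ENCODINGS:
        raise UnsupportedParquetData(
            f'column "{column}": dictionary encoding is not supported')
    if page_type != PageType.DATA_PAGE:
        raise UnsupportedParquetData(
            f'column "{column}": unsupported page type {int(page_type)}')
    if value_encoding != Encoding.PLAIN:
        raise UnsupportedParquetData(
            f'column "{column}": unsupported value encoding {int(value_encoding)}')


# A plain-encoded data page (like fname in the reproduction) is accepted...
check_page("fname", PageType.DATA_PAGE, Encoding.PLAIN)
# ...while a dictionary page (like title) now fails loudly.
try:
    check_page("title", PageType.DICTIONARY_PAGE, Encoding.PLAIN_DICTIONARY)
except UnsupportedParquetData as err:
    print(err)  # column "title": dictionary encoding is not supported
```

Checking the footer metadata would let hawq register refuse such files up front, while the page-level check would also catch files that were registered before the fix. The issue does not say which of the two HAWQ adopted.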