Narasimha created PARQUET-723:
---------------------------------

             Summary: parquet is not storing the type for the column.
                 Key: PARQUET-723
                 URL: https://issues.apache.org/jira/browse/PARQUET-723
             Project: Parquet
          Issue Type: Bug
          Components: parquet-format
            Reporter: Narasimha

1. Create a table stored as a text file

    CREATE EXTERNAL TABLE IF NOT EXISTS emp(
        id INT,
        first_name STRING,
        last_name STRING,
        dateofBirth STRING,
        join_date INT
    )
    COMMENT 'This is Employee Table Date Of Birth of type String'
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE
    LOCATION '/user/employee/beforePartition';

2. Load the data into the table

    load data inpath '/user/somupoc_timestamp/employeeData_partitioned.csv' into table emp;
    select * from emp;

3. Create a partitioned table stored as Parquet (dateofBirth STRING)

    create external table emp_afterpartition(
        id int,
        first_name STRING,
        last_name STRING,
        dateofBirth STRING)
    COMMENT 'Employee partitioned table with dateofBirth of type string'
    partitioned by (join_date int)
    STORED as parquet
    LOCATION '/user/employee/afterpartition';

4. Populate the partitioned table and fetch the data

    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert overwrite table emp_afterpartition partition (join_date) select * from emp;
    select * from emp_afterpartition;

5. Create a second partitioned table stored as Parquet (dateofBirth TIMESTAMP) at the same location

    CREATE EXTERNAL TABLE IF NOT EXISTS employee_afterpartition_timestamp_parq(
        id INT,
        first_name STRING,
        last_name STRING,
        dateofBirth TIMESTAMP)
    COMMENT 'employee partitioned table with dateofBirth of type TIMESTAMP'
    PARTITIONED BY (join_date INT)
    STORED AS PARQUET
    LOCATION '/user/employee/afterpartition';

    select * from employee_afterpartition_timestamp_parq;
    -- 0 records returned

    Impala :: alter table employee_afterpartition_timestamp_parq RECOVER PARTITIONS;
    Hive   :: MSCK REPAIR TABLE employee_afterpartition_timestamp_parq;
    -- MSCK works in Hive and RECOVER PARTITIONS works in Impala
    -- (MSCK is the metastore check command, run with the repair table option)

    select * from employee_afterpartition_timestamp_parq;

Actual Result ::
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

Expected Result ::
Data should display.

Note: if the file format is text file instead of Parquet, I am able to fetch the data.

Observation: two tables with different column types point to the same HDFS location.

Sample Data
===========
1,Joyce,Garza,2016-07-17 14:42:18,201607
2,Jerry,Ortiz,2016-08-17 21:36:54,201608
3,Steven,Ryan,2016-09-10 01:32:40,201609
4,Lisa,Black,2015-10-12 15:05:13,201610
5,Jose,Turner,2015-011-10 06:38:40,201611
6,Joyce,Garza,2016-08-02,201608
7,Jerry,Ortiz,2016-01-01,201601
8,Steven,Ryan,2016/08/20,201608
9,Lisa,Black,2016/09/12,201609
10,Jose,Turner,09/19/2016,201609
11,Jose,Turner,20160915,201609
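
The failing step reads Parquet files that were written with dateofBirth as a string through a table that declares the column as TIMESTAMP. A possible way to sidestep that, sketched here under assumptions and not verified against this cluster, is to give the TIMESTAMP table its own location and convert the strings explicitly when populating it. The table name emp_ts_parq and the path /user/employee/afterpartition_ts are hypothetical and not part of the original report; values that Hive cannot parse as timestamps (several of the sample rows) would be expected to come back as NULL.

```sql
-- Hypothetical table name and location: both are assumptions for illustration.
CREATE EXTERNAL TABLE IF NOT EXISTS emp_ts_parq(
    id INT,
    first_name STRING,
    last_name STRING,
    dateofBirth TIMESTAMP)
PARTITIONED BY (join_date INT)
STORED AS PARQUET
LOCATION '/user/employee/afterpartition_ts';

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Convert the string column explicitly so the Parquet files are written
-- with a timestamp column instead of a string column.
INSERT OVERWRITE TABLE emp_ts_parq PARTITION (join_date)
SELECT id,
       first_name,
       last_name,
       CAST(dateofBirth AS TIMESTAMP) AS dateofBirth,
       join_date
FROM emp;

SELECT * FROM emp_ts_parq;
```

With this layout each directory holds files whose stored column type matches the table that reads them, which is the condition the two tables in the report do not satisfy.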