[jira] [Created] (PARQUET-723) parquet is not storing the type for the column.

Narasimha (JIRA) Wed, 21 Sep 2016 11:01:45 -0700

Narasimha created PARQUET-723:
---------------------------------

             Summary: parquet is not storing the type for the column.
                 Key: PARQUET-723
                 URL: https://issues.apache.org/jira/browse/PARQUET-723
             Project: Parquet
          Issue Type: Bug
          Components: parquet-format
            Reporter: Narasimha



1. Create a table stored as a text file
        CREATE EXTERNAL TABLE IF NOT EXISTS emp(
        id INT,
        first_name STRING,
        last_name STRING,
        dateofBirth STRING,
        join_date INT
        )
        COMMENT 'This is Employee Table Date Of Birth of type String'
        ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n'
        STORED AS TEXTFILE
        LOCATION '/user/employee/beforePartition';

2. Load the data into the table
        load data inpath '/user/somupoc_timestamp/employeeData_partitioned.csv' into table emp;
        select * from emp;

3. Create a partitioned table stored as Parquet (dateofBirth STRING)

        create external table emp_afterpartition(
        id int, first_name STRING, last_name STRING, dateofBirth STRING)
        COMMENT 'Employee partitioned table with dateofBirth of type string'
        partitioned by (join_date int)
        STORED as parquet
        LOCATION '/user/employee/afterpartition';

4. Populate the partitioned table using dynamic partitioning, then fetch the data

        set hive.exec.dynamic.partition=true;  
        set hive.exec.dynamic.partition.mode=nonstrict; 
        insert overwrite table emp_afterpartition partition (join_date) select * from emp;
        select * from emp_afterpartition;

5. Create a second partitioned Parquet table over the same location (dateofBirth TIMESTAMP)

        CREATE EXTERNAL TABLE IF NOT EXISTS employee_afterpartition_timestamp_parq(
        id INT,first_name STRING,last_name STRING,dateofBirth TIMESTAMP)
        COMMENT 'employee partitioned table with dateofBirth of type TIMESTAMP'
        PARTITIONED BY (join_date INT)
        STORED AS PARQUET
        LOCATION '/user/employee/afterpartition';

        select * from employee_afterpartition_timestamp_parq;
        -- 0 records returned
        Impala ::       alter table employee_afterpartition_timestamp_parq RECOVER PARTITIONS;
        Hive ::         MSCK REPAIR TABLE employee_afterpartition_timestamp_parq;
        -- RECOVER PARTITIONS works in Impala and MSCK works in Hive
        -- (the metastore check command with the repair table option).

        select * from employee_afterpartition_timestamp_parq;

Actual Result :: Failed with exception
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

Expected Result :: The data should be displayed.

Note: If the file format is text file instead of Parquet, I am able to fetch
the data.
Observation: The two tables declare different types for the dateofBirth
column but point to the same HDFS location.

Sample Data
===========

1,Joyce,Garza,2016-07-17 14:42:18,201607
2,Jerry,Ortiz,2016-08-17 21:36:54,201608
3,Steven,Ryan,2016-09-10 01:32:40,201609
4,Lisa,Black,2015-10-12 15:05:13,201610
5,Jose,Turner,2015-011-10 06:38:40,201611
6,Joyce,Garza,2016-08-02,201608
7,Jerry,Ortiz,2016-01-01,201601
8,Steven,Ryan,2016/08/20,201608
9,Lisa,Black,2016/09/12,201609
10,Jose,Turner,09/19/2016,201609
11,Jose,Turner,20160915,201609
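
The sample rows mix several date layouts in the dateofBirth field. As a rough aid for reading the data, the sketch below uses plain Python (not Hive) to list which rows match one strict layout, yyyy-MM-dd HH:mm:ss. The layout and the names STRICT_LAYOUT, is_strict_timestamp and parseable_ids are assumptions made for illustration. The sketch does not reproduce the conversion rules of Hive or Impala, and it says nothing about the ClassCastException itself.

```python
from datetime import datetime

# Assumed strict layout (yyyy-MM-dd HH:mm:ss); chosen for illustration only.
STRICT_LAYOUT = "%Y-%m-%d %H:%M:%S"

# Sample rows copied verbatim from this report.
SAMPLE_ROWS = [
    "1,Joyce,Garza,2016-07-17 14:42:18,201607",
    "2,Jerry,Ortiz,2016-08-17 21:36:54,201608",
    "3,Steven,Ryan,2016-09-10 01:32:40,201609",
    "4,Lisa,Black,2015-10-12 15:05:13,201610",
    "5,Jose,Turner,2015-011-10 06:38:40,201611",
    "6,Joyce,Garza,2016-08-02,201608",
    "7,Jerry,Ortiz,2016-01-01,201601",
    "8,Steven,Ryan,2016/08/20,201608",
    "9,Lisa,Black,2016/09/12,201609",
    "10,Jose,Turner,09/19/2016,201609",
    "11,Jose,Turner,20160915,201609",
]


def is_strict_timestamp(value):
    """Return True if value parses under STRICT_LAYOUT, else False."""
    try:
        datetime.strptime(value, STRICT_LAYOUT)
        return True
    except ValueError:
        return False


def parseable_ids(rows):
    """Return the ids of rows whose dateofBirth field matches STRICT_LAYOUT."""
    ids = []
    for row in rows:
        # Column order: id, first_name, last_name, dateofBirth, join_date
        emp_id, _first, _last, date_of_birth, _join_date = row.split(",")
        if is_strict_timestamp(date_of_birth):
            ids.append(int(emp_id))
    return ids


if __name__ == "__main__":
    print(parseable_ids(SAMPLE_ROWS))
```

Rows that do not match the assumed layout include the date-only values, the slash-separated values, and the row with the month written as 011.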





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
