[ 
https://issues.apache.org/jira/browse/HIVE-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tongjie Chen resolved HIVE-6784.
--------------------------------

    Resolution: Won't Fix

> parquet-hive should allow column type change
> --------------------------------------------
>
>                 Key: HIVE-6784
>                 URL: https://issues.apache.org/jira/browse/HIVE-6784
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Tongjie Chen
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6784.1.patch.txt, HIVE-6784.2.patch.txt
>
>
> See also the following Parquet issue:
> https://github.com/Parquet/parquet-mr/issues/323
> Currently, if we change a column type on a Parquet-format Hive table using 
> "alter table parquet_table change c1 c1 bigint" (assuming the original type 
> of c1 is int), the SerDe throws an exception at query runtime: 
> "org.apache.hadoop.io.IntWritable cannot be cast to 
> org.apache.hadoop.io.LongWritable".
> This differs from Hive's behavior with other file formats, where it tries 
> to cast the value (yielding null if the types are incompatible).
> Parquet Hive's RecordReader returns an ArrayWritable (based on the schema 
> stored in the footers of the Parquet files); ParquetHiveSerDe also creates a 
> corresponding ArrayWritableObjectInspector (but using column type info from 
> the metastore). Whenever a column type changes, the object inspector throws 
> an exception, since, for example, WritableLongObjectInspector cannot inspect 
> an IntWritable.
> Conversion has to happen somewhere if we want to allow type changes. The 
> SerDe's deserialize method seems a natural place for it.
> Currently, the serialize method calls createStruct (then createPrimitive) for 
> every record, but it always creates a new object, which seems expensive. 
> I think that could be optimized a bit by just returning the object passed in 
> if it is already of the right type. deserialize can also reuse this method; 
> if there is a type change, a new object has to be created, which I think is 
> inevitable.
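
The conversion idea described above can be sketched roughly as follows. This
is only an illustration, not the code from the attached patches: the
IntWritable and LongWritable classes here are hypothetical stand-ins for the
Hadoop classes of the same name (so the example compiles without Hadoop), and
TargetType and convert are made-up names. Returning null for a bigint-to-int
change is an assumption made for brevity.

```java
class TypeChangeSketch {
    // Hypothetical stand-ins for org.apache.hadoop.io.IntWritable and
    // org.apache.hadoop.io.LongWritable, so the sketch needs no Hadoop jars.
    static final class IntWritable {
        private final int value;
        IntWritable(int value) { this.value = value; }
        int get() { return value; }
    }

    static final class LongWritable {
        private final long value;
        LongWritable(long value) { this.value = value; }
        long get() { return value; }
    }

    // Column type as declared in the metastore (only two types, for brevity).
    enum TargetType { INT, BIGINT }

    // Returns a value matching the metastore type:
    //  - the same object if it already has the right type (no allocation),
    //  - a new, widened object for int -> bigint,
    //  - null if the types are treated as incompatible (an assumption here).
    static Object convert(Object fileValue, TargetType target) {
        if (fileValue == null) {
            return null;
        }
        switch (target) {
            case INT:
                return (fileValue instanceof IntWritable) ? fileValue : null;
            case BIGINT:
                if (fileValue instanceof LongWritable) {
                    return fileValue;
                }
                if (fileValue instanceof IntWritable) {
                    return new LongWritable(((IntWritable) fileValue).get());
                }
                return null;
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        // Value as read from a file written while c1 was still int.
        Object fromFile = new IntWritable(42);

        // A direct cast fails in the same way as the reported exception.
        try {
            LongWritable bad = (LongWritable) fromFile;
            System.out.println(bad.get());
        } catch (ClassCastException e) {
            System.out.println("direct cast failed with ClassCastException");
        }

        // Converting first yields an object of the type the metastore expects.
        Object converted = convert(fromFile, TargetType.BIGINT);
        System.out.println(((LongWritable) converted).get());
    }
}
```

When the value already has the declared type, convert returns the object it
was given, which is the allocation-saving behavior suggested for serialize; a
new object is created only when the type actually changed.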



--
This message was sent by Atlassian JIRA
(v6.2#6252)
