[ https://issues.apache.org/jira/browse/HIVE-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977710#comment-13977710 ]

Anthony Hsu commented on HIVE-6835:
-----------------------------------
Yes, this is possible, but I would have to add "instanceof AbstractSerDe"
checks and then cast the Deserializer to AbstractSerDe before I could call the
new initialize() method. There are dozens of calls to initialize(), and
adding type-checking and casting code in so many places just for this new
method doesn't seem very clean to me.
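
To make the concern concrete, here is a rough sketch of what each call site would end up looking like. The Deserializer and AbstractSerDe types below are simplified stand-ins defined only for this example (they are not the real Hive signatures), and CallSite.initSerDe is a hypothetical helper.

```java
import java.util.Properties;

// Simplified stand-in for the Deserializer interface (hypothetical signature).
interface Deserializer {
    void initialize(Properties props);
}

// Simplified stand-in for the base class that would gain the new method.
abstract class AbstractSerDe implements Deserializer {
    String lastCall = "none"; // records which overload ran, for illustration only

    @Override
    public void initialize(Properties props) {
        lastCall = "old";
    }

    // Assumed shape of the proposed two-argument overload.
    public void initialize(Properties tblProps, Properties partProps) {
        lastCall = "new";
    }
}

class CallSite {
    // Every existing caller of initialize() would need this check-and-cast.
    static void initSerDe(Deserializer d, Properties tblProps, Properties partProps) {
        if (d instanceof AbstractSerDe) {
            ((AbstractSerDe) d).initialize(tblProps, partProps);
        } else {
            // Deserializers outside the hierarchy only know the old method.
            d.initialize(partProps);
        }
    }
}
```

Repeating this branch at dozens of call sites is the boilerplate I would like to avoid.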

Also, if we add the new initialize() method, what should we do for table-level
serde initialization? When dealing with the table, there are no partition
properties, so are we supposed to pass the table properties for both the
tblProps and partProps arguments? If we leave partProps null, then the default
implementation of the new initialize() method will just pass null to the old
initialize() method.
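
A small sketch of the null problem, assuming the default implementation of the new method simply forwards partProps to the old one. SketchSerDe and TableLevelDemo are hypothetical names used only for illustration, not Hive classes.

```java
import java.util.Properties;

// Hypothetical serde base class, not the actual Hive class.
abstract class SketchSerDe {
    Properties received; // what the old method was last given

    // Old single-argument method.
    public void initialize(Properties props) {
        received = props;
    }

    // Assumed default for the new method: fall back to the old one,
    // forwarding only the partition properties.
    public void initialize(Properties tblProps, Properties partProps) {
        initialize(partProps);
    }
}

class TableLevelDemo {
    static Properties initForTable(SketchSerDe serde, Properties tblProps) {
        // No partition exists at table level, so partProps is null here.
        serde.initialize(tblProps, null);
        return serde.received;
    }
}
```

Under that assumption, a serde that has not overridden the new method ends up initialized with null even though the table properties were available.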

There doesn't seem to be a very clean way of adding a new initialize() method
without creating a lot of redundant boilerplate code and creating confusion
about which initialize() method to use and what values to pass in. Given these
concerns, I feel that prepending "table." might be a cleaner and less confusing
approach. What are your thoughts on this?

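For comparison, a minimal sketch of the prefix idea, assuming it means copying the table-level properties into the same Properties object that is already passed to initialize(), under keys that start with "table.". TablePrefix.merge is a hypothetical helper and does not reflect the exact code in the patch.

```java
import java.util.Properties;

class TablePrefix {
    static final String PREFIX = "table."; // the prefix proposed in this comment

    // Hypothetical helper: start from the partition properties, then add each
    // table property under a "table."-prefixed key so neither set overwrites
    // the other.
    static Properties merge(Properties tblProps, Properties partProps) {
        Properties merged = new Properties();
        merged.putAll(partProps);
        for (String key : tblProps.stringPropertyNames()) {
            merged.setProperty(PREFIX + key, tblProps.getProperty(key));
        }
        return merged;
    }
}
```

With something like this, the existing single-argument initialize() keeps its signature, and a serde that cares about the table schema can look up the prefixed key.
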
> Reading of partitioned Avro data fails if partition schema does not match
> table schema
> --------------------------------------------------------------------------------------
>
> Key: HIVE-6835
> URL: https://issues.apache.org/jira/browse/HIVE-6835
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Anthony Hsu
> Assignee: Anthony Hsu
> Attachments: HIVE-6835.1.patch, HIVE-6835.2.patch, HIVE-6835.3.patch
>
>
> To reproduce:
> {code}
> create table testarray (a array<string>);
> load data local inpath '/home/ahsu/test/array.txt' into table testarray;
> -- create partitioned Avro table with one array column
> create table avroarray partitioned by (y string) row format serde
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties
> ('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
> "record", "fields": [ { "name":"a", "type":{"type":"array","items":"string"}
> } ] }') STORED as INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
> insert into table avroarray partition(y=1) select * from testarray;
> -- add an int column with a default value of 0
> alter table avroarray set serde
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with
> serdeproperties('avro.schema.literal'='{"namespace":"test","name":"avroarray","type":
> "record", "fields": [ {"name":"intfield","type":"int","default":0},{
> "name":"a", "type":{"type":"array","items":"string"} } ] }');
> -- fails with ClassCastException
> select * from avroarray;
> {code}
> The select * fails with:
> {code}
> Failed with exception java.io.IOException:java.lang.ClassCastException:
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector
> cannot be cast to
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
> {code}