Andrei Stankevich created PARQUET-1046:
------------------------------------------

             Summary: Impossible to read thrift object from parquet file if it 
has List<Enum> field that was removed from thrift schema.
                 Key: PARQUET-1046
                 URL: https://issues.apache.org/jira/browse/PARQUET-1046
             Project: Parquet
          Issue Type: Bug
            Reporter: Andrei Stankevich


If a Thrift class has a field of type List<some_enum>, ParquetReader sets the 
list's element type to enum (type id = 16), but it should set it to Int32.

Every field declared as an enum in the Thrift schema file has field type Int32 
in the Java class. The same is true for List fields whose elements are enums.

But when ParquetReader creates an object, it uses type enum for the list's 
elements instead of Int32.
This causes a problem: we cannot remove a list field if it has enum elements. 
If we remove such a field from the schema file while it is still present in the 
Parquet file, ParquetReader tries to skip the field when reading it, because the 
field is not in the schema. It calls TProtocolUtil.skip with type = 15 for the 
list, and then calls the same method for each list element with type 16 for 
enum. TProtocolUtil.skip has no case for this type in its switch statement, so 
the list elements are not skipped, and an exception is thrown later when it 
tries to skip the list end.
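
The skip behaviour described above can be illustrated with a small 
self-contained sketch. This is not the real TProtocolUtil or ParquetReader 
code: the class EnumSkipSketch, its byte layout (one element-type byte, a 
4-byte element count, 4-byte elements) and the treatEnumAsI32 flag are all 
hypothetical, chosen only to show how a switch without a case for type 16 
leaves list elements unread, and how mapping enum to Int32 would avoid that.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class EnumSkipSketch {
    // Type ids as quoted in the report.
    static final byte I32 = 8;
    static final byte LIST = 15;
    static final byte ENUM = 16;

    // Toy stand-in for a skip routine: consumes one value of the given type.
    static void skip(DataInputStream in, byte type, boolean treatEnumAsI32) throws IOException {
        if (treatEnumAsI32 && type == ENUM) {
            type = I32; // assumption of this sketch: enum values are stored as 32-bit ints
        }
        switch (type) {
            case I32:
                in.readInt();
                break;
            case LIST:
                byte elemType = in.readByte();
                int size = in.readInt();
                for (int i = 0; i < size; i++) {
                    skip(in, elemType, treatEnumAsI32);
                }
                break;
            default:
                // No case for this type: nothing is consumed.
                break;
        }
    }

    // Encodes a list of three int-sized elements followed by one trailing Int32 field.
    static byte[] encodeListThenNextField(byte elemType) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(elemType); // list header: element type
        out.writeInt(3);         // list header: element count
        out.writeInt(0);
        out.writeInt(1);
        out.writeInt(2);
        out.writeInt(42);        // the field that follows the list
        out.flush();
        return bytes.toByteArray();
    }

    // Skips the list and reports how many bytes are still unread afterwards.
    static int bytesLeftAfterSkippingList(byte elemType, boolean treatEnumAsI32) {
        try {
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(encodeListThenNextField(elemType)));
            skip(in, LIST, treatEnumAsI32);
            return in.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Element type 16 with no matching case: 12 element bytes plus the next field remain.
        System.out.println(bytesLeftAfterSkippingList(ENUM, false)); // prints 16
        // Element type 16 treated as Int32: only the next field remains.
        System.out.println(bytesLeftAfterSkippingList(ENUM, true));  // prints 4
        // Element type written as Int32 in the first place: same result.
        System.out.println(bytesLeftAfterSkippingList(I32, false));  // prints 4
    }
}
```

In the sketch, a reader that expects the next field right after the list 
would instead land on the unread elements, which is the kind of misalignment 
that surfaces later as an exception in the real reader.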



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
