[jira] [Commented] (HIVE-12080) Support auto type widening for Parquet table

JIRA Tue, 20 Oct 2015 08:38:48 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965258#comment-14965258
 ]


Sergio Peña commented on HIVE-12080:
------------------------------------

The patch looks good. It is exactly what I had in mind.
I just have some questions and comments.

1. {{private TypeInfo hiveTypeInfo;}}
This variable is a member of the class, and the value is assigned on {{init}}, 
and read on {{prepareForRead}}. This means
that the class must be instantiated in order to call both init and 
prepareForRead.

I don't see where this class {{DataWritableReadSupport}} is instantiated 
besides this code below on {{ParquetRecordReaderWrapper}}:
{code}
final ReadContext readContext = new DataWritableReadSupport().init(new 
InitContext(jobConf,
          null, fileMetaData.getSchema()));
{code}

Question: How is the {{hiveTypeInfo}} value able to have the same value on 
prepareForRead? I don't remember how this works. 

2. I think we need to do add more test cases. We should test maps, arrays and 
primitive columns as well. You added the struct on 
{{parquet_schema_type_evolution.q}}. Let's add more to be sure we test every 
change. Another test case I have in mind is to have
2 parquet files, one with {{int}} and another with {{bigint}}. This way we 
could test the mix of both integers.

Also, should we rename the .q file? something like, 
parquet_schema_column_type_promotion.q? I don't know, I am just brainstorming.

3. {{ETypeConverter.java}} is exactly what I had in mind. Good job for that. 
I am just having this developer idea if we can do something like [~rdblue] 
mentioned where we can use an interface, such as NumericWritable where it has 
getInt(), getLong(), etc. implemented. I don't know how yet, but if you can 
think on something that can make this etypeconverter class look clean would be 
great. If not, don't worry. This patch does the work we are expecting.

4. Are you going to support other types, like short? strings? I would like to 
get rid of ParquetShortInspector, ParquetStringInspector and 
ParquetByteInspector. 

Overall, the patch looks pretty good. Next time, could you create a Review 
Board so I can leave the comments directly? This would be much better.

Thanks.

> Support auto type widening for Parquet table
> --------------------------------------------
>
>                 Key: HIVE-12080
>                 URL: https://issues.apache.org/jira/browse/HIVE-12080
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Mohammad Kamrul Islam
>         Attachments: HIVE-12080.1.patch
>
>
> Currently Hive+Parquet doesn't support it. It should include at least basic 
> type promotions short->int->bigint,  float->double etc, that are already 
> supported for  other file formats.
> There were similar effort (Hive-6784) but was not committed. This JIRA is to 
> address the same in different way with little (no) performance impact.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-12080) Support auto type widening for Parquet table

Reply via email to