[ 
https://issues.apache.org/jira/browse/PARQUET-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839660#comment-15839660
 ] 

Uwe L. Korn commented on PARQUET-845:
-------------------------------------

Storage-wise, it should not make a difference whether you have an INT8 or 
an INT32 physical type. Packing four INT8s into a single INT32 would actually 
decrease Parquet's efficiency, as some of the encoding "tricks" are no longer 
as effective. (Usually my INT8 columns take less than one bit per row when 
stored in Parquet.)
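
To make the point about encodings concrete, here is a toy sketch in Python. 
It is not Parquet's actual RLE/bit-packing implementation: the cost model 
(one header byte per run plus the raw value width) and all names such as 
`toy_rle_bytes` and `HEADER_BYTES` are assumptions chosen for illustration. 
It compares run-length encoding four slowly changing INT8 columns separately 
against the same data packed into one INT32 column.

```python
# Toy model only: a run costs HEADER_BYTES plus the width of one value.
# Parquet's real hybrid RLE/bit-packed encoding differs in detail.
HEADER_BYTES = 1  # assumed per-run overhead


def run_count(values):
    """Return the number of runs of equal consecutive values."""
    runs = 0
    previous = object()  # sentinel that compares unequal to every value
    for v in values:
        if v != previous:
            runs += 1
            previous = v
    return runs


def toy_rle_bytes(values, value_width):
    """Estimated size in bytes under the toy run-length model."""
    return run_count(values) * (HEADER_BYTES + value_width)


n = 1000
# Four hypothetical INT8 columns that change at different rates.
a = [0] * n                            # constant
b = [i // 250 for i in range(n)]       # changes every 250 rows
c = [i // 100 for i in range(n)]       # changes every 100 rows
d = [(i // 10) % 5 for i in range(n)]  # changes every 10 rows

# The same data packed four-to-one into a single INT32 column.
packed = [a[i] | (b[i] << 8) | (c[i] << 16) | (d[i] << 24) for i in range(n)]

separate_cost = sum(toy_rle_bytes(col, 1) for col in (a, b, c, d))
packed_cost = toy_rle_bytes(packed, 4)

print("separate INT8 columns:", separate_cost, "bytes")
print("packed INT32 column:  ", packed_cost, "bytes")
```

In this model the packed column starts a new run whenever any one of its 
four components changes, and every run has to carry a four-byte value, so 
the slowly changing columns no longer get their long, cheap runs.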

Or are you maybe talking about a particular API that should return INT8s 
instead of INT32s?

> Efficient storage for several INT_8 and INT_16
> ----------------------------------------------
>
>                 Key: PARQUET-845
>                 URL: https://issues.apache.org/jira/browse/PARQUET-845
>             Project: Parquet
>          Issue Type: Wish
>            Reporter: Fernando Pereira
>            Priority: Minor
>
> In very large datasets, aggregating several INT8 values into INT32 fields 
> (or a byte array) can make a big difference.
> In Parquet, efficient algorithms exist for INT32, so if the LogicalType is 
> INT_8 the encoded int might take up only one byte.
> However, further optimizations could be made by allowing the user to better 
> specify the types.
> What about a BYTE_ARRAY logical type, backed by the FIXED_LEN_BYTE_ARRAY 
> type (or eventually INT_32)?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
