[ https://issues.apache.org/jira/browse/ORC-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112217#comment-16112217 ]
ASF GitHub Bot commented on ORC-209:
------------------------------------
GitHub user mmccline opened a pull request:
https://github.com/apache/orc/pull/147
ORC-209
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mmccline/orc ORC-209
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/orc/pull/147.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #147
----
commit c839a57467842a2476ba5b91c4230ea0db9f8360
Author: Matt McCline <[email protected]>
Date: 2017-08-03T05:11:05Z
ORC-209.03.patch
----
> Add Decimal64 Serialization/Deserialization
> -------------------------------------------
>
> Key: ORC-209
> URL: https://issues.apache.org/jira/browse/ORC-209
> Project: ORC
> Issue Type: Bug
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: ORC-209.01.wip.patch, ORC-209.02.wip.patch,
> ORC-209.03.patch, storage-api.01.wip.patch, storage-api.02.wip.patch
>
>
> Currently, HiveDecimal is serialized in ORC in a special binary byte format
> as the "value" stream, with a secondary stream holding the scale of each
> decimal. Trailing zeroes are removed from each decimal, so the scale can vary
> from value to value. This format is inefficient in both CPU and storage
> space (i.e. compression).
> The decimal type, however, has a fixed precision and scale.
> Gopal/Prasanth/Owen have suggested keeping the trailing zeroes (so the scale
> is a constant for the file, taken from the metadata) and storing the
> decimals as an integer stream that can benefit from run-length encoding,
> compression, etc.
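
The fixed-scale idea in the description can be illustrated with a minimal Java sketch. This is not the code from the patch: the class name Decimal64Sketch, its two helper methods, and the scale of 2 are hypothetical, and plain java.math.BigDecimal stands in for HiveDecimal. It assumes every value fits in a signed 64-bit integer once padded to the column scale (roughly, precision up to 18 digits).

```java
import java.math.BigDecimal;

// Hypothetical illustration only; not the ORC-209 implementation.
class Decimal64Sketch {

    // Encode a decimal as an unscaled long at a fixed, file-wide scale.
    // Trailing zeroes are kept, so no per-value scale needs to be stored.
    static long toScaledLong(BigDecimal value, int scale) {
        // setScale without a rounding mode throws ArithmeticException if
        // digits would be lost; longValueExact throws on 64-bit overflow.
        return value.setScale(scale).unscaledValue().longValueExact();
    }

    // Decode using the scale recorded once in the (assumed) file metadata.
    static BigDecimal fromScaledLong(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        int scale = 2; // assumed column type: decimal(p, 2)
        String[] inputs = {"1.5", "1.50", "20", "-0.07"};
        for (String s : inputs) {
            long encoded = toScaledLong(new BigDecimal(s), scale);
            System.out.println(s + " -> " + encoded + " -> "
                + fromScaledLong(encoded, scale));
        }
    }
}
```

Because every value shares one scale, the column reduces to a sequence of plain longs, which is the kind of input an integer run-length encoder is designed for.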