[ 
https://issues.apache.org/jira/browse/HIVE-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965247#comment-13965247
 ] 

Lefty Leverenz commented on HIVE-6382:
--------------------------------------

For the record:  this adds the configuration parameter 
*hive.exec.orc.skip.corrupt.data* to HiveConf.java and 
hive-default.xml.template.

> PATCHED_BLOB encoding in ORC will corrupt data in some cases
> ------------------------------------------------------------
>
>                 Key: HIVE-6382
>                 URL: https://issues.apache.org/jira/browse/HIVE-6382
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6382.1.patch, HIVE-6382.2.patch, HIVE-6382.3.patch, 
> HIVE-6382.4.patch, HIVE-6382.5.patch, HIVE-6382.6.patch
>
>
> In PATCHED_BLOB encoding (added in HIVE-4123), gapVsPatchList is an array of 
> longs, each of which stores the gap (g) between patched values together with 
> the patch value (p). The gap can be as large as 511 and takes 8 bits to 
> encode, while patch values can take more than 56 bits. When a patch value 
> takes more than 56 bits, p + g exceeds 64 bits and cannot be packed into a 
> long. This results in data corruption whenever patch values are wider than 
> 56 bits. 
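
To illustrate the overflow described in the issue, here is a minimal sketch 
in Java. It is not the code from the patch: the class GapPatchPacking, its 
method names, and the layout (gap stored in the bits directly above the 
patch value) are assumptions made for this example only. It shows that an 
8-bit gap survives a round trip when the patch is 56 bits wide, but loses 
its high bits when the patch is 60 bits wide.

```java
// Hypothetical illustration; names and bit layout are assumptions, not Hive's code.
final class GapPatchPacking {

    // Pack the gap into the bits directly above a patch of patchWidth bits.
    static long pack(long gap, long patch, int patchWidth) {
        return (gap << patchWidth) | patch;
    }

    // Recover the gap by shifting the patch bits away (unsigned shift).
    static long unpackGap(long packed, int patchWidth) {
        return packed >>> patchWidth;
    }

    // Recover the patch by masking off everything above patchWidth bits.
    static long unpackPatch(long packed, int patchWidth) {
        long mask = (patchWidth >= 64) ? -1L : (1L << patchWidth) - 1;
        return packed & mask;
    }

    // The pair fits in one long only if the two widths total at most 64 bits.
    static boolean fitsInLong(int gapWidth, int patchWidth) {
        return gapWidth + patchWidth <= 64;
    }

    public static void main(String[] args) {
        long gap = 200L; // needs 8 bits

        // 8 + 56 = 64 bits: the round trip preserves both fields.
        long patch56 = (1L << 56) - 1;
        long ok = pack(gap, patch56, 56);
        System.out.println("width 56: gap=" + unpackGap(ok, 56)
                + " patchIntact=" + (unpackPatch(ok, 56) == patch56));

        // 8 + 60 = 68 bits: the top 4 bits of the gap are shifted out.
        long patch60 = (1L << 60) - 1;
        long bad = pack(gap, patch60, 60);
        System.out.println("width 60: gap=" + unpackGap(bad, 60)
                + " patchIntact=" + (unpackPatch(bad, 60) == patch60));
    }
}
```

With this layout the patch value itself is unaffected in the second case; 
it is the gap that comes back wrong, so the patch would be applied at the 
wrong position. A writer can guard against this by checking fitsInLong 
before choosing the encoding.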



--
This message was sent by Atlassian JIRA
(v6.2#6252)
