[ 
https://issues.apache.org/jira/browse/HIVE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408135#comment-13408135
 ] 

Thejas M Nair commented on HIVE-3168:
-------------------------------------

Neha,
Try running 'ant clean' in the hive dir before building it. 
To verify that the patched hive code is actually being used, try adding a debug 
message in LazyBinaryObjectInspector around line 43 (after applying the patch). 
Note that the debug message would be emitted by the map task. Or change the code 
to overwrite the first byte of the data, and see if it changes your result -
{code}
    System.arraycopy(bWritable.getBytes(), 0, data, 0, bWritable.getLength());
    ba.setData(data);
{code}
to 
{code}
    System.arraycopy(bWritable.getBytes(), 0, data, 0, bWritable.getLength());
    data[0]='Z';
    ba.setData(data);
{code}
I am not sure whether I tested with the hcat 0.4 branch or trunk. But I don't 
think there has been any change that would affect this behavior, so it should 
work with the hcat 0.4 release as well.
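
To illustrate the problem described in the issue, here is a minimal, 
self-contained sketch. ReusableBuffer is a hypothetical stand-in for 
BytesWritable (it is not the Hadoop class), and copyFullCapacity / 
copyValidLength are illustrative names, not Hive methods. It assumes only what 
the description states: the backing array can be longer than the valid 
contents, so copying the whole array picks up leftovers from an earlier, longer 
value, while copying getLength() bytes does not.

```java
import java.nio.charset.StandardCharsets;

public class ReusedBufferDemo {

    /** Hypothetical stand-in for a reusable buffer such as BytesWritable. */
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Overwrites the contents; the backing array only grows, never shrinks. */
        void set(byte[] src) {
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Returns the backing array, which may be longer than getLength(). */
        byte[] getBytes() { return bytes; }

        /** Returns the number of valid bytes. */
        int getLength() { return length; }
    }

    /** Copies the whole backing array, including stale bytes past the valid length. */
    static byte[] copyFullCapacity(ReusableBuffer b) {
        byte[] data = new byte[b.getBytes().length];
        System.arraycopy(b.getBytes(), 0, data, 0, data.length);
        return data;
    }

    /** Copies only the valid bytes, like the arraycopy call quoted above. */
    static byte[] copyValidLength(ReusableBuffer b) {
        byte[] data = new byte[b.getLength()];
        System.arraycopy(b.getBytes(), 0, data, 0, b.getLength());
        return data;
    }

    public static void main(String[] args) {
        ReusableBuffer buf = new ReusableBuffer();
        buf.set("abcdef".getBytes(StandardCharsets.US_ASCII)); // first, longer entry
        buf.set("xy".getBytes(StandardCharsets.US_ASCII));     // buffer re-used for a shorter entry

        // prints "xycdef": the tail of the previous entry leaks through
        System.out.println(new String(copyFullCapacity(buf), StandardCharsets.US_ASCII));
        // prints "xy"
        System.out.println(new String(copyValidLength(buf), StandardCharsets.US_ASCII));
    }
}
```

If your output still shows trailing characters like the first case, the 
unpatched code is most likely still on the classpath.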

                
> LazyBinaryObjectInspector.getPrimitiveJavaObject copies beyond length of 
> underlying BytesWritable
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3168
>                 URL: https://issues.apache.org/jira/browse/HIVE-3168
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.9.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.10.0, 0.9.1
>
>         Attachments: HIVE-3168.1.patch, HIVE-3168.2.patch
>
>
> LazyBinaryObjectInspector.getPrimitiveJavaObject copies the full capacity of 
> the LazyBinary's underlying BytesWritable object, which can be greater than 
> the size of the actual contents. 
> This leads to additional characters at the end of the ByteArrayRef returned. 
> When the LazyBinary object gets re-used, there can be remnants of the later 
> portion of the previous entry. 
> This was not seen while reading through hive queries, which I think is 
> because a copy elsewhere seems to create LazyBinary with length == capacity. 
> (probably LazyBinary copy constructor). This was seen when MR or pig used 
> Hcatalog to read the data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
