[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454112#comment-15454112
 ] 

Matt McCline commented on HIVE-14451:
-------------------------------------

There are 2 improvements in the patch.

First, when the input bytes being deserialized are immutable and it is safe to 
retain references to them (e.g. a hash table entry), VectorDeserializeRow has an 
alternate deserializeByRef method that can be called.  This avoids an 
unnecessary buffer copy operation.
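
To illustrate the difference between copying and retaining a reference, here is 
a minimal sketch.  It is not the code from the patch: ByRefSketch, BytesColumn 
and its setVal/setRef/get methods are hypothetical stand-ins written for this 
example, assuming a column that stores a (buffer, start, length) triple per row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified stand-in for a bytes column (not Hive's class).
final class ByRefSketch {
    static final class BytesColumn {
        final byte[][] vector;
        final int[] start;
        final int[] length;

        BytesColumn(int size) {
            vector = new byte[size][];
            start = new int[size];
            length = new int[size];
        }

        // Copy mode: safe even if the caller later overwrites `src`.
        void setVal(int row, byte[] src, int offset, int len) {
            vector[row] = Arrays.copyOfRange(src, offset, offset + len);
            start[row] = 0;
            length[row] = len;
        }

        // By-reference mode: no copy, so only valid while `src` is not mutated.
        void setRef(int row, byte[] src, int offset, int len) {
            vector[row] = src;
            start[row] = offset;
            length[row] = len;
        }

        String get(int row) {
            return new String(vector[row], start[row], length[row],
                    StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        byte[] input = "helloworld".getBytes(StandardCharsets.UTF_8);
        BytesColumn col = new BytesColumn(2);
        col.setVal(0, input, 0, 5);   // copies 5 bytes
        col.setRef(1, input, 5, 5);   // borrows a slice of `input`
        System.out.println(col.get(0) + " " + col.get(1)); // prints "hello world"
        System.out.println(col.vector[1] == input);        // prints "true"
    }
}
```

The by-reference row shares the caller's array, which is why it is only 
appropriate when the input is known to be immutable for as long as the row is 
in use.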

Second, when BinarySortable and LazySimple have to "unescape" data in the input 
buffer to produce the string/char/varchar/binary result, a preallocation scheme 
is used where the (scratch) buffer in BytesColumnVector is made available to be 
used directly as the target buffer.  This avoids an extra buffer copy operation.
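
The idea of unescaping straight into a caller-provided target buffer can be 
sketched as follows.  This is an illustration only, not the code from the 
patch: UnescapeSketch and unescapeInto are hypothetical names, and the escape 
rule (a backslash makes the next byte literal) is a simplifying assumption that 
does not match the actual BinarySortable or LazySimple encodings.

```java
import java.nio.charset.StandardCharsets;

final class UnescapeSketch {
    // Writes the unescaped form of src[off, off+len) into dst starting at
    // dstOff and returns the number of bytes written.  The output is never
    // longer than the input, so reserving `len` bytes in dst is enough.
    // Assumed escape rule: '\' makes the following byte literal.
    static int unescapeInto(byte[] src, int off, int len, byte[] dst, int dstOff) {
        int out = dstOff;
        int end = off + len;
        for (int i = off; i < end; i++) {
            byte b = src[i];
            if (b == '\\' && i + 1 < end) {
                b = src[++i];          // skip the escape byte, keep the next one
            }
            dst[out++] = b;
        }
        return out - dstOff;
    }

    public static void main(String[] args) {
        // Escaped input: a\,b\\c  (7 bytes)
        byte[] input = "a\\,b\\\\c".getBytes(StandardCharsets.UTF_8);

        // Hypothetical preallocated scratch buffer owned by the column.
        byte[] scratch = new byte[64];
        int used = 0;

        // Unescape directly into the scratch buffer: no temporary array and
        // no second copy into the column afterwards.
        int n = unescapeInto(input, 0, input.length, scratch, used);
        String value = new String(scratch, used, n, StandardCharsets.UTF_8);
        used += n;

        System.out.println(n);     // prints "5"
        System.out.println(value); // prints "a,b\c"
    }
}
```

Without a target buffer handed in up front, the unescaped bytes would first 
land in a temporary array and then be copied again into the column's storage.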

> Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14451
>                 URL: https://issues.apache.org/jira/browse/HIVE-14451
>             Project: Hive
>          Issue Type: Improvement
>          Components: Vectorization
>            Reporter: Gopal V
>            Assignee: Matt McCline
>         Attachments: HIVE-14451.01.patch, HIVE-14451.02.patch
>
>
> In a majority of cases, when using the OptimizedHashMap, the references to 
> the byte[] are immutable. 
> The hashmap result always allocates on boundary conditions, but never mutates 
> a previous buffer.
> Copying Strings out of the hashtable is entirely wasteful and it would be 
> easy to know when the currentBytes is a borrowed slice from the original 
> input.



