[
https://issues.apache.org/jira/browse/HIVE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934980#action_12934980
]
Siying Dong commented on HIVE-1802:
-----------------------------------
Previously, any group-by needed two mem-copies: one from the Text objects to
the buffer, and one to append an extra tag to the end of the buffer.
Now a single Text key needs no mem-copy (apart from the first byte, which is
0), and multiple keys need one (from the Text objects to the buffer).
A join also needed two mem-copies: one from Text to the buffer, and one to add
the tag.
Now a single Text key needs one copy, from the buffer, to add the tag. In all
other cases we still need two copies.
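
To make the copy counting concrete, here is a minimal sketch of the difference
between appending a tag with two copies and with one. It is not the code in the
patch; the class name KeyCopySketch and both method names are hypothetical, and
plain byte arrays stand in for Hive's Text objects and output buffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyCopySketch {
    // Two copies: key bytes -> intermediate buffer, then buffer -> tagged array.
    static byte[] withTagTwoCopies(byte[] keyBytes, byte tag) {
        byte[] buffer = Arrays.copyOf(keyBytes, keyBytes.length);   // copy 1
        byte[] tagged = Arrays.copyOf(buffer, buffer.length + 1);   // copy 2
        tagged[tagged.length - 1] = tag;
        return tagged;
    }

    // One copy: allocate room for the tag up front and copy the key bytes once.
    static byte[] withTagOneCopy(byte[] keyBytes, byte tag) {
        byte[] tagged = new byte[keyBytes.length + 1];
        System.arraycopy(keyBytes, 0, tagged, 0, keyBytes.length);  // the only copy
        tagged[keyBytes.length] = tag;
        return tagged;
    }

    public static void main(String[] args) {
        byte[] key = "user_42".getBytes(StandardCharsets.UTF_8);
        byte[] a = withTagTwoCopies(key, (byte) 1);
        byte[] b = withTagOneCopy(key, (byte) 1);
        // Both paths produce the same bytes; only the number of copies differs.
        System.out.println(Arrays.equals(a, b));
    }
}
```

Both methods return identical output, so the saving is purely in the work done
per key, which matters because this runs once for every shuffled row.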
> Encode MapReduce Shuffling Keys Differently for Single string/bigint Key
> -------------------------------------------------------------------------
>
> Key: HIVE-1802
> URL: https://issues.apache.org/jira/browse/HIVE-1802
> Project: Hive
> Issue Type: Improvement
> Reporter: Siying Dong
> Assignee: Siying Dong
> Attachments: HIVE-1802.1.patch, HIVE-1802.2.patch
>
>
> Delimiters are not needed if we have only one shuffling key, and in that
> case escaping delimiters is not needed either. We can save some CPU time on
> serialization and shuffle slightly less data, which reduces memory footprint
> and network traffic.
> There is also a bug in group-by: we mistakenly add a -1 to the end of the
> key and pay for one more unnecessary mem-copy. This can be easily fixed.
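
The single-key shortcut described above can be sketched as follows. This is an
illustration only, not Hive's actual serialization: the class
ShuffleKeySketch, the encode method, and the DELIM and ESC byte values are all
assumptions chosen for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ShuffleKeySketch {
    static final byte DELIM = 0x00; // hypothetical key separator
    static final byte ESC = 0x01;   // hypothetical escape byte

    static byte[] encode(List<byte[]> keys) {
        // Single key: no delimiter and no escaping, so the bytes are used as-is.
        if (keys.size() == 1) {
            return keys.get(0);
        }
        // Multiple keys: escape DELIM/ESC inside each key, then terminate it.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] key : keys) {
            for (byte b : key) {
                if (b == DELIM || b == ESC) {
                    out.write(ESC);
                }
                out.write(b);
            }
            out.write(DELIM);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] single = encode(Arrays.asList("abc".getBytes(StandardCharsets.UTF_8)));
        byte[] multi = encode(Arrays.asList(
                new byte[] {'a', 0x00, 'b'},
                "c".getBytes(StandardCharsets.UTF_8)));
        System.out.println(single.length + " bytes vs " + multi.length + " bytes");
    }
}
```

In the single-key branch the encoder does no per-byte scan and emits no extra
bytes, which is where the CPU and shuffle-size savings in the description
would come from.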