[ https://issues.apache.org/jira/browse/TEZ-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-972:
---------------------------------
Status: Patch Available (was: Open)
> Shuffle Phase - optimize memory usage of empty partition data in DataMovementEvent
> ----------------------------------------------------------------------------------
>
> Key: TEZ-972
> URL: https://issues.apache.org/jira/browse/TEZ-972
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-972-v1.patch, TEZ-972-v2.patch, TEZ-972-v3.patch
>
>
> Empty-partition details are stored in a compressed byte[] and sent via
> DataMovementEvent during the shuffle phase. Quick standalone tests reveal that
> a BitSet would be more compact than compressing the byte[]:
> (PartitionSize is the number of partitions; the other columns are sizes in bytes.)
>
> PartitionSize | BitSetSize | CompressedBitSetSize | NormalByteArrayCompressed
> --------------+------------+----------------------+--------------------------
>             1 |          1 |                    9 |                         9
>           101 |         13 |                   22 |                        42
>           201 |         26 |                   37 |                        62
>           301 |         38 |                   49 |                        76
>            .. |         .. |                   .. |                        ..
>          1001 |        126 |                  137 |                       197
>            .. |         .. |                   .. |                        ..
>          2001 |        251 |                  262 |                       374
>          4001 |        501 |                  512 |                       686
>          8001 |       1001 |                 1012 |                      1330
>         16001 |       2001 |                 1979 |                      2569
>         32001 |       4001 |                 3885 |                      5000
> These figures assume that the empty partitions fall at random bit positions.
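> A minimal Java sketch of this kind of standalone size comparison, under stated
> assumptions: the "normal" encoding is taken to be one byte per partition
> (1 = empty, 0 = non-empty), "compressed" means java.util.zip.Deflater at its
> default level, and each partition is empty with probability 0.5. The class and
> method names are hypothetical, and BitSet.toByteArray() requires JDK 1.7+.
> Its output need not match the measured figures exactly.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical sketch: compares encodings of an empty-partition map.
final class EmptyPartitionSizes {

  // Assumption: each partition is empty with probability 0.5 (random bit positions).
  static BitSet randomEmptyPartitions(int numPartitions, long seed) {
    Random rnd = new Random(seed);
    BitSet empty = new BitSet(numPartitions);
    for (int i = 0; i < numPartitions; i++) {
      if (rnd.nextBoolean()) {
        empty.set(i);
      }
    }
    return empty;
  }

  // Assumed baseline: one byte per partition, 1 if empty and 0 otherwise.
  static byte[] toFlagBytes(BitSet empty, int numPartitions) {
    byte[] flags = new byte[numPartitions];
    for (int i = empty.nextSetBit(0); i >= 0; i = empty.nextSetBit(i + 1)) {
      flags[i] = 1;
    }
    return flags;
  }

  // Deflate-compresses the input (default level, zlib wrapper) and returns the result.
  static byte[] deflate(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  public static void main(String[] args) {
    for (int n : new int[] {1, 101, 1001, 8001, 32001}) {
      BitSet empty = randomEmptyPartitions(n, 42L);
      byte[] bitSetBytes = empty.toByteArray(); // JDK 1.7+ only
      System.out.printf(
          "PartitionSize=%d, BitSetSize=%d, CompressedBitSetSize=%d, NormalByteArrayCompressed=%d%n",
          n, bitSetBytes.length, deflate(bitSetBytes).length,
          deflate(toFlagBytes(empty, n)).length);
    }
  }
}
```

> Because a random bit pattern has close to maximal entropy, Deflate can shrink
> the packed BitSet very little, whereas the one-byte-per-flag form wastes seven
> bits per partition before compression even starts.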
> JDK 1.6's BitSet cannot be used directly, because it lacks the valueOf() and
> toByteArray() methods (both were added in JDK 1.7). The suggestion is to add a
> Tez-specific BitSet until Tez moves to JDK 1.7, and to make compression a job
> configuration option.
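> One way such a Tez-specific helper could look, sketched under the assumption
> that it only needs to reproduce JDK 1.7's byte layout (bit n stored in byte
> n/8 at position n%8) using methods already present in JDK 1.6. BitSetCodec,
> toByteArray and fromByteArray are hypothetical names, not existing Tez APIs.

```java
import java.util.BitSet;

// Hypothetical JDK 1.6-compatible helper mirroring the layout of JDK 1.7's
// BitSet.toByteArray() / BitSet.valueOf(byte[]): bit n lives in byte n / 8
// at bit position n % 8 (little-endian).
final class BitSetCodec {
  private BitSetCodec() {}

  static byte[] toByteArray(BitSet bits) {
    // length() is the highest set bit + 1, so trailing all-zero bytes are dropped.
    byte[] bytes = new byte[(bits.length() + 7) / 8];
    for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
      bytes[i / 8] |= (byte) (1 << (i % 8));
    }
    return bytes;
  }

  static BitSet fromByteArray(byte[] bytes) {
    BitSet bits = new BitSet(bytes.length * 8);
    for (int i = 0; i < bytes.length * 8; i++) {
      if ((bytes[i / 8] & (1 << (i % 8))) != 0) {
        bits.set(i);
      }
    }
    return bits;
  }

  public static void main(String[] args) {
    BitSet empty = new BitSet();
    empty.set(0);
    empty.set(3);
    empty.set(9);
    byte[] encoded = toByteArray(empty);
    System.out.println(encoded.length);         // prints 2
    System.out.println(fromByteArray(encoded)); // prints {0, 3, 9}
  }
}
```

> Keeping the JDK 1.7 layout would let such a helper be swapped for
> BitSet.valueOf()/toByteArray() later without changing the bytes sent in the
> DataMovementEvent payload.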
--
This message was sent by Atlassian JIRA
(v6.2#6252)