[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236613#comment-15236613
 ] 

Timothy Farkas commented on APEXMALHAR-2049:
--------------------------------------------

A more detailed example of how repartitioning would work is given below. It was 
originally created by [~ishark].

Here is the high-level approach for supporting dynamic partitioning in HDHT:
1. A parameter is added for the total number of buckets to allocate. 
If the number of buckets is not a power of 2, it is rounded up to the next 
power of 2, which is set as the maximum number of buckets.
This number is the maximum number of partitions that the HDHT operator can 
scale out to.
2. The buckets are distributed across the initial partitions in a round-robin 
fashion.
3. When the number of partitions is changed dynamically, the buckets are 
redistributed among the new partitions.
The physical location of the buckets does not change during redistribution. 
Only the mapping of partitions to buckets changes.
4. This mapping is implemented by assigning a Partition Mask and Partition 
Keys to each partition, as follows:
The Partition Mask is set based on the number of buckets instead of the 
number of partitions.
The Partition Keys are set to the bucket numbers managed by the partition.
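
Steps 1 and 4 above can be sketched in a few lines of Python. This is only an 
illustration under stated assumptions, not the Malhar implementation: the 
function names are hypothetical, and the mask is assumed to be (rounded bucket 
count - 1), so that "hash & mask" yields a bucket number.

```python
def next_power_of_two(n):
    """Return the smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("bucket count must be positive")
    return 1 << (n - 1).bit_length()


def partition_mask(requested_buckets):
    """Assumed mask: the low bits that select one of the allocated buckets."""
    return next_power_of_two(requested_buckets) - 1


def partition_keys(partition, num_partitions, requested_buckets):
    """Bucket numbers handed to one partition, dealt out round robin."""
    total = next_power_of_two(requested_buckets)
    if not 0 <= partition < num_partitions <= total:
        raise ValueError("need 0 <= partition < num_partitions <= bucket count")
    return set(range(partition, total, num_partitions))


if __name__ == "__main__":
    # A request for 6 buckets is rounded up to 8, giving a 3-bit mask.
    print(next_power_of_two(6), bin(partition_mask(6)))
    print(sorted(partition_keys(1, 2, 8)))
```
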
To illustrate, consider the case where the initial partition count = 2 and the 
total number of buckets = 8.
In this scenario, the following partitions are deployed initially:
Partition 0 => Manages Buckets 0, 2, 4, 6 
Partition 1 => Manages Buckets 1, 3, 5, 7
Suppose the partition count is then increased to 4.
The dynamic repartitioning changes the partition-to-bucket mapping as follows:
Partition 0 => Manages Buckets 0, 4
Partition 1 => Manages Buckets 1, 5
Partition 2 => Manages Buckets 2, 6
Partition 3 => Manages Buckets 3, 7
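
The two mappings above can be reproduced with a short round-robin sketch. The 
helper name distribute_buckets is hypothetical and is used only for 
illustration; it is not part of HDHT or Malhar.

```python
def distribute_buckets(num_buckets, num_partitions):
    """Deal bucket numbers 0..num_buckets-1 out to partitions round robin."""
    if not 1 <= num_partitions <= num_buckets:
        raise ValueError("need 1 <= num_partitions <= num_buckets")
    return {
        p: list(range(p, num_buckets, num_partitions))
        for p in range(num_partitions)
    }


if __name__ == "__main__":
    for count in (2, 4):
        print("partition count =", count)
        for partition, buckets in distribute_buckets(8, count).items():
            print("  Partition", partition, "=> Manages Buckets", buckets)
```

The set of buckets is the same in both cases. Only the partition that owns 
each bucket differs.
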
We can only increase the number of partitions up to 8, since each partition 
must manage at least 1 bucket. 
With partition count = 8, the distribution is a one-to-one mapping between 
partitions and buckets, as below:
Partition 0 => Manages Bucket 0
Partition 1 => Manages Bucket 1
Partition 2 => Manages Bucket 2
Partition 3 => Manages Bucket 3
Partition 4 => Manages Bucket 4
Partition 5 => Manages Bucket 5
Partition 6 => Manages Bucket 6
Partition 7 => Manages Bucket 7
Advantages of this method:
There is no physical data movement on repartitioning.
The change required is minimal and is limited to definePartitions.
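
The first advantage can be checked with a small sketch. It assumes that a 
tuple's bucket is computed as "hash & (number of buckets - 1)" and that buckets 
are assigned round robin. Both function names are hypothetical. Under these 
assumptions the bucket for a given key never changes, so stored data can stay 
where it is and only the owning partition is recomputed.

```python
def bucket_for(key_hash, num_buckets):
    """Bucket of a key; num_buckets is assumed to be a power of two."""
    return key_hash & (num_buckets - 1)


def owner_of(bucket, num_partitions):
    """Partition that manages a bucket under round-robin assignment."""
    return bucket % num_partitions


if __name__ == "__main__":
    key_hash = 45  # arbitrary example hash value
    for partitions in (2, 4):
        bucket = bucket_for(key_hash, 8)
        print("partitions =", partitions,
              "bucket =", bucket,
              "owner =", owner_of(bucket, partitions))
```
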

> Managed State - add support for distributing buckets to partition
> -----------------------------------------------------------------
>
>                 Key: APEXMALHAR-2049
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2049
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
