Ivan Andika created HDDS-15338:
----------------------------------

             Summary: Fix possible skew for FixedThreadPoolWithAffinityExecutor
                 Key: HDDS-15338
                 URL: https://issues.apache.org/jira/browse/HDDS-15338
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Ivan Andika
            Assignee: Ivan Andika


Currently, the implementation of hash for FixedThreadPoolWithAffinityExecutor 
(used to group the ContainerReportQueue assigned for container reports) is as 
follow

{code:java}
    // For messages that need to be routed to the same thread need to
    // implement hashCode to match the messages. This should be safe for
    // other messages that implement the native hash.
    int index = message.hashCode() & (workQueues.size() - 1);
    BlockingQueue<Q> queue = workQueues.get(index);
{code}

However, this might cause skew when the queueCount is not a power of two (like 
the when the default is 10)

For queueCount = 10

{code:java}
queueCount - 1 = 9
9 in binary = 1001
{code}

So the routing becomes:

{code:java}
queueIndex = hash & 9
{code}

That means only the bits represented by 1001 are kept. The possible results are 
only:

{code:java}
0  -> 0000
1  -> 0001
8  -> 1000
9  -> 1001
{code}

So with 10 queues, only queues 0, 1, 8, and 9 can be selected. Queues 2, 3, 4, 
5, 6, 7 are effectively unused.

Example

{code:java}
hash 0  -> 0000 & 1001 = 0000 = queue 0
hash 1  -> 0001 & 1001 = 0001 = queue 1
hash 2  -> 0010 & 1001 = 0000 = queue 0
hash 3  -> 0011 & 1001 = 0001 = queue 1
hash 4  -> 0100 & 1001 = 0000 = queue 0
hash 5  -> 0101 & 1001 = 0001 = queue 1
hash 8  -> 1000 & 1001 = 1000 = queue 8
hash 9  -> 1001 & 1001 = 1001 = queue 9
hash 10 -> 1010 & 1001 = 1000 = queue 8
hash 11 -> 1011 & 1001 = 1001 = queue 9
{code}

*So the configured pool says “10 FCR processing queues”, but in practice it 
behaves closer to 4 active queues.*

This is observed in the MAT tool check in HDDS-15330.

If the queue count were 16:

{code:java}
queueCount - 1 = 15
15 in binary = 1111
queueIndex = hash & 15
{code}

Then possible queue indexes are 0..15, so all 16 queues can be used.

Currently, we can set the queue count to power of two (e.g. 16).

However, to fix the skew entirely, we can use modulo instead

For example,
{code:java}
index = Math.floorMod(hash, queueCount);
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to