ivandika3 opened a new pull request, #10325:
URL: https://github.com/apache/ozone/pull/10325

   ## What changes were proposed in this pull request?
   
   Currently, the implementation of hash for 
FixedThreadPoolWithAffinityExecutor (used to group the ContainerReportQueue 
assigned for container reports) is as follow
   
   {code:java}
       // For messages that need to be routed to the same thread need to
       // implement hashCode to match the messages. This should be safe for
       // other messages that implement the native hash.
       int index = message.hashCode() & (workQueues.size() - 1);
       BlockingQueue<Q> queue = workQueues.get(index);
   {code}
   
   However, this might cause skew when the queueCount is not a power of two 
(like the when the default is 10)
   
   For queueCount = 10
   
   {code:java}
   queueCount - 1 = 9
   9 in binary = 1001
   {code}
   
   So the routing becomes:
   
   {code:java}
   queueIndex = hash & 9
   {code}
   
   That means only the bits represented by 1001 are kept. The possible results 
are only:
   
   {code:java}
   0  -> 0000
   1  -> 0001
   8  -> 1000
   9  -> 1001
   {code}
   
   So with 10 queues, only queues 0, 1, 8, and 9 can be selected. Queues 2, 3, 
4, 5, 6, 7 are effectively unused.
   
   Example
   
   {code:java}
   hash 0  -> 0000 & 1001 = 0000 = queue 0
   hash 1  -> 0001 & 1001 = 0001 = queue 1
   hash 2  -> 0010 & 1001 = 0000 = queue 0
   hash 3  -> 0011 & 1001 = 0001 = queue 1
   hash 4  -> 0100 & 1001 = 0000 = queue 0
   hash 5  -> 0101 & 1001 = 0001 = queue 1
   hash 8  -> 1000 & 1001 = 1000 = queue 8
   hash 9  -> 1001 & 1001 = 1001 = queue 9
   hash 10 -> 1010 & 1001 = 1000 = queue 8
   hash 11 -> 1011 & 1001 = 1001 = queue 9
   {code}
   
   *So the configured pool says “10 FCR processing queues”, but in practice it 
behaves closer to 4 active queues.*
   
   This is observed in the MAT tool check in HDDS-15330.
   
   If the queue count were 16:
   
   {code:java}
   queueCount - 1 = 15
   15 in binary = 1111
   queueIndex = hash & 15
   {code}
   
   Then possible queue indexes are 0..15, so all 16 queues can be used.
   
   Currently, we can set the queue count to power of two (e.g. 16).
   
   However, to fix the skew entirely, we can use modulo instead
   
   For example,
   {code:java}
   index = Math.floorMod(hash, queueCount);
   {code}
   
   Hopefully through this we can ensure FCR is processed quickly and SCM memory 
can be freed.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-15338
   
   ## How was this patch tested?
   
   UT.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to