[ 
https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615900#comment-14615900
 ] 

Chris Li commented on HADOOP-12189:
-----------------------------------

Hi [~zxu], I see, it's a synchronized map, so its methods acquire the queue's 
intrinsic lock. Thanks for clarifying

I have concerns with performance. Have you measured impact? Last I checked, 
acquiring a lock every time resulted in measurable performance penalty. It's 
not just time under contention, it's also the fact that synchronized will 
introduce a memory barrier, and having 120 threads contesting over yet another 
lock when processing 60k requests per second sounds questionable. 

I think it all comes down to performance. The unsafe "sleep 10ms and try again" 
scheme was born from that (since queue swapping is a relatively infrequent 
event compared to {{put()}}. If the performance situation has changed since we 
last discussed this, then we can come up with better solutions that are 
guarantee consistency.

So options are:
1. Start locking and provide strict consistency (and then we don't need to 
Thread.sleep)
2. Try to increase sleep timeout or increase number of checks to make call loss 
practically impossible
3. Relax tests and accept the loss of calls during queue swaps

[~daryn] might have some thoughts.

> CallQueueManager may drop elements from the queue sometimes when calling 
> swapQueue
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-12189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12189
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc, test
>    Affects Versions: 2.7.1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: HADOOP-12189.000.patch
>
>
> CallQueueManager may drop elements from the queue sometimes when calling 
> {{swapQueue}}. 
> The following test failure from TestCallQueueManager shown some elements in 
> the queue are dropped.
> https://builds.apache.org/job/PreCommit-HADOOP-Build/7150/testReport/org.apache.hadoop.ipc/TestCallQueueManager/testSwapUnderContention/
> {code}
> java.lang.AssertionError: expected:<27241> but was:<27245>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.junit.Assert.assertEquals(Assert.java:542)
>       at 
> org.apache.hadoop.ipc.TestCallQueueManager.testSwapUnderContention(TestCallQueueManager.java:220)
> {code}
> It looked like the elements in the queue are dropped due to 
> {{CallQueueManager#swapQueue}}
> Looked at the implementation of {{CallQueueManager#swapQueue}}, there is a 
> possibility that the elements in the queue are dropped. If the queue is full, 
> the calling thread for {{CallQueueManager#put}} is blocked for long time. It 
> may put the element into the old queue after queue in {{takeRef}} is changed 
> by swapQueue, then this element in the old queue will be dropped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to