[ 
https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615742#comment-14615742
 ] 

zhihai xu commented on HADOOP-12189:
------------------------------------

Thanks for the review [~chrili]!
bq. but this should be super rare, like a theoretical concern more than a 
practical one, so if you're seeing dropped calls we should fix that.
Yes, it may rarely happen but the original test will fail Intermittently, The 
updated unit test can reproduce this problem easily.
bq. this code should be lock free, so we can't depend on using synchronized() 
blocks.
Yes, there will be some performance degradation. But the code inside the lock 
is very small. the performance overhead with the patch is not much. So it will 
depend on what is the impact if we drop the elements in the queue. 
{code}
synchronized (queueMap) {
bq = putRef.get();
num = queueMap.get(bq);
num.incrementAndGet();
}
{code}
bq. Also it seems that this code doesn't provide 100% guarantee either, since 
the queue can still be swapped in between bq = putRef.get() and 
num.incrementAndGet()
The patch can handle the above situation. If the queue is swapped between {{bq 
= putRef.get()}} and {{num.incrementAndGet()}}, {{bq}} will be oldQ, because of 
the lock for synchronizedMap {{queueMap}}, {{queueMap.remove(oldQ);}} can only 
be called after {{num.incrementAndGet()}}. This will make sure {{swapQueue}} 
see the updated {{num}}. So I think the patch can guarantee the elements added 
by {{put}} won't be dropped.

> CallQueueManager may drop elements from the queue sometimes when calling 
> swapQueue
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-12189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12189
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc, test
>    Affects Versions: 2.7.1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: HADOOP-12189.000.patch
>
>
> CallQueueManager may drop elements from the queue sometimes when calling 
> {{swapQueue}}. 
> The following test failure from TestCallQueueManager shown some elements in 
> the queue are dropped.
> https://builds.apache.org/job/PreCommit-HADOOP-Build/7150/testReport/org.apache.hadoop.ipc/TestCallQueueManager/testSwapUnderContention/
> {code}
> java.lang.AssertionError: expected:<27241> but was:<27245>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.junit.Assert.assertEquals(Assert.java:542)
>       at 
> org.apache.hadoop.ipc.TestCallQueueManager.testSwapUnderContention(TestCallQueueManager.java:220)
> {code}
> It looked like the elements in the queue are dropped due to 
> {{CallQueueManager#swapQueue}}
> Looked at the implementation of {{CallQueueManager#swapQueue}}, there is a 
> possibility that the elements in the queue are dropped. If the queue is full, 
> the calling thread for {{CallQueueManager#put}} is blocked for long time. It 
> may put the element into the old queue after queue in {{takeRef}} is changed 
> by swapQueue, then this element in the old queue will be dropped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to