[ 
https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619255#comment-14619255
 ] 

Chris Li commented on HADOOP-12189:
-----------------------------------

[~arpitagarwal] I think [~zxu] encountered unit test failures, which brought 
his attention here. 

If increasing the number of checkpoints works, then we should do that. I don't 
think we should introduce a new config parameter, though: it controls a very 
low-level detail that nobody will ever modify. I'd suggest experimenting to 
find what passes and then increasing the tolerance by up to an order of 
magnitude to be safe. So if you can get it to pass with 10 checks at a 2ms 
pause, then we can do 20-100 checks at a 2ms pause (as long as the total wait 
time is < 1 second). This is mostly for developers' sake since, as Arpit 
mentioned, dropping is a rarity in real life, even during queue swaps.
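
A minimal sketch of the bounded polling described above, for illustration only. 
The class name, method name, and constants are hypothetical and are not taken 
from the patch; it assumes the swap decides the old queue is drained only after 
seeing it empty on every check.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class DrainCheck {
    // Hypothetical values: 20 checks at a 2 ms pause is roughly 40 ms in the
    // worst case, well under the 1 second ceiling suggested above.
    static final int CHECKS = 20;
    static final long PAUSE_MS = 2;

    /** Returns true only if the queue is observed empty on every one of the polls. */
    static boolean staysEmpty(BlockingQueue<?> q, int checks, long pauseMs) {
        for (int i = 0; i < checks; i++) {
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report "not confirmed empty".
                Thread.currentThread().interrupt();
                return false;
            }
            if (!q.isEmpty()) {
                return false; // a late put arrived; the caller should drain again
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        System.out.println(staysEmpty(q, CHECKS, PAUSE_MS)); // queue never had elements
        q.add("late call");
        System.out.println(staysEmpty(q, CHECKS, PAUSE_MS)); // queue holds one element
    }
}
```

More checks make it less likely that a slow producer slips in after the last 
check, at the cost of a longer swap; they narrow the window but do not close it.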

> CallQueueManager may drop elements from the queue sometimes when calling 
> swapQueue
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-12189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12189
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc, test
>    Affects Versions: 2.7.1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: HADOOP-12189.000.patch, HADOOP-12189.001.patch, 
> HADOOP-12189.none_guarantee.000.patch
>
>
> CallQueueManager may drop elements from the queue sometimes when calling 
> {{swapQueue}}. 
> The following test failure from TestCallQueueManager shows that some elements 
> in the queue were dropped.
> https://builds.apache.org/job/PreCommit-HADOOP-Build/7150/testReport/org.apache.hadoop.ipc/TestCallQueueManager/testSwapUnderContention/
> {code}
> java.lang.AssertionError: expected:<27241> but was:<27245>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.junit.Assert.assertEquals(Assert.java:542)
>       at org.apache.hadoop.ipc.TestCallQueueManager.testSwapUnderContention(TestCallQueueManager.java:220)
> {code}
> It looks like the elements in the queue are dropped by 
> {{CallQueueManager#swapQueue}}.
> Looking at the implementation of {{CallQueueManager#swapQueue}}, there is a 
> possibility that elements in the queue are dropped. If the queue is full, 
> the thread calling {{CallQueueManager#put}} can be blocked for a long time. 
> It may then put its element into the old queue after the queue in {{takeRef}} 
> has been changed by swapQueue, and that element is left behind in the old 
> queue and dropped.
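
The interleaving described in the quoted report can be replayed step by step on 
a single thread. This is a simplified sketch, not the actual CallQueueManager 
code: the class SwapRaceSketch, its fields, and the order of operations inside 
the swap are assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

class SwapRaceSketch {
    // Hypothetical stand-ins for the manager's put and take references.
    final AtomicReference<BlockingQueue<String>> putRef;
    final AtomicReference<BlockingQueue<String>> takeRef;

    SwapRaceSketch(int capacity) {
        BlockingQueue<String> q = new LinkedBlockingQueue<>(capacity);
        putRef = new AtomicReference<>(q);
        takeRef = new AtomicReference<>(q);
    }

    /** Replays the race deterministically; returns the number of stranded elements. */
    static int strandedAfterSwap() {
        SwapRaceSketch m = new SwapRaceSketch(1);
        m.putRef.get().offer("first"); // the queue is now full

        // Step 1: a producer resolves the queue reference. With the queue
        // full, a real put() would block at this point.
        BlockingQueue<String> seenByProducer = m.putRef.get();

        // Step 2: the swap redirects puts, drains the old queue into the new
        // one, and then redirects takes (assumed order).
        BlockingQueue<String> oldQ = m.takeRef.get();
        BlockingQueue<String> newQ = new LinkedBlockingQueue<>(10);
        m.putRef.set(newQ);
        oldQ.drainTo(newQ);
        m.takeRef.set(newQ);

        // Step 3: the drain freed space, so the producer's put completes,
        // but on the old queue it resolved in step 1.
        seenByProducer.offer("late");

        // Consumers only read through takeRef, so "late" is never taken.
        return oldQ.size();
    }

    public static void main(String[] args) {
        System.out.println("stranded elements: " + strandedAfterSwap());
    }
}
```

In a real run the producer and the swap are on different threads, so the 
outcome depends on timing, which is why the test fails only occasionally.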



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
