[ https://issues.apache.org/jira/browse/ZOOKEEPER-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573109#comment-14573109 ]

Hongchao Deng commented on ZOOKEEPER-2204:
------------------------------------------

+1
The patch looks good.

> LearnerSnapshotThrottlerTest.testHighContentionWithTimeout fails occasionally
> -----------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2204
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2204
>             Project: ZooKeeper
>          Issue Type: Test
>    Affects Versions: 3.5.0
>            Reporter: Donny Nadolny
>            Assignee: Donny Nadolny
>            Priority: Minor
>             Fix For: 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-2204.patch, ZOOKEEPER-2204.patch
>
>
> The {{LearnerSnapshotThrottler}} allows only 2 concurrent snapshots, and if 
> 2 snapshots are already in progress it waits up to 200ms for one to 
> complete. This isn't enough time for {{testHighContentionWithTimeout}} to 
> pass consistently: on a cold JVM running just that one test, I was able to 
> make it fail 3 times in about 50 runs. The 200ms timeout is hit whenever 
> there is a long enough delay between a thread calling 
> {{LearnerSnapshot snap = throttler.beginSnapshot(false);}} and 
> {{throttler.endSnapshot();}}.
> The test also fails spuriously on the build server; see 
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2747/testReport/org.apache.zookeeper.server.quorum/LearnerSnapshotThrottlerTest/testHighContentionWithTimeout/
> for an example.
> I have raised the timeout to 5 seconds (which should be more than enough to 
> absorb JVM warmup and GC pauses) and added logging to the 
> {{catch (Exception e)}} block to help debug any future failures.
> An alternative approach would be to separate the results gathered from the 
> threads: although we only record true/false, there are really three 
> outcomes:
> 1. The {{snapshotNumber}} was <= 2, meaning the individual call operated 
> correctly
> 2. The {{snapshotNumber}} was > 2, meaning the test should definitely fail
> 3. We were unable to begin a snapshot within the time given, so we can't 
> determine whether the test should pass or fail (although with "enough" 
> successes from #1 and no failures from #2, we could still pass the test).
> Bumping up the timeout is easier.
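
The throttling rule described in the issue (at most 2 concurrent snapshots, with a bounded wait for a free slot) can be sketched with a fair java.util.concurrent.Semaphore and tryAcquire with a timeout. This is a minimal illustration under stated assumptions, not the actual LearnerSnapshotThrottler code: the class ThrottlerSketch, its method signatures, and the 20-thread / 10 ms workload in main are hypothetical. It shows why a short timeout is fragile: a thread that stalls between beginSnapshot and endSnapshot keeps holding its slot, so waiters can run out of time.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a snapshot throttler (NOT ZooKeeper's real class):
// at most maxConcurrent snapshots may run; callers wait up to timeoutMillis.
public class ThrottlerSketch {
    private final Semaphore slots;
    private final AtomicInteger inProgress = new AtomicInteger();
    private final long timeoutMillis;

    public ThrottlerSketch(int maxConcurrent, long timeoutMillis) {
        this.slots = new Semaphore(maxConcurrent, true); // fair: FIFO waiters
        this.timeoutMillis = timeoutMillis;
    }

    // Returns how many snapshots are in progress, counting this one.
    public int beginSnapshot() throws InterruptedException, TimeoutException {
        if (!slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("no snapshot slot within " + timeoutMillis + " ms");
        }
        return inProgress.incrementAndGet();
    }

    // Decrement before releasing so the count never exceeds held permits.
    public void endSnapshot() {
        inProgress.decrementAndGet();
        slots.release();
    }

    public static void main(String[] args) throws InterruptedException {
        // 5000 ms mirrors the raised timeout; the workload is hypothetical.
        ThrottlerSketch throttler = new ThrottlerSketch(2, 5000);
        AtomicInteger maxSeen = new AtomicInteger();
        Thread[] workers = new Thread[20];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    int n = throttler.beginSnapshot();
                    try {
                        maxSeen.accumulateAndGet(n, Math::max);
                        Thread.sleep(10); // simulate snapshot work while holding a slot
                    } finally {
                        throttler.endSnapshot();
                    }
                } catch (Exception e) {
                    // Log why a worker failed instead of swallowing the exception.
                    System.err.println("worker failed: " + e);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("max concurrent snapshots: " + maxSeen.get());
    }
}
```

With a 200 ms timeout instead of 5000 ms, a single worker stalled inside the inner try block for longer than that would be enough to make waiting workers throw TimeoutException, which is the failure mode the issue describes.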

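The three-outcome alternative from the issue could look roughly like the following, assuming each worker reports an Optional<Integer> that is empty when it could not begin a snapshot in time. The OutcomeSketch class, the Outcome names and the minSuccesses threshold are hypothetical; the issue leaves what counts as "enough" successes unspecified.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical classification of per-thread results into the three outcomes.
public class OutcomeSketch {
    public enum Outcome { WITHIN_LIMIT, OVER_LIMIT, TIMED_OUT }

    // An empty snapshotNumber means the thread timed out waiting for a slot.
    public static Outcome classify(Optional<Integer> snapshotNumber, int limit) {
        if (!snapshotNumber.isPresent()) {
            return Outcome.TIMED_OUT;
        }
        return snapshotNumber.get() <= limit ? Outcome.WITHIN_LIMIT : Outcome.OVER_LIMIT;
    }

    // Count how many results fall into each outcome (all keys present, starting at 0).
    public static Map<Outcome, Integer> tally(List<Optional<Integer>> results, int limit) {
        Map<Outcome, Integer> counts = new EnumMap<>(Outcome.class);
        for (Outcome o : Outcome.values()) {
            counts.put(o, 0);
        }
        for (Optional<Integer> r : results) {
            counts.merge(classify(r, limit), 1, Integer::sum);
        }
        return counts;
    }

    // Any OVER_LIMIT fails outright; otherwise pass only with at least
    // minSuccesses correct calls (a hypothetical threshold for "enough").
    public static boolean passes(Map<Outcome, Integer> counts, int minSuccesses) {
        return counts.get(Outcome.OVER_LIMIT) == 0
                && counts.get(Outcome.WITHIN_LIMIT) >= minSuccesses;
    }

    public static void main(String[] args) {
        List<Optional<Integer>> results =
                List.of(Optional.of(1), Optional.of(2), Optional.empty(), Optional.of(2));
        Map<Outcome, Integer> counts = tally(results, 2);
        System.out.println(counts + " passes=" + passes(counts, 3));
    }
}
```

Timeouts are tallied separately, so they neither pass nor fail the test on their own; this is the extra bookkeeping the issue weighs against simply raising the timeout.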


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
