[ https://issues.apache.org/jira/browse/ZOOKEEPER-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614992#comment-13614992 ]

Alexander Shraer commented on ZOOKEEPER-1629:
---------------------------------------------

After some debugging, here's what seems to be the problem.
There were some timing-related failures, which the attached patch mostly
addresses; fixing them revealed a different problem.

The failure above is that zk1 sees znode /test2 but zk2 doesn't.
From the log:
2013-03-27 02:39:59,438 [myid:] - INFO  [main:TruncateCorruptionTest@160] - List of children at zk2 before zk1 became master
2013-03-27 02:39:59,440 [myid:] - INFO  [main:TruncateCorruptionTest@162] - [test, zookeeper, test3]
2013-03-27 02:39:59,440 [myid:] - INFO  [main:TruncateCorruptionTest@164] - List of children at zk1 before zk1 became master
2013-03-27 02:39:59,442 [myid:] - INFO  [main:TruncateCorruptionTest@166] - [test, zookeeper, test2, test3]

The test is designed so that /test2 is first committed to servers 1 and 3.
The test then deletes the data dir of server 3, disconnects server 1, and has
server 3 form a quorum with server 2. When server 1 connects to the new
ensemble, it is forced to truncate the committed transaction that created
/test2. So why does it still have /test2 in its data tree? Because earlier it
managed to take a snapshot (39:17), and truncation doesn't touch the snapshot.
After the truncate, when we load the database, we first start from the
snapshot and then apply the truncated log. So /test2 showing up is expected
in this case.
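The recovery order described above can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not ZooKeeper's actual persistence code: the class and method names (Txn, Server, take_snapshot, truncate_log, load_database) are hypothetical, zxids are plain integers without epochs, and only create/delete operations are modeled. It replays the scenario on server 1 with and without a snapshot taken after /test2 was committed.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of snapshot + transaction-log recovery.
# None of these names are ZooKeeper APIs.

@dataclass
class Txn:
    zxid: int
    op: str    # "create" or "delete"
    path: str

@dataclass
class Server:
    log: list = field(default_factory=list)   # on-disk transaction log
    snapshot: tuple = (0, frozenset())         # (last zxid covered, znodes)

    def append(self, txn):
        self.log.append(txn)

    def take_snapshot(self):
        # Persist the current tree together with the last zxid it covers.
        tree = self.load_database()
        last = self.log[-1].zxid if self.log else self.snapshot[0]
        self.snapshot = (last, frozenset(tree))

    def truncate_log(self, zxid):
        # Drop log entries newer than zxid; the snapshot is left untouched.
        self.log = [t for t in self.log if t.zxid <= zxid]

    def load_database(self):
        # Start from the snapshot, then replay only newer log entries.
        snap_zxid, nodes = self.snapshot
        tree = set(nodes)
        for t in self.log:
            if t.zxid <= snap_zxid:
                continue
            if t.op == "create":
                tree.add(t.path)
            else:
                tree.discard(t.path)
        return tree

def run(snapshot_after_test2):
    zk1 = Server()
    zk1.append(Txn(1, "create", "/test"))
    zk1.append(Txn(2, "create", "/test2"))  # committed on servers 1 and 3 only
    if snapshot_after_test2:
        zk1.take_snapshot()
    # The new quorum (servers 2 and 3) never saw zxid 2, so zk1 truncates to 1.
    zk1.truncate_log(1)
    return zk1.load_database()

print(sorted(run(True)))   # ['/test', '/test2']
print(sorted(run(False)))  # ['/test']
```

In this model, once the snapshot already contains /test2, truncating the log cannot remove it; without the snapshot, the truncation takes effect as the test expects.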

If we want to keep the current structure of this test, we should disable 
snapshotting for its duration. Is there a way to do that ?

> testTransactionLogCorruption occasionally fails
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-1629
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1629
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: tests
>            Reporter: Alexander Shraer
>         Attachments: TruncateCorruptionTest-patch.patch
>
>
> It seems that testTransactionLogCorruption is very flaky; for example, it
> fails here:
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/500/
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/502/
> https://builds.apache.org/job/ZooKeeper-trunk-jdk7/503/#showFailuresLink
> It also fails in older builds (no longer on the website), for example all
> builds from 381 to 399.

