[ https://issues.apache.org/jira/browse/KUDU-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke resolved KUDU-2831. ------------------------------- Resolution: Fixed Fix Version/s: 1.10.0 Resolved via [https://github.com/apache/kudu/commit/6ecafb439] > DistributedDataGeneratorTest.testGenerateRandomData is flaky > ------------------------------------------------------------ > > Key: KUDU-2831 > URL: https://issues.apache.org/jira/browse/KUDU-2831 > Project: Kudu > Issue Type: Bug > Components: spark, test > Affects Versions: 1.10.0 > Reporter: Adar Dembo > Assignee: Will Berkeley > Priority: Major > Fix For: 1.10.0 > > > Saw this once last month and again today, so not super flaky but still worth > fixing: > {noformat} > 1) > testGenerateRandomData(org.apache.kudu.spark.tools.DistributedDataGeneratorTest) > java.lang.AssertionError: expected:<100> but was:<99> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.kudu.spark.tools.DistributedDataGeneratorTest.testGenerateRandomData(DistributedDataGeneratorTest.scala:58) > {noformat} > I talked about this with [~granthenke] when it last happened. The issue > appears to be in the LongAccumulator used to track collisions in the data > generator. Before the failure, the test logged this: > {noformat} > 02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:134) Rows written: > 99 > 02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:135) Collisions: 1 > {noformat} > The assert code looks like this: > {noformat} > val collisions = ss.sparkContext.longAccumulator("row_collisions").value > // Collisions could cause the number of row to be less than the number > set. > assertEquals(numRows - collisions, rdd.collect.length) > {noformat} > So the value of this LongAccumulator was zero even though there was one > collision. Our thinking was that accumulators like these were updated > asynchronously and so if we don't wait for the entire job to finish, we may > not be getting their up-to-date values at assertion time. > We publish other LongAccumulators in kudu-spark, but AFAICT this is the only > one that is asserted on. Nevertheless, it would be great if we could solve > this in some generic way so that if someone wrote a test that used a > different LongAccumulator, the race could be avoided. -- This message was sent by Atlassian JIRA (v7.6.3#76005)