[ 
https://issues.apache.org/jira/browse/KUDU-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2831.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.10.0

Resolved via [https://github.com/apache/kudu/commit/6ecafb439]

> DistributedDataGeneratorTest.testGenerateRandomData is flaky
> ------------------------------------------------------------
>
>                 Key: KUDU-2831
>                 URL: https://issues.apache.org/jira/browse/KUDU-2831
>             Project: Kudu
>          Issue Type: Bug
>          Components: spark, test
>    Affects Versions: 1.10.0
>            Reporter: Adar Dembo
>            Assignee: Will Berkeley
>            Priority: Major
>             Fix For: 1.10.0
>
>
> Saw this once last month and again today, so not super flaky but still worth 
> fixing:
> {noformat}
> 1) testGenerateRandomData(org.apache.kudu.spark.tools.DistributedDataGeneratorTest)
> java.lang.AssertionError: expected:<100> but was:<99>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:834)
>       at org.junit.Assert.assertEquals(Assert.java:645)
>       at org.junit.Assert.assertEquals(Assert.java:631)
>       at org.apache.kudu.spark.tools.DistributedDataGeneratorTest.testGenerateRandomData(DistributedDataGeneratorTest.scala:58)
> {noformat}
> I talked about this with [~granthenke] when it last happened. The issue 
> appears to be in the LongAccumulator used to track collisions in the data 
> generator. Before the failure, the test logged this:
> {noformat}
> 02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:134) Rows written: 99
> 02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:135) Collisions: 1
> {noformat}
> The assert code looks like this:
> {noformat}
>     val collisions = ss.sparkContext.longAccumulator("row_collisions").value
>     // Collisions could cause the number of rows to be less than the number set.
>     assertEquals(numRows - collisions, rdd.collect.length)
> {noformat}
> So the value of this LongAccumulator read as zero at assertion time even 
> though one collision was logged. Our thinking was that accumulators like 
> these are updated asynchronously, so if we don't wait for the entire job to 
> finish, we may not see their up-to-date values at assertion time.
> We publish other LongAccumulators in kudu-spark, but AFAICT this is the only 
> one that is asserted on. Nevertheless, it would be great if we could solve 
> this in some generic way so that if someone wrote a test that used a 
> different LongAccumulator, the race could be avoided.
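
For illustration only, here is a minimal sketch of one way a test could read an accumulator after the job that updates it has finished. It is not the change made in the commit above. The object name, the local SparkSession setup, and the "every 50th row collides" rule are assumptions made up for this example; only the accumulator name "row_collisions" comes from the issue. Spark's API documentation describes longAccumulator(name) as creating and registering a new accumulator, so the sketch holds on to the instance the tasks update rather than requesting one by name a second time.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical, self-contained sketch; not code from kudu-spark.
object AccumulatorReadSketch {
  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder()
      .master("local[2]")
      .appName("accumulator-read-sketch")
      .getOrCreate()
    try {
      val sc = ss.sparkContext
      // Keep the accumulator instance that the tasks will update.
      val collisions = sc.longAccumulator("row_collisions")
      val numRows = 100L

      // Stand-in for the data generator (assumption): every 50th row is
      // treated as a collision and dropped instead of being written.
      // Note: updates made inside a transformation can be applied more than
      // once if a task is retried; that is acceptable for this local sketch.
      val written = sc.parallelize(0L until numRows, 4).filter { i =>
        val collided = i % 50 == 49
        if (collided) collisions.add(1)
        !collided
      }.count() // count() is an action and blocks until the job completes

      // Read the value only after the action has returned, and from the
      // same accumulator instance the tasks updated.
      assert(written == numRows - collisions.value,
        s"written=$written collisions=${collisions.value}")
    } finally {
      ss.stop()
    }
  }
}
```

A generic helper for tests could follow the same shape: accept the accumulator instance as a parameter and run the blocking action before reading its value.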



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
