[ https://issues.apache.org/jira/browse/SPARK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-1994:
------------------------------------
    Fix Version/s: 1.1.0
                   1.0.1

> Aggregates return incorrect results on first execution
> ------------------------------------------------------
>
>                 Key: SPARK-1994
>                 URL: https://issues.apache.org/jira/browse/SPARK-1994
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.0.0
>            Reporter: Michael Armbrust
>            Assignee: Michael Armbrust
>            Priority: Blocker
>             Fix For: 1.0.1, 1.1.0
>
>
> [~adav] has a full reproduction but he has found a case where the first run
> returns corrupted results, but the second case does not. The same does not
> occur when reading from HDFS a second time...
> {code}
> sql("SELECT lang, COUNT(*) AS cnt FROM tweetTable GROUP BY lang ORDER BY cnt DESC").collect.foreach(println)
> [bg,16636]
> [16266,16266]
> [16223,16223]
> [16161,16161]
> [16047,16047]
> [lt,11405]
> [hu,11380]
> [el,10845]
> [da,10289]
> [fi,10261]
> [9897,9897]
> [9765,9765]
> [9751,9751]
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)