[ https://issues.apache.org/jira/browse/SPARK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust reassigned SPARK-1994:
---------------------------------------

    Assignee: Michael Armbrust

> Weird data corruption bug when running Spark SQL on data in HDFS
> ----------------------------------------------------------------
>
>                 Key: SPARK-1994
>                 URL: https://issues.apache.org/jira/browse/SPARK-1994
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.0.0
>            Reporter: Michael Armbrust
>            Assignee: Michael Armbrust
>            Priority: Blocker
>
> [~adav] has a full reproduction but he has found a case where the first run
> returns corrupted results, but the second case does not. The same does not
> occur when reading from HDFS a second time...
> {code}
> sql("SELECT lang, COUNT(*) AS cnt FROM tweetTable GROUP BY lang ORDER BY cnt DESC").collect.foreach(println)
> [bg,16636]
> [16266,16266]
> [16223,16223]
> [16161,16161]
> [16047,16047]
> [lt,11405]
> [hu,11380]
> [el,10845]
> [da,10289]
> [fi,10261]
> [9897,9897]
> [9765,9765]
> [9751,9751]
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)