Hi, I am using Spark 1.0.0 and found that in Spark SQL some queries that use GROUP BY give weird results. To reproduce, type the following commands in spark-shell connected to a standalone server:

case class Foo(k: String, v: Int)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val rows = List.fill(100)(Foo("a", 1)) ++ List.fill(200)(Foo("b", 2)) ++ List.fill(300)(Foo("c", 3))
sc.makeRDD(rows).registerAsTable("foo")
sql("select k,count(*) from foo group by k").collect

The result will be something random like:

res1: Array[org.apache.spark.sql.Row] = Array([b,180], [3,18], [a,75], [c,270], [4,56], [1,1])

If I run the same query again, the new result is correct:

sql("select k,count(*) from foo group by k").collect
res2: Array[org.apache.spark.sql.Row] = Array([b,200], [a,100], [c,300])

Should I file a bug?

--
Pei-Lun Lee