[jira] [Created] (SPARK-12802) The DataFrame.rdd not return same result
Joseph Sun created SPARK-12802:
----------------------------------

             Summary: The DataFrame.rdd not return same result
                 Key: SPARK-12802
                 URL: https://issues.apache.org/jira/browse/SPARK-12802
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 1.5.2
         Environment: 3 CentOS 7 servers, cluster mode
            Reporter: Joseph Sun

Run spark-shell and type the following code:

> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val schema = StructType(StructField("id",IntegerType,true)::Nil)
> val rdd = sc.parallelize((0 to 1)).map(Row(_))
> val df = sqlContext.createDataFrame(rdd,schema)
> df.registerTempTable("test")
> sqlContext.cacheTable("test")
> sqlContext.sql("select * from test limit 2").collect()

This shows:

Array[org.apache.spark.sql.Row] = Array([0], [1])

> sqlContext.sql("select * from test limit 2").rdd.collect()

Running this last statement repeatedly does not give a consistent result. Sometimes the result is:

Array[org.apache.spark.sql.Row] = Array([0], [1])

and sometimes it is:

Array[org.apache.spark.sql.Row] = Array([2500], [2501])

Why?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12801) The DataFrame.rdd not return same result
Joseph Sun created SPARK-12801:
----------------------------------

             Summary: The DataFrame.rdd not return same result
                 Key: SPARK-12801
                 URL: https://issues.apache.org/jira/browse/SPARK-12801
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 1.5.2
         Environment: 3 CentOS 7 servers, cluster mode
            Reporter: Joseph Sun

Run spark-shell and type the following code:

> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val schema = StructType(StructField("id",IntegerType,true)::Nil)
> val rdd = sc.parallelize((0 to 1)).map(Row(_))
> val df = sqlContext.createDataFrame(rdd,schema)
> df.registerTempTable("test")
> sqlContext.cacheTable("test")
> sqlContext.sql("select * from test limit 2").collect()

This shows:

Array[org.apache.spark.sql.Row] = Array([0], [1])

> sqlContext.sql("select * from test limit 2").rdd.collect()

Running this last statement repeatedly does not give a consistent result. Sometimes the result is:

Array[org.apache.spark.sql.Row] = Array([0], [1])

and sometimes it is:

Array[org.apache.spark.sql.Row] = Array([2500], [2501])

Why?