[ https://issues.apache.org/jira/browse/SPARK-12801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096867#comment-15096867 ]
Jakob Odersky commented on SPARK-12801:
---------------------------------------

I can't reproduce this either.

> DataFrame.rdd does not return the same result
> ----------------------------------------
>
> Key: SPARK-12801
> URL: https://issues.apache.org/jira/browse/SPARK-12801
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 1.5.2
> Environment: 3 servers running CentOS 7, cluster mode
> Reporter: Joseph Sun
>
> Run spark-shell and type the following code:
>
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val schema = StructType(StructField("id", IntegerType, true) :: Nil)
> val rdd = sc.parallelize(0 to 10000).map(Row(_))
> val df = sqlContext.createDataFrame(rdd, schema)
> df.registerTempTable("test")
> sqlContext.cacheTable("test")
> sqlContext.sql("select * from test limit 2").collect()
> shows Array[org.apache.spark.sql.Row] = Array([0], [1])
> sqlContext.sql("select * from test limit 2").rdd.collect()
> Run this code a few more times; the result is not consistent.
> Sometimes the result is: Array[org.apache.spark.sql.Row] = Array([0], [1])
> or: Array[org.apache.spark.sql.Row] = Array([2500], [2501])
> Why?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
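A plausible reading of the report above (an assumption on my part, not a conclusion stated in the ticket): a SQL `LIMIT` without an `ORDER BY` does not promise *which* rows are returned, so on a multi-partition cached table, `df.collect()` and `df.rdd.collect()` may legitimately pick rows from different partitions across runs. The sketch below reuses the reporter's spark-shell session (`sc` and `sqlContext` predefined, Spark 1.5.x API) and shows how an explicit ordering makes the result deterministic:

```scala
// Assumes a spark-shell session where sc and sqlContext are predefined,
// as in the report above (Spark 1.5.x API).
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema = StructType(StructField("id", IntegerType, true) :: Nil)
val rdd = sc.parallelize(0 to 10000).map(Row(_))
val df = sqlContext.createDataFrame(rdd, schema)
df.registerTempTable("test")
sqlContext.cacheTable("test")

// Without ORDER BY, LIMIT 2 may return any two rows, so run-to-run
// variation is allowed by SQL semantics rather than being a bug.
// With an explicit ordering, both paths pin down the same two rows:
sqlContext.sql("select * from test order by id limit 2").collect()
// Array([0], [1]) on every run

sqlContext.sql("select * from test order by id limit 2").rdd.collect()
// Array([0], [1]) on every run as well
```

If the unordered query must stay as-is, the takeaway is simply that neither `.collect()` nor `.rdd.collect()` is guaranteed to return the first two inserted rows.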