[ https://issues.apache.org/jira/browse/SPARK-12801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096867#comment-15096867 ]

Jakob Odersky commented on SPARK-12801:
---------------------------------------

I can't reproduce this either

> The DataFrame.rdd does not return the same result
> -------------------------------------------------
>
>                 Key: SPARK-12801
>                 URL: https://issues.apache.org/jira/browse/SPARK-12801
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.5.2
>         Environment: 3 servers of centos7, cluster mode
>            Reporter: Joseph Sun
>
> Run spark-shell and type the following code:
> > import org.apache.spark.sql.Row
> > import org.apache.spark.sql.types._
> > val schema = StructType(StructField("id",IntegerType,true)::Nil)
> > val rdd = sc.parallelize((0 to 10000)).map(Row(_))
> > val df = sqlContext.createDataFrame(rdd,schema)
> > df.registerTempTable("test")
> > sqlContext.cacheTable("test")
> > sqlContext.sql("select * from test limit 2").collect()
> which returns: Array[org.apache.spark.sql.Row] = Array([0], [1])
> > sqlContext.sql("select * from test limit 2").rdd.collect()
> Running this last line several times gives inconsistent results.
> Sometimes the result is: Array[org.apache.spark.sql.Row] = Array([0], [1])
> and sometimes it is: Array[org.apache.spark.sql.Row] = Array([2500], [2501])
> Why?
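
A possible explanation, offered as an assumption and not verified on the reporter's cluster: a LIMIT without ORDER BY does not specify which rows come back. As far as the Spark 1.5 Limit operator goes, collect() appears to scan partitions in index order until it has enough rows, while .rdd runs the limit as a job that sends the first rows of every partition to a single partition, where the order in which they arrive is not fixed. If sc.parallelize split 0 to 10000 into, say, 4 partitions, the second partition starts at 2500, which matches the second result. The plain-Scala sketch below models the two paths without Spark; LimitSketch and its methods are hypothetical names, not Spark APIs.

```scala
import scala.util.Random

// Hypothetical, simplified model of two ways a LIMIT can be evaluated.
// This is NOT Spark's implementation; it only mimics the row order each path may see.
object LimitSketch {
  // Slice 0..n into numSlices contiguous partitions, assuming the same boundaries
  // sc.parallelize uses for a Range: start = i * length / numSlices.
  def slice(n: Int, numSlices: Int): Seq[Seq[Int]] = {
    val data = 0 to n
    val len = data.length.toLong
    (0 until numSlices).map { i =>
      val start = (i * len / numSlices).toInt
      val end = ((i + 1) * len / numSlices).toInt
      data.slice(start, end)
    }
  }

  // Path 1 (like DataFrame.collect()): scan partitions in index order and stop
  // as soon as `limit` rows have been found.
  def takeInOrder(parts: Seq[Seq[Int]], limit: Int): List[Int] =
    parts.iterator.flatMap(_.iterator).take(limit).toList

  // Path 2 (like DataFrame.rdd): every partition contributes its first `limit` rows,
  // the pieces reach one partition in no fixed order (modelled with a random shuffle),
  // and the first `limit` rows that arrive are kept.
  def takeAfterShuffle(parts: Seq[Seq[Int]], limit: Int, rnd: Random): List[Int] =
    rnd.shuffle(parts.map(_.take(limit))).flatten.take(limit).toList

  def main(args: Array[String]): Unit = {
    val parts = slice(10000, 4)
    println(takeInOrder(parts, 2))                    // always List(0, 1)
    println(takeAfterShuffle(parts, 2, new Random())) // first two rows of whichever partition "arrives" first
  }
}
```

If this is the cause, adding an ORDER BY (for example `select * from test order by id limit 2`) should make both collect() and .rdd.collect() return [0], [1], because the query then defines which two rows are wanted.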



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
