[jira] [Created] (SPARK-12802) The DataFrame.rdd not return same result

2016-01-13 Thread Joseph Sun (JIRA)
Joseph Sun created SPARK-12802:
--

 Summary: The DataFrame.rdd not return same result
 Key: SPARK-12802
 URL: https://issues.apache.org/jira/browse/SPARK-12802
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 1.5.2
 Environment: 3 servers of centos7, cluster mode
Reporter: Joseph Sun


Run spark-shell and type the following code:

> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val schema = StructType(StructField("id",IntegerType,true)::Nil)
> val rdd = sc.parallelize((0 to 9999)).map(Row(_))
> val df = sqlContext.createDataFrame(rdd,schema)
> df.registerTempTable("test")
> sqlContext.cacheTable("test")
> sqlContext.sql("select *  from test limit 2").collect()
This shows: Array[org.apache.spark.sql.Row] = Array([0], [1])

> sqlContext.sql("select *  from test limit 2").rdd.collect()
Run that line several more times; the result is not consistent.
Sometimes it is: Array[org.apache.spark.sql.Row] = Array([0], [1])
and sometimes: Array[org.apache.spark.sql.Row] = Array([2500], [2501])
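A plausible source of the inconsistency (an assumption on my part, not verified against Spark's implementation): without an ORDER BY, "limit 2" is free to return any two rows, and a common distributed strategy is to take up to n rows from each partition and then keep the first n of those, in whatever order the partition results arrive. With 0 to 9999 split across 4 partitions, the second partition starts at 2500, so whichever partition's result lands first determines the answer. A minimal Python sketch of that idea (the function and names here are hypothetical, for illustration only):

```python
# Hypothetical sketch of a distributed LIMIT without ORDER BY: take up
# to n rows from each partition, then keep the first n rows overall,
# in whatever order the partition results happen to arrive.
def distributed_limit(partitions, n, arrival_order):
    candidates = []
    for i in arrival_order:            # task-completion order varies per run
        candidates.extend(partitions[i][:n])
    return candidates[:n]

# 0..9999 split into 4 even partitions, as sc.parallelize(0 to 9999)
# would do with 4 cores: boundaries at 0, 2500, 5000, 7500.
parts = [list(range(s, s + 2500)) for s in range(0, 10000, 2500)]

print(distributed_limit(parts, 2, [0, 1, 2, 3]))  # [0, 1]
print(distributed_limit(parts, 2, [1, 0, 2, 3]))  # [2500, 2501]
```

Under this sketch, both observed answers are "correct" for an unordered limit; only an explicit ORDER BY would pin down which two rows come back.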

Why does the same query return different rows?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12801) The DataFrame.rdd not return same result

2016-01-13 Thread Joseph Sun (JIRA)
Joseph Sun created SPARK-12801:
--

 Summary: The DataFrame.rdd not return same result
 Key: SPARK-12801
 URL: https://issues.apache.org/jira/browse/SPARK-12801
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 1.5.2
 Environment: 3 servers of centos7, cluster mode
Reporter: Joseph Sun


Run spark-shell and type the following code:

> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val schema = StructType(StructField("id",IntegerType,true)::Nil)
> val rdd = sc.parallelize((0 to 9999)).map(Row(_))
> val df = sqlContext.createDataFrame(rdd,schema)
> df.registerTempTable("test")
> sqlContext.cacheTable("test")
> sqlContext.sql("select *  from test limit 2").collect()
This shows: Array[org.apache.spark.sql.Row] = Array([0], [1])

> sqlContext.sql("select *  from test limit 2").rdd.collect()
Run that line several more times; the result is not consistent.
Sometimes it is: Array[org.apache.spark.sql.Row] = Array([0], [1])
and sometimes: Array[org.apache.spark.sql.Row] = Array([2500], [2501])

Why does the same query return different rows?