[jira] [Assigned] (SPARK-7158) collect and take return different results

Apache Spark (JIRA) Mon, 27 Apr 2015 00:43:21 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Apache Spark reassigned SPARK-7158:
-----------------------------------

    Assignee:     (was: Apache Spark)

> collect and take return different results
> -----------------------------------------
>
>                 Key: SPARK-7158
>                 URL: https://issues.apache.org/jira/browse/SPARK-7158
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Blocker
>
> Reported by [~rams]
> {code}
> import java.util.UUID
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val rdd = sc.parallelize(List(1,2,3), 2)
> val schema = StructType(List(StructField("index",IntegerType,true)))
> val df = sqlContext.createDataFrame(rdd.map(p => Row(p)), schema)
> def id:() => String = () => {UUID.randomUUID().toString()}
> def square:Int => Int = (x: Int) => {x * x}
> val dfWithId = df.withColumn("id",callUDF(id, StringType)).cache() //expect 
> the ID to have materialized at this point
> dfWithId.collect()
> //res0: Array[org.apache.spark.sql.Row] = 
> Array([1,43c7b8e2-b4a3-43ee-beff-0bb4b7d6c1b1], 
> [2,efd061be-e8cc-43fa-956e-cfd6e7355982], 
> [3,79b0baab-627c-4761-af0d-8995b8c5a125])
> val dfWithIdAndSquare = dfWithId.withColumn("square",callUDF(square, 
> IntegerType, col("index")))
> dfWithIdAndSquare.collect()
> //res1: Array[org.apache.spark.sql.Row] = 
> Array([1,a3b2e744-a0a1-40fe-8133-87a67660b4ab,1], 
> [2,0a7052a0-6071-4ef5-a25a-2670248ea5cd,4], 
> [3,209f269e-207a-4dfd-a186-738be5db2eff,9])
> //why are the IDs in lines 11 and 15 different?
> {code}
> The randomly generated IDs are the same if show (which uses take under the 
> hood) is used instead of collect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-7158) collect and take return different results

Reply via email to