[ 
https://issues.apache.org/jira/browse/SPARK-17752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543045#comment-15543045
 ] 

Kevin Ushey commented on SPARK-17752:
-------------------------------------

I can confirm that the issue does not reproduce with Spark 2.0.2-SNAPSHOT (as 
retrieved from 
http://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest/).

I also reproduced this issue with Spark 1.6.2, so it may be worth identifying 
which change fixed it and whether that fix should be backported to, e.g., 
Spark 1.6.3. (I am not sure whether long-term support for Spark 1.6 is 
intended.)
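
To help narrow down where the collected result diverges, here is a minimal 
sketch in base R. It is not part of the original report, and diff_columns is a 
hypothetical helper, not a SparkR function. It lists the columns whose contents 
are not identical between two data frames; note that identical() also treats a 
type change (e.g. integer vs. double) as a difference.

```r
# Hypothetical helper (not part of SparkR): return the names of the columns
# whose contents differ between the local data.frame and the collected one.
diff_columns <- function(expected, actual) {
  stopifnot(ncol(expected) == ncol(actual))
  differs <- vapply(
    seq_along(expected),
    function(i) !identical(expected[[i]], actual[[i]]),
    logical(1)
  )
  names(expected)[differs]
}

# Stand-in data so the sketch runs without Spark. In the reproduction from
# the issue description, pass `df` and `cl` instead.
expected <- data.frame(X1 = 1L, X2 = 1L, X3 = 1L)
actual   <- data.frame(X1 = 1L, X2 = 0L, X3 = 1L)
print(diff_columns(expected, actual))
```

Running diff_columns(df, cl) after the collect() call should show whether the 
mismatch is confined to particular columns or spread across all of them.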

> Spark returns incorrect result when 'collect()'ing a cached Dataset with many 
> columns
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-17752
>                 URL: https://issues.apache.org/jira/browse/SPARK-17752
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Kevin Ushey
>            Priority: Critical
>
> Run the following code (modify SPARK_HOME to point to a Spark 2.0.0 
> installation as necessary):
> {code:r}
> SPARK_HOME <- path.expand("~/Library/Caches/spark/spark-2.0.0-bin-hadoop2.7")
> Sys.setenv(SPARK_HOME = SPARK_HOME)
> library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
> sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "2g"))
> n <- 1E3
> df <- as.data.frame(replicate(n, 1L, FALSE))
> names(df) <- paste("X", 1:n, sep = "")
> tbl <- as.DataFrame(df)
> cache(tbl) # works fine without this
> cl <- collect(tbl)
> identical(df, cl) # FALSE
> {code}
> Although this is reproducible with SparkR, it seems more likely that this is 
> an error in the Java / Scala Spark sources.
> For posterity:
> > sessionInfo()
> R version 3.3.1 Patched (2016-07-30 r71015)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: macOS Sierra (10.12)



