[ https://issues.apache.org/jira/browse/SPARK-11722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557226#comment-15557226 ]
holdenk commented on SPARK-11722:
---------------------------------

Is this still an issue you are experiencing? If so, do you have repro code? I'm not completely sure what issue you are seeing.

> Rdds could be different between original one and save-out-then-read-in one
> --------------------------------------------------------------------------
>
>                 Key: SPARK-11722
>                 URL: https://issues.apache.org/jira/browse/SPARK-11722
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.1
>         Environment: redhat6.4 64bit; standalone-cluster ; 3 machines
>            Reporter: liangguoning
>
> I found a bug in PySpark.
> I ran some operations to create an RDD, A, and found that its data differs
> from the copy saved to HDFS and read back, called B.
> I also printed all of the data inside my function and confirmed that A
> contains one record that differs from B.
> That record produces a different result when the same functions are applied.
> I got B using two methods: A.saveAsTextFile and sc.textFile.
> I also checked the raw data and found that B is the correct RDD.
> ---
> I also built another RDD, A2, with sc.parallelize(A.collect()) and got the
> same result as A.
> Thanks
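
For anyone putting together the requested repro, the comparison described in the report could be sketched roughly as below. This is only an illustration, not code from the reporter: the helper name `diff_records` and the sample records are hypothetical, and it assumes records are compared by their string form, since saveAsTextFile writes the string representation of each element. The Spark calls appear only in a comment, so the snippet runs without a cluster.

```python
from collections import Counter


def diff_records(a_records, b_lines):
    """Return (only_in_a, only_in_b) as multisets of string records.

    a_records: elements collected from the in-memory RDD (A.collect()).
    b_lines:   lines read back from the saved copy (sc.textFile(...).collect()).
    """
    a_counts = Counter(str(r) for r in a_records)
    b_counts = Counter(b_lines)
    # Counter subtraction keeps only positive counts, so duplicates are handled.
    return a_counts - b_counts, b_counts - a_counts


# With Spark this would be called as (hypothetical path, not run here):
#   only_a, only_b = diff_records(A.collect(), sc.textFile(path).collect())
# Made-up sample data standing in for the two RDDs:
only_a, only_b = diff_records([(1, "x"), (2, "y")], ["(1, 'x')", "(2, 'z')"])
print(dict(only_a))
print(dict(only_b))
```

Attaching the two printed dictionaries to the issue would show exactly which record differs between A and B.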