[ https://issues.apache.org/jira/browse/SPARK-11722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557226#comment-15557226 ]

holdenk commented on SPARK-11722:
---------------------------------

Is this still an issue you are experiencing? If so, do you have repro code?
I'm not completely sure what issue you are running into.

> Rdds could be different between original one and save-out-then-read-in one
> ----------------------------------------------------------------------
>
>                 Key: SPARK-11722
>                 URL: https://issues.apache.org/jira/browse/SPARK-11722
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.1
>         Environment: redhat6.4 64-bit; standalone cluster; 3 machines
>            Reporter: liangguoning
>
> I found a bug in PySpark.
> I performed some operations to create an RDD, A, but found that the data
> differ between the original A and the copy saved to HDFS, called B.
> In addition, I printed all of the detailed data inside my function and
> discovered that A does contain one record that differs from B.
> That record produces a different result when the same functions are applied.
> I obtained B in two steps: A.saveAsTextFile followed by sc.textFile.
> Checking the raw data confirmed that B is the correct RDD.
> ---
> I also built A2 with sc.parallelize(A.collect()), and it gave the same
> result as A.
> Thanks 
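As a starting point for repro code, the comparison described above (A versus B, where B comes from A.saveAsTextFile followed by sc.textFile) could be narrowed down to the exact differing records with a small pure-Python helper. This is a minimal sketch, assuming both RDDs are small enough to collect to the driver and that saveAsTextFile writes each element as its string form. `diff_round_trip` is a hypothetical name, not part of PySpark. It also flags records whose text contains a line break, since those are read back as several lines and would make A and B look different on their own.

```python
from collections import Counter


def diff_round_trip(original_records, reloaded_lines):
    """Hypothetical helper: compare an RDD with its text-file round trip.

    original_records: collected elements of the original RDD (A).
    reloaded_lines: collected lines read back with textFile (B).
    Returns (only_in_original, only_in_reloaded, multiline_records).
    """
    # Assumption: saveAsTextFile stores each element as its string form,
    # so compare str(record) against the reloaded lines.
    expected = Counter(str(record) for record in original_records)
    actual = Counter(reloaded_lines)

    # Counter subtraction keeps only positive counts, so a record that
    # appears a different number of times on each side is reported too.
    only_in_original = expected - actual
    only_in_reloaded = actual - expected

    # A record whose text contains a line break comes back as several
    # lines, which by itself makes A and B differ.
    multiline = sorted(
        text for text in expected if "\n" in text or "\r" in text
    )
    return dict(only_in_original), dict(only_in_reloaded), multiline
```

With a live SparkContext `sc`, the inputs could be gathered as `diff_round_trip(A.collect(), sc.textFile(path).collect())`, where `path` stands for whatever directory was passed to `A.saveAsTextFile`. Because an uncached RDD is recomputed for each action, calling `A.cache()` before both `collect()` and `saveAsTextFile` may also help rule out the two actions seeing different computations of A.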



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
