[ https://issues.apache.org/jira/browse/SPARK-12389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-12389.
-------------------------------
    Resolution: Not A Problem

This doesn't work as expected because your tasks are not even agreeing on the same file and partitioning -- each task is reading a different local copy.

> In cluster mode, RDD action results are not consistent
> ------------------------------------------------------
>
>                 Key: SPARK-12389
>                 URL: https://issues.apache.org/jira/browse/SPARK-12389
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.2
>         Environment: CentOS 6.5 machines
> One master and 3 worker nodes in VMs
> Master: 192.168.56.102
> Workers: 192.168.56.103, 192.168.56.104, 192.168.56.105
>            Reporter: vinoth
>         Attachments: cluster_wide.txt, local_spark.txt
>
>
> I wanted to see how an RDD recreates lost partitions without replication, and to test how cluster-wide execution works in Spark.
> I have an external file on Linux. I load the file, parallelize it across the cluster, apply a transformation, and perform an action on it, both in local mode and cluster-wide.
> The file contents are:
> =======================
> hai hello
> hai hello
> vinoth test
> test vinoth
> test hai
> =======================
> The transformation and action I tried in the shell are:
> data = sc.textFile("/tmp/test.txt")
> datamap = data.flatMap(lambda x : x.split(' '))
> datamap.count()
> That's it; I keep re-running datamap.count(), and the result it produces is not consistent.
> If you split the file and count the words, the total should be 10. The result is consistent if I run the pyspark shell without the master option.
> If I run it with the master option, the results are not consistent: sometimes it produces 10 and sometimes 9.
> Between runs in the shell, I manually brought down one worker node, 192.168.56.104. Surprisingly, the result then showed "11".
> I attached the results I got in cluster-wide mode as well as in local mode.
> Please accept my apologies for wasting your time reading this issue, if this is the normal behavior in Spark.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
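To make the diagnosis in the resolution concrete, here is a minimal plain-Python simulation (no Spark required) of the reporter's flatMap/count pipeline. The file contents are copied from the issue; the "stale copy" variable is a hypothetical illustration of what happens when one worker's local /tmp/test.txt differs from the others:

```python
# Minimal local simulation of data.flatMap(lambda x: x.split(' ')).count()
# using the exact file contents from the report.
file_contents = """\
hai hello
hai hello
vinoth test
test vinoth
test hai
"""

# Equivalent of the flatMap + count: split every line on spaces.
tokens = [w for line in file_contents.splitlines() for w in line.split(' ')]
print(len(tokens))  # 10 -- the count a consistent cluster should report

# Sean Owen's point: sc.textFile("/tmp/test.txt") with a master set makes
# each task read the path on its own node. If one worker's copy is missing
# a line (a hypothetical stale copy), its partitions come up short and the
# cluster-wide total drifts, which is how 9 (or 11) can appear.
stale_copy = file_contents.splitlines()[:-1]  # one line missing on a worker
stale_tokens = [w for line in stale_copy for w in line.split(' ')]
print(len(stale_tokens))  # 8 for this truncated copy
```

The usual fix is to put the input on storage every executor can reach at the same path (HDFS, NFS) rather than relying on per-node local copies.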