[ https://issues.apache.org/jira/browse/SPARK-12389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-12389.
-------------------------------
    Resolution: Not A Problem

This doesn't work as expected because your tasks are not even agreeing on the same file and partitioning -- each task is reading a different local copy.

> In cluster mode, RDD action results are not consistent
> ------------------------------------------------------
>
>                 Key: SPARK-12389
>                 URL: https://issues.apache.org/jira/browse/SPARK-12389
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.2
>         Environment: CentOS 6.5 machines
> One master and 3 worker nodes in VMs
> Master: 192.168.56.102
> Workers: 192.168.56.103, 192.168.56.104, 192.168.56.105
>            Reporter: vinoth
>         Attachments: cluster_wide.txt, local_spark.txt
>
>
> I wanted to see how an RDD recreates lost partitions without replication, and to test how cluster-wide execution works in Spark.
> I have an external file on Linux. I load the file, parallelize it across the cluster, apply a transformation, and perform an action on it, both in local mode and cluster-wide.
> The file contents are:
> =======================
> hai hello
> hai hello
> vinoth test
> test vinoth
> test hai
> =======================
> The transformation and action I tried in the shell are:
> data = sc.textFile("/tmp/test.txt")
> datamap = data.flatMap(lambda x : x.split(' '))
> datamap.count()
> That's it; I keep re-running datamap.count(), and the result it produces is not consistent.
> If you split the file and count the words, the total should be 10. The result is consistent if I run the pyspark shell without the master option.
> If I run it with the master option, the results are not consistent: sometimes it produces 10 and sometimes 9.
> Between runs in the shell, I manually brought down one worker node, 192.168.56.104. Surprisingly, the result then showed "11".
> I attached the results I got in cluster-wide mode as well as in local mode.
> Please accept my apologies for wasting your time reading this issue, if this is the normal behavior in Spark.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
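To make the diagnosis in the resolution concrete, here is a minimal plain-Python simulation (no Spark required) of the reporter's flatMap/count pipeline. The file contents are copied from the issue; the "stale copy" variable is a hypothetical illustration of what happens when one worker's local /tmp/test.txt differs from the others:

```python
# Minimal local simulation of data.flatMap(lambda x: x.split(' ')).count()
# using the exact file contents from the report.
file_contents = """\
hai hello
hai hello
vinoth test
test vinoth
test hai
"""

# Equivalent of the flatMap + count: split every line on spaces.
tokens = [w for line in file_contents.splitlines() for w in line.split(' ')]
print(len(tokens))  # 10 -- the count a consistent cluster should report

# Sean Owen's point: sc.textFile("/tmp/test.txt") with a master set makes
# each task read the path on its own node. If one worker's copy is missing
# a line (a hypothetical stale copy), its partitions come up short and the
# cluster-wide total drifts, which is how 9 (or 11) can appear.
stale_copy = file_contents.splitlines()[:-1]  # one line missing on a worker
stale_tokens = [w for line in stale_copy for w in line.split(' ')]
print(len(stale_tokens))  # 8 for this truncated copy
```

The usual fix is to put the input on storage every executor can reach at the same path (HDFS, NFS) rather than relying on per-node local copies.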