[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)
[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045437#comment-14045437 ]

Matthew Farrellee commented on SPARK-2244:
------------------------------------------

This is a duplicate of, and is resolved by, SPARK-2242.

> pyspark - RDD action hangs (after previously succeeding)
> ---------------------------------------------------------
>
>                 Key: SPARK-2244
>                 URL: https://issues.apache.org/jira/browse/SPARK-2244
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.1.0
>         Environment: system: fedora 20 w/ maven 3.1.1 and openjdk 1.7.0_55 & 1.8.0_05
>                      code: sha b88238fa (master on 23 june 2014)
>                      cluster: make-distribution.sh followed by ./dist/sbin/start-all.sh (running locally)
>            Reporter: Matthew Farrellee
>              Labels: openjdk, pyspark, python, shell, spark
>
> {code}
> $ ./dist/bin/pyspark
> Python 2.7.5 (default, Feb 19 2014, 13:47:28)
> [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 1.0.0-SNAPSHOT
>       /_/
>
> Using Python version 2.7.5 (default, Feb 19 2014 13:47:28)
> SparkContext available as sc.
> >>> hundy = sc.parallelize(range(100))
> >>> hundy.count()
> 100
> >>> hundy.count()
> 100
> >>> hundy.count()
> 100
> [repeat until hang, ctrl-C to get]
> >>> hundy.count()
> ^CTraceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 774, in count
>     return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 765, in sum
>     return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 685, in reduce
>     vals = self.mapPartitions(func).collect()
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 649, in collect
>     bytesInJava = self._jrdd.collect().iterator()
>   File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 535, in __call__
>   File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 363, in send_command
>   File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 472, in send_command
>   File "/usr/lib64/python2.7/socket.py", line 430, in readline
>     data = recv(1)
> KeyboardInterrupt
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)
[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042536#comment-14042536 ]

Matthew Farrellee commented on SPARK-2244:
------------------------------------------

Yes, but I prefer my solution: https://github.com/apache/spark/pull/1197
[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)
[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042529#comment-14042529 ]

Reynold Xin commented on SPARK-2244:
------------------------------------

Is this related? https://github.com/apache/spark/pull/1178
[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)
[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042523#comment-14042523 ]

Matthew Farrellee commented on SPARK-2244:
------------------------------------------

I have a theory. After a long bisect session, the following commit was implicated:

  3870248740d83b0292ccca88a494ce19783847f0 is the first bad commit
  commit 3870248740d83b0292ccca88a494ce19783847f0
  Author: Kay Ousterhout
  Date:   Wed Jun 18 13:16:26 2014 -0700

In that commit, stderr is captured into a PIPE for the first time. The theory is that the pipe's buffer fills and is never drained, resulting in an eventual hang in communication. Testing this theory by adding an additional EchoOutputThread for proc.stderr appears to resolve the issue. I'll come up with an appropriate fix and send a pull request.
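The pipe-buffer theory above can be illustrated with a small, self-contained sketch; this is not Spark's actual code — the child command, the 200 KB payload, and the `run` helper are hypothetical. A child process whose stderr is captured into a PIPE blocks once the OS pipe buffer (typically ~64 KB on Linux) fills, unless a daemon echo thread, analogous to the EchoOutputThread mentioned above, keeps draining it.

```python
import subprocess
import sys
import threading

# Child writes ~200 KB to stderr -- more than a typical 64 KB OS pipe
# buffer -- then exits. With stderr=PIPE and nobody reading, the write
# blocks and the child never exits, mirroring the reported hang.
CHILD = "import sys; sys.stderr.write('x' * 200000)"

def run(drain_stderr, timeout=5):
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if drain_stderr:
        # Drain/echo thread (cf. an EchoOutputThread for proc.stderr):
        # consumes stderr so the child's writes never block.
        threading.Thread(target=proc.stderr.read, daemon=True).start()
    try:
        proc.wait(timeout=timeout)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return "hung"

print(run(drain_stderr=False, timeout=2))  # pipe buffer fills; child blocks
print(run(drain_stderr=True))              # drained; child exits cleanly
```

The same reasoning predicts the intermittent nature of the original hang: the shell keeps working until the captured stderr stream has accumulated enough unread output to fill the buffer.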
[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)
[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041517#comment-14041517 ]

Matthew Farrellee commented on SPARK-2244:
------------------------------------------

Notes. To trace the py4j protocol:

  import logging
  logger = logging.getLogger('py4j')
  logger.setLevel(logging.DEBUG)
  sh = logging.StreamHandler()
  logger.addHandler(sh)

  one = sc.parallelize(range(1))
  one.count()
  [repeat until hang]

Successful count:

  >>> one.count()
  Command to send: c o6 setCallSite scount at <stdin>:1 e
  Answer received: yv
  Command to send: i java.util.ArrayList e
  Answer received: ylo150
  Command to send: c o14 classTag e
  Answer received: yro151
  Command to send: m d o85 e
  Answer received: yv
  [the same "m d oNN" / "yv" exchange repeats for o86 through o149]
  Command to send: i java.util.HashMap e
  Answer received: yao152
  Command to send: i java.util.ArrayList e
  Answer received: ylo153
  Command to send: r u PythonRDD rj e
  Answer received: ycorg.apache.spark.api.python.PythonRDD
  Command to send: c o14
rdd e Answer received: yro154 Command to send: i org.apache.spark.api.python.PythonRDD ro154 jgAIoY3B5c3BhcmsuY2xvdWRwaWNrbGUKX21vZHVsZXNfdG9fbWFpbgpxAF1xAVULcHlzcGFyay5yZGRxAmGFcQNSMWNweXNwYXJrLmNsb3VkcGlja2xlCl9maWxsX2Z1bmN0aW9uCnEEKGNweXNwYXJrLmNsb3VkcGlja2xlCl9tYWtlX3NrZWxfZnVuYwpxBWNuZXcKY29kZQpxBihLAksCSwVLE1UWiAAAfAAAiAEAfAAAfAEAgwIAgwIAU3EHToVxCClVBXNwbGl0cQlVCGl0ZXJhdG9ycQqGcQtVQi9ob21lL21hdHQvRG9jdW1lbnRzL1JlcG9zaXRvcmllcy9zcGFyay9kaXN0L3B5dGhvbi9weXNwYXJrL3JkZC5weXEMVQ1waXBlbGluZV9mdW5jcQ1N+wVVAgABcQ5VBGZ1bmNxD1UJcHJldl9mdW5jcRCGcREpdHESUnETSwJ9cRSHcRVScRZ9cRdOXXEYKChoAF1xGWgCYYVxGlIxaAQoaAVoBihLAksCSwJLE1UKiAAAfAEAgwEAU3EbToVxHClVAXNxHWgKhnEeVUIvaG9tZS9tYXR0L0RvY3VtZW50cy9SZXBvc2l0b3JpZXMvc3BhcmsvZGlzdC9weXRob24vcHlzcGFyay9yZGQucHlxH2gPTR4BVQBxIFUBZnEhhXEiKXRxI1JxJEsBaBSHcSVScSZ9cSdOXXEoKGgAXXEpaAJhhXEqUjFoBChoBWgGKEsBSwNLBEszVVNkAAB9AQB4MgB8AABEXSoAfQ
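The debug-logging recipe in the notes above can be written as a standalone sketch; outside a pyspark shell there is no `sc`, so the RDD calls stay as comments.

```python
import logging

# Send py4j's protocol trace ("Command to send" / "Answer received"
# pairs, as in the log above) to stderr via a stream handler.
logger = logging.getLogger('py4j')
logger.setLevel(logging.DEBUG)
sh = logging.StreamHandler()
logger.addHandler(sh)

# In a pyspark shell, run an action and watch the exchanges:
#   one = sc.parallelize(range(1))
#   one.count()   # repeat until hang to capture the failing exchange
```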
[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)
[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041175#comment-14041175 ]

Matthew Farrellee commented on SPARK-2244:
------------------------------------------

This reproduces both with plain pyspark and with pyspark --master spark://localhost:7077.
[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)
[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041138#comment-14041138 ]

Matthew Farrellee commented on SPARK-2244:
------------------------------------------

FYI, I've not been able to reproduce this with the Scala shell.