[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)

2014-06-26 Thread Matthew Farrellee (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045437#comment-14045437 ]

Matthew Farrellee commented on SPARK-2244:
--

this is a duplicate of, and is resolved by, SPARK-2242

> pyspark - RDD action hangs (after previously succeeding)
> ---------------------------------------------------------
>
>                 Key: SPARK-2244
>                 URL: https://issues.apache.org/jira/browse/SPARK-2244
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.1.0
>         Environment: system: fedora 20 w/ maven 3.1.1 and openjdk 1.7.0_55 & 1.8.0_05
>                      code: sha b88238fa (master on 23 june 2014)
>                      cluster: make-distribution.sh followed by ./dist/sbin/start-all.sh (running locally)
>            Reporter: Matthew Farrellee
>              Labels: openjdk, pyspark, python, shell, spark
>
> {code}
> $ ./dist/bin/pyspark
> Python 2.7.5 (default, Feb 19 2014, 13:47:28)
> [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__/ .__/\_,_/_/ /_/\_\   version 1.0.0-SNAPSHOT
>       /_/
> Using Python version 2.7.5 (default, Feb 19 2014 13:47:28)
> SparkContext available as sc.
> >>> hundy = sc.parallelize(range(100))
> >>> hundy.count()
> 100
> >>> hundy.count()
> 100
> >>> hundy.count()
> 100
> [repeat until hang, ctrl-C to get]
> >>> hundy.count()
> ^CTraceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 774, in count
>     return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 765, in sum
>     return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 685, in reduce
>     vals = self.mapPartitions(func).collect()
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 649, in collect
>     bytesInJava = self._jrdd.collect().iterator()
>   File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 535, in __call__
>   File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 363, in send_command
>   File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 472, in send_command
>   File "/usr/lib64/python2.7/socket.py", line 430, in readline
>     data = recv(1)
> KeyboardInterrupt
> {code}
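
for anyone re-running the repro, a small driver loop saves the retyping (a sketch, assuming the stock pyspark shell where sc is already defined; the number of iterations needed before the hang varied):

{code}
# hypothetical helper, not from the original report: call count()
# in a loop until the gateway hangs, printing progress as we go
hundy = sc.parallelize(range(100))
i = 0
while True:
    assert hundy.count() == 100   # hangs here once the bug triggers
    i += 1
    print("count #%d ok" % i)     # last number printed shows how far we got
{code}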





[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)

2014-06-24 Thread Matthew Farrellee (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042536#comment-14042536 ]

Matthew Farrellee commented on SPARK-2244:
--

yes, but i prefer my solution - https://github.com/apache/spark/pull/1197






[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)

2014-06-24 Thread Reynold Xin (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042529#comment-14042529 ]

Reynold Xin commented on SPARK-2244:


Is this related? https://github.com/apache/spark/pull/1178






[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)

2014-06-24 Thread Matthew Farrellee (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042523#comment-14042523 ]

Matthew Farrellee commented on SPARK-2244:
--

i have a theory - a long bisect session implicated the following commit:

3870248740d83b0292ccca88a494ce19783847f0 is the first bad commit
commit 3870248740d83b0292ccca88a494ce19783847f0
Author: Kay Ousterhout
Date:   Wed Jun 18 13:16:26 2014 -0700

that commit is the first to capture stderr into a PIPE.

the theory is that the pipe's buffer fills and is never drained, so the child process eventually blocks on a stderr write and gateway communication hangs.

testing this theory by adding an additional EchoOutputThread for proc.stderr appears to resolve the issue.

i'll come up with an appropriate fix and send a pull request
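
roughly, the pattern at fault and the drain-thread fix (a sketch only, not spark's actual launcher code - the command name and echo helper are illustrative):

{code}
import subprocess
from threading import Thread

# the child's stderr goes to a PIPE; if nobody reads that pipe, the
# OS buffer eventually fills and the child blocks on its next write -
# which looks like a hang from the python side
proc = subprocess.Popen(["a-jvm-command"],   # stands in for the gateway jvm
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)

def echo(stream):
    # drain a pipe so the child can never block writing to it,
    # echoing each line so the output stays visible
    for line in iter(stream.readline, b""):
        print(line.rstrip())

# one drain thread per captured stream (the EchoOutputThread idea)
for stream in (proc.stdout, proc.stderr):
    t = Thread(target=echo, args=(stream,))
    t.daemon = True  # don't keep the interpreter alive for these
    t.start()
{code}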






[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)

2014-06-23 Thread Matthew Farrellee (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041517#comment-14041517 ]

Matthew Farrellee commented on SPARK-2244:
--

notes - enabling py4j debug logging in the pyspark shell, then repeating a tiny count until it hangs:

{code}
import logging
logger = logging.getLogger('py4j')
logger.setLevel(logging.DEBUG)
sh = logging.StreamHandler()
logger.addHandler(sh)

one = sc.parallelize(range(1))
one.count()
{code}

[repeat until hang]

***successful count -***

>>> one.count()
Command to send: c
o6
setCallSite
scount at <stdin>:1
e

Answer received: yv
Command to send: i
java.util.ArrayList
e

Answer received: ylo150
Command to send: c
o14
classTag
e

Answer received: yro151
Command to send: m
d
o85
e

Answer received: yv
[... the same m/d/oNN exchange repeats for o86 through o148, each answered yv ...]

Command to send: m
d
o149
e

Answer received: yv
Command to send: i
java.util.HashMap
e

Answer received: yao152
Command to send: i
java.util.ArrayList
e

Answer received: ylo153
Command to send: r
u
PythonRDD
rj
e

Answer received: ycorg.apache.spark.api.python.PythonRDD
Command to send: c
o14
rdd
e

Answer received: yro154
Command to send: i
org.apache.spark.api.python.PythonRDD
ro154
jgAIoY3B5c3BhcmsuY2xvdWRwaWNrbGUKX21vZHVsZXNfdG9fbWFpbgpxAF1xAVULcHlzcGFyay5yZGRxAmGFcQNSMWNweXNwYXJrLmNsb3VkcGlja2xlCl9maWxsX2Z1bmN0aW9uCnEEKGNweXNwYXJrLmNsb3VkcGlja2xlCl9tYWtlX3NrZWxfZnVuYwpxBWNuZXcKY29kZQpxBihLAksCSwVLE1UWiAAAfAAAiAEAfAAAfAEAgwIAgwIAU3EHToVxCClVBXNwbGl0cQlVCGl0ZXJhdG9ycQqGcQtVQi9ob21lL21hdHQvRG9jdW1lbnRzL1JlcG9zaXRvcmllcy9zcGFyay9kaXN0L3B5dGhvbi9weXNwYXJrL3JkZC5weXEMVQ1waXBlbGluZV9mdW5jcQ1N+wVVAgABcQ5VBGZ1bmNxD1UJcHJldl9mdW5jcRCGcREpdHESUnETSwJ9cRSHcRVScRZ9cRdOXXEYKChoAF1xGWgCYYVxGlIxaAQoaAVoBihLAksCSwJLE1UKiAAAfAEAgwEAU3EbToVxHClVAXNxHWgKhnEeVUIvaG9tZS9tYXR0L0RvY3VtZW50cy9SZXBvc2l0b3JpZXMvc3BhcmsvZGlzdC9weXRob24vcHlzcGFyay9yZGQucHlxH2gPTR4BVQBxIFUBZnEhhXEiKXRxI1JxJEsBaBSHcSVScSZ9cSdOXXEoKGgAXXEpaAJhhXEqUjFoBChoBWgGKEsBSwNLBEszVVNkAAB9AQB4MgB8AABEXSoAfQ
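
(reading the trace, as an interpretation rather than anything from the original thread: "c" begins a method call, the m/d/oNN exchanges look like py4j detaching garbage-collected java-side objects by id, and "yv" is a void/success reply - so each count() shares one socket with a burst of cleanup traffic, and a blocked gateway would presumably surface mid-exchange like this.)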

[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)

2014-06-23 Thread Matthew Farrellee (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041175#comment-14041175 ]

Matthew Farrellee commented on SPARK-2244:
--

this reproduces both w/ pyspark alone and w/ pyspark --master spark://localhost:7077






[jira] [Commented] (SPARK-2244) pyspark - RDD action hangs (after previously succeeding)

2014-06-23 Thread Matthew Farrellee (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041138#comment-14041138 ]

Matthew Farrellee commented on SPARK-2244:
--

fyi - i've not been able to reproduce this with the scala shell



