[ https://issues.apache.org/jira/browse/SPARK-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
cocoatomo updated SPARK-3772:
-----------------------------
    Description:
To reproduce this issue, execute the following commands on commit 6e27cb630de69fa5acb510b4e2f6b980742b1957.

{quote}
$ PYSPARK_PYTHON=ipython ./bin/pyspark
...
In [1]: file = sc.textFile('README.md')
In [2]: file.first()
...
14/10/03 08:50:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/03 08:50:13 WARN LoadSnappy: Snappy native library not loaded
14/10/03 08:50:13 INFO FileInputFormat: Total input paths to process : 1
14/10/03 08:50:13 INFO SparkContext: Starting job: runJob at PythonRDD.scala:334
14/10/03 08:50:13 INFO DAGScheduler: Got job 0 (runJob at PythonRDD.scala:334) with 1 output partitions (allowLocal=true)
14/10/03 08:50:13 INFO DAGScheduler: Final stage: Stage 0(runJob at PythonRDD.scala:334)
14/10/03 08:50:13 INFO DAGScheduler: Parents of final stage: List()
14/10/03 08:50:13 INFO DAGScheduler: Missing parents: List()
14/10/03 08:50:13 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at RDD at PythonRDD.scala:44), which has no missing parents
14/10/03 08:50:13 INFO MemoryStore: ensureFreeSpace(4456) called with curMem=57388, maxMem=278019440
14/10/03 08:50:13 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.4 KB, free 265.1 MB)
14/10/03 08:50:13 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (PythonRDD[2] at RDD at PythonRDD.scala:44)
14/10/03 08:50:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/10/03 08:50:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1207 bytes)
14/10/03 08:50:13 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/10/03 08:50:14 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException: port out of range:1027423549
    at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
    at java.net.Socket.<init>(Socket.java:244)
    at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
    at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:100)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:71)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:744)
{quote}

  was:
To reproduce this issue, execute the following commands.

{quote}
$ PYSPARK_PYTHON=ipython ./bin/pyspark
...
In [1]: file = sc.textFile('README.md')
In [2]: file.first()
...
14/10/03 08:50:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/03 08:50:13 WARN LoadSnappy: Snappy native library not loaded
14/10/03 08:50:13 INFO FileInputFormat: Total input paths to process : 1
14/10/03 08:50:13 INFO SparkContext: Starting job: runJob at PythonRDD.scala:334
14/10/03 08:50:13 INFO DAGScheduler: Got job 0 (runJob at PythonRDD.scala:334) with 1 output partitions (allowLocal=true)
14/10/03 08:50:13 INFO DAGScheduler: Final stage: Stage 0(runJob at PythonRDD.scala:334)
14/10/03 08:50:13 INFO DAGScheduler: Parents of final stage: List()
14/10/03 08:50:13 INFO DAGScheduler: Missing parents: List()
14/10/03 08:50:13 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at RDD at PythonRDD.scala:44), which has no missing parents
14/10/03 08:50:13 INFO MemoryStore: ensureFreeSpace(4456) called with curMem=57388, maxMem=278019440
14/10/03 08:50:13 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.4 KB, free 265.1 MB)
14/10/03 08:50:13 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (PythonRDD[2] at RDD at PythonRDD.scala:44)
14/10/03 08:50:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/10/03 08:50:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1207 bytes)
14/10/03 08:50:13 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/10/03 08:50:14 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException: port out of range:1027423549
    at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
    at java.net.Socket.<init>(Socket.java:244)
    at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
    at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:100)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:71)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:744)
{quote}

> RDD operation on IPython REPL failed with an illegal port number
> ----------------------------------------------------------------
>
>                 Key: SPARK-3772
>                 URL: https://issues.apache.org/jira/browse/SPARK-3772
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0
>            Reporter: cocoatomo
>              Labels: pyspark

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
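
The value rejected in "port out of range:1027423549" is worth a closer look. As a minimal sketch (the helper names `is_valid_port` and `as_big_endian_bytes` are hypothetical and not part of Spark), the snippet below checks the value against the TCP port range and reinterprets it as four bytes. The bytes turn out to be the ASCII text `====`. That suggests, as an assumption this report does not confirm, that the JVM side read text output rather than a binary port number while connecting through the Python daemon.

```python
import struct

# Value taken from the exception message "port out of range:1027423549".
reported_port = 1027423549

# TCP ports are unsigned 16-bit numbers, so the valid range is 0..65535.
MAX_PORT = 0xFFFF


def is_valid_port(port):
    """Return True if port lies in the 0..65535 range (hypothetical helper)."""
    return 0 <= port <= MAX_PORT


def as_big_endian_bytes(value):
    """Pack a signed 32-bit integer into 4 big-endian bytes (hypothetical helper)."""
    return struct.pack(">i", value)


raw = as_big_endian_bytes(reported_port)

print(is_valid_port(reported_port))  # False
print(hex(reported_port))            # 0x3d3d3d3d
print(raw.decode("ascii"))           # ====
```

Because all four bytes are identical (0x3d), the result does not depend on the byte order assumed when packing the integer.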