[ https://issues.apache.org/jira/browse/SPARK-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Idan Zalzberg updated SPARK-1394:
---------------------------------
    Summary: calling system.platform on worker raises IOError  (was: calling system.platform on worker raises exception)

> calling system.platform on worker raises IOError
> -------------------------------------------------
>
>                 Key: SPARK-1394
>                 URL: https://issues.apache.org/jira/browse/SPARK-1394
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 0.9.0
>         Environment: Tested on Ubuntu and Linux, local and remote master, python 2.7.*
>            Reporter: Idan Zalzberg
>              Labels: pyspark
>
> A simple program that calls platform.system() on the worker fails most of the time (it works sometimes, but very rarely).
> This is critical since many libraries call that method (e.g. boto).
> Here is the trace of the attempt to call that method:
> $ /usr/local/spark/bin/pyspark
> Python 2.7.3 (default, Feb 27 2014, 20:00:17)
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> 14/04/02 18:18:37 INFO Utils: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 14/04/02 18:18:37 WARN Utils: Your hostname, qlika-dev resolves to a loopback address: 127.0.1.1; using 10.33.102.46 instead (on interface eth1)
> 14/04/02 18:18:37 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> 14/04/02 18:18:38 INFO Slf4jLogger: Slf4jLogger started
> 14/04/02 18:18:38 INFO Remoting: Starting remoting
> 14/04/02 18:18:39 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@10.33.102.46:36640]
> 14/04/02 18:18:39 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@10.33.102.46:36640]
> 14/04/02 18:18:39 INFO SparkEnv: Registering BlockManagerMaster
> 14/04/02 18:18:39 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140402181839-919f
> 14/04/02 18:18:39 INFO MemoryStore: MemoryStore started with capacity 294.6 MB.
> 14/04/02 18:18:39 INFO ConnectionManager: Bound socket to port 43357 with id = ConnectionManagerId(10.33.102.46,43357)
> 14/04/02 18:18:39 INFO BlockManagerMaster: Trying to register BlockManager
> 14/04/02 18:18:39 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager 10.33.102.46:43357 with 294.6 MB RAM
> 14/04/02 18:18:39 INFO BlockManagerMaster: Registered BlockManager
> 14/04/02 18:18:39 INFO HttpServer: Starting HTTP Server
> 14/04/02 18:18:39 INFO HttpBroadcast: Broadcast server started at http://10.33.102.46:51803
> 14/04/02 18:18:39 INFO SparkEnv: Registering MapOutputTracker
> 14/04/02 18:18:39 INFO HttpFileServer: HTTP File server directory is /tmp/spark-9b38acb0-7b01-4463-b0a6-602bfed05a2b
> 14/04/02 18:18:39 INFO HttpServer: Starting HTTP Server
> 14/04/02 18:18:40 INFO SparkUI: Started Spark Web UI at http://10.33.102.46:4040
> 14/04/02 18:18:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 0.9.0
>       /_/
>
> Using Python version 2.7.3 (default, Feb 27 2014 20:00:17)
> Spark context available as sc.
> >>> import platform
> >>> sc.parallelize([1]).map(lambda x : platform.system()).collect()
> 14/04/02 18:19:17 INFO SparkContext: Starting job: collect at <stdin>:1
> 14/04/02 18:19:17 INFO DAGScheduler: Got job 0 (collect at <stdin>:1) with 1 output partitions (allowLocal=false)
> 14/04/02 18:19:17 INFO DAGScheduler: Final stage: Stage 0 (collect at <stdin>:1)
> 14/04/02 18:19:17 INFO DAGScheduler: Parents of final stage: List()
> 14/04/02 18:19:17 INFO DAGScheduler: Missing parents: List()
> 14/04/02 18:19:17 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[1] at collect at <stdin>:1), which has no missing parents
> 14/04/02 18:19:17 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (PythonRDD[1] at collect at <stdin>:1)
> 14/04/02 18:19:17 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
> 14/04/02 18:19:17 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL)
> 14/04/02 18:19:17 INFO TaskSetManager: Serialized task 0.0:0 as 2152 bytes in 12 ms
> 14/04/02 18:19:17 INFO Executor: Running task ID 0
> PySpark worker failed with exception:
> Traceback (most recent call last):
>   File "/usr/local/spark/python/pyspark/worker.py", line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/usr/local/spark/python/pyspark/serializers.py", line 182, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File "/usr/local/spark/python/pyspark/serializers.py", line 117, in dump_stream
>     for obj in iterator:
>   File "/usr/local/spark/python/pyspark/serializers.py", line 171, in _batched
>     for item in iterator:
>   File "<stdin>", line 1, in <lambda>
>   File "/usr/lib/python2.7/platform.py", line 1306, in system
>     return uname()[0]
>   File "/usr/lib/python2.7/platform.py", line 1273, in uname
>     processor = _syscmd_uname('-p','')
>   File "/usr/lib/python2.7/platform.py", line 1030, in _syscmd_uname
>     rc = f.close()
> IOError: [Errno 10] No child processes
> 14/04/02 18:19:17 ERROR Executor: Exception in task ID 0
> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/usr/local/spark/python/pyspark/worker.py", line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/usr/local/spark/python/pyspark/serializers.py", line 182, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File "/usr/local/spark/python/pyspark/serializers.py", line 117, in dump_stream
>     for obj in iterator:
>   File "/usr/local/spark/python/pyspark/serializers.py", line 171, in _batched
>     for item in iterator:
>   File "<stdin>", line 1, in <lambda>
>   File "/usr/lib/python2.7/platform.py", line 1306, in system
>     return uname()[0]
>   File "/usr/lib/python2.7/platform.py", line 1273, in uname
>     processor = _syscmd_uname('-p','')
>   File "/usr/lib/python2.7/platform.py", line 1030, in _syscmd_uname
>     rc = f.close()
> IOError: [Errno 10] No child processes
>         at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:131)
>         at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:153)
>         at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:96)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
>         at org.apache.spark.scheduler.Task.run(Task.scala:53)
>         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
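Note on the failure mode: the traceback shows the IOError is raised when platform._syscmd_uname() closes the os.popen() pipe it uses to run "uname -p". On Linux, closing such a pipe fails with ECHILD ("[Errno 10] No child processes") if the child has already been reaped before pclose() can wait on it, for example when the calling process has SIGCHLD set to SIG_IGN. The sketch below reproduces that failure mode standalone, outside of Spark; whether the PySpark worker process actually ignores SIGCHLD (or reaps children some other way) is an assumption made here for illustration, not something this report confirms.

    # Standalone sketch, Python 2.7 on Linux only (Python 3's platform module
    # no longer uses os.popen this way).
    # platform.system() -> uname() -> _syscmd_uname('-p','') runs "uname -p"
    # via os.popen() and then calls close() on the pipe; with SIGCHLD ignored,
    # the child is auto-reaped and close()'s wait fails with ECHILD, i.e.
    # IOError: [Errno 10] No child processes.
    import signal
    import platform

    # Assumption for illustration: simulate a process that ignores SIGCHLD.
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)

    try:
        print(platform.system())
    except IOError as e:
        print("platform.system() failed: %s" % e)  # [Errno 10] No child processes

If the goal is only to unblock code like the repro above, one possible workaround (a sketch, not a fix for the underlying issue) is to evaluate platform.system() once on the driver and capture the resulting string in the closure, e.g. plat = platform.system(); sc.parallelize([1]).map(lambda x: plat).collect(). That does not help libraries such as boto that call platform.system() internally on the workers.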
--
This message was sent by Atlassian JIRA
(v6.2#6252)