[ https://issues.apache.org/jira/browse/SPARK-9089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217932#comment-15217932 ]
Eduard Llull commented on SPARK-9089:
-------------------------------------

We have faced this same problem and, after some digging and a good deal of luck, we found the origin of the issue: snappy-java extracts its native library to {{java.io.tmpdir}} (/tmp by default) and sets the executable flag on the extracted file. If /tmp is mounted with the "noexec" option, snappy-java cannot set the executable flag and raises an exception. See the [SnappyLoader.java code|https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/SnappyLoader.java#L256].

We fixed the issue by dropping the "noexec" option when mounting /tmp. It might be better for Spark to set the {{org.xerial.snappy.tempdir}} property to the value of {{spark.local.dir}}, but nothing prevents {{spark.local.dir}} from being mounted "noexec" as well.

> Failing to run simple job on Spark Standalone Cluster
> -----------------------------------------------------
>
>                 Key: SPARK-9089
>                 URL: https://issues.apache.org/jira/browse/SPARK-9089
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.0
>        Environment: Staging
>           Reporter: Amar Goradia
>           Priority: Critical
>
> We are trying out Spark and, as part of that, we have set up a standalone Spark cluster. While testing things out, we simply opened a PySpark shell and ran this job: a=sc.parallelize([1,2,3]).count()
> As a result, we are getting errors. We tried googling the error but have not been able to find the exact reason why we are running into this state. Can somebody please help us look further into this issue and advise us on what we are missing here?
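A rough way to check whether a given directory is affected is to try actually executing a file placed there. Note that on a "noexec" mount the chmod itself may still succeed; the failure only surfaces when the file is run, so a probe has to go all the way to exec. This is a hypothetical helper, not part of Spark or snappy-java:

```python
import os
import subprocess
import tempfile


def mount_allows_exec(path):
    """Return True if files created under `path` can be executed.

    Writes a trivial shell script into `path`, marks it executable,
    and tries to run it.  On a filesystem mounted with `noexec` the
    exec() call fails with EACCES (PermissionError) even though the
    executable bit was set without error.
    """
    fd, script = tempfile.mkstemp(suffix=".sh", dir=path)
    try:
        os.write(fd, b"#!/bin/sh\nexit 0\n")
        os.close(fd)
        os.chmod(script, 0o700)
        try:
            return subprocess.run([script]).returncode == 0
        except PermissionError:
            # EACCES on exec despite the mode bits: typical of noexec.
            return False
    finally:
        os.unlink(script)
```

Running `mount_allows_exec("/tmp")` on an affected node should return False, which would confirm the "noexec" diagnosis before touching any Spark configuration.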
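Until Spark wires the snappy-java property up itself, an operator-side workaround is to point snappy-java at a different temp directory via the standard {{spark.driver.extraJavaOptions}} and {{spark.executor.extraJavaOptions}} settings. This sketch assumes a directory such as /var/tmp/snappy exists on every node and is not mounted "noexec" (the path is illustrative):

```shell
# Redirect snappy-java's native-library extraction away from a noexec /tmp.
# /var/tmp/snappy is an example path; it must exist on all driver and
# executor hosts and must be on an exec-capable mount.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dorg.xerial.snappy.tempdir=/var/tmp/snappy" \
  --conf "spark.executor.extraJavaOptions=-Dorg.xerial.snappy.tempdir=/var/tmp/snappy" \
  your_app.py
```

The same `-D` flag can equally be placed in spark-defaults.conf so every submitted job picks it up.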
> Here is the full error stack:
> >>> a=sc.parallelize([1,2,3]).count()
> 15/07/16 00:52:15 INFO SparkContext: Starting job: count at <stdin>:1
> 15/07/16 00:52:15 INFO DAGScheduler: Got job 5 (count at <stdin>:1) with 2 output partitions (allowLocal=false)
> 15/07/16 00:52:15 INFO DAGScheduler: Final stage: ResultStage 5(count at <stdin>:1)
> 15/07/16 00:52:15 INFO DAGScheduler: Parents of final stage: List()
> 15/07/16 00:52:15 INFO DAGScheduler: Missing parents: List()
> 15/07/16 00:52:15 INFO DAGScheduler: Submitting ResultStage 5 (PythonRDD[12] at count at <stdin>:1), which has no missing parents
> 15/07/16 00:52:15 INFO TaskSchedulerImpl: Cancelling stage 5
> 15/07/16 00:52:15 INFO DAGScheduler: ResultStage 5 (count at <stdin>:1) failed in Unknown s
> 15/07/16 00:52:15 INFO DAGScheduler: Job 5 failed: count at <stdin>:1, took 0.004963 s
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 972, in count
>     return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 963, in sum
>     return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 771, in reduce
>     vals = self.mapPartitions(func).collect()
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 745, in collect
>     port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.reflect.InvocationTargetException
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
> org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:80)
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
> org.apache.spark.SparkContext.broadcast(SparkContext.scala:1289)
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:874)
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:815)
> org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:799)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1419)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
> org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:884)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:815)
> 	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:799)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1419)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
> 	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)