Hi List,

I am launching this spark-submit job:

hadoop@testbedocg:/mnt/spark> bin/spark-submit \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 \
  --jars /mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar \
  spark_v2.py
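
(As an aside, pyspark-cassandra can also be resolved with --packages instead of a local jar; a sketch, assuming the TargetHolding:pyspark-cassandra:0.3.5 spark-packages coordinate from its README:

bin/spark-submit \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0,TargetHolding:pyspark-cassandra:0.3.5 \
  spark_v2.py

Here I pass the jar explicitly with --jars instead.)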

spark_v2.py is:
from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("test") \
    .setMaster("spark://192.168.23.31:7077") \
    .set("spark.cassandra.connection.host", "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
table = sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
food_count.collect()
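
For comparison, the same count expressed through the connector's DataFrame reader (a sketch, assuming the org.apache.spark.sql.cassandra data source shipped with spark-cassandra-connector 1.6.0; note the SQLContext import above is otherwise unused):

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("test") \
    .setMaster("spark://192.168.23.31:7077") \
    .set("spark.cassandra.connection.host", "192.168.23.31")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Read the same table as a DataFrame and count rows per errorcode2001 value.
table_df = sqlContext.read.format("org.apache.spark.sql.cassandra") \
    .options(keyspace="lebara_diameter_codes", table="nl_lebara_diameter_codes") \
    .load()
table_df.groupBy("errorcode2001").count().collect()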


This is the error I get when running the above command:

16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4 times; aborting job
Traceback (most recent call last):
  File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in <module>
    food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in count
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in fold
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, 
in __call__
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in 
get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in 
stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 
14, as4): ExecutorLostFailure (executor 4 exited caused by one of the running 
tasks) Reason: Remote RPC client disassociated. Likely due to containers 
exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
        at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
        at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)


From the Spark web UI (stderr log page for app-20160630104030-0086/4):
16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout writer for python
java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
        at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
        at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
        at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:914)
        at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
        at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:968)
        at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
        at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
        at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
16/06/30 10:44:34 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for python,5,main]
java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
        (same stack trace as above)

Could this be a binary incompatibility between the pyspark-cassandra 0.3.5 jar and the 1.6.0 connector? The AbstractMethodError is on the DeferringRowReader.read signature, which (judging by the log) expects a CassandraRowMetadata argument.

BR

Joaquin
