>>> 16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout writer for python
>>> java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;


You are trying to call an abstract method: DeferringRowReader (from the pyspark-cassandra 0.3.5 jar) does not implement the read(Row, CassandraRowMetadata) variant that spark-cassandra-connector 1.6.0 invokes, so the call lands on the abstract method. That usually means the pyspark-cassandra jar was built against a different connector version than the one you pull in with --packages. Please check the method DeferringRowReader.read and make sure the two versions match.
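
One way to confirm that the connector itself is fine and that the failure is isolated to pyspark_cassandra's row reader is to read the same table through the plain DataFrame API, which does not go through DeferringRowReader. A minimal, untested sketch, reusing the keyspace/table/host from your script and assuming you submit it with the same --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 (no --jars needed, since pyspark_cassandra is not imported):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf()
        .setAppName("test")
        .set("spark.cassandra.connection.host", "192.168.23.31"))
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load the Cassandra table via the connector's DataFrame source,
# bypassing pyspark_cassandra entirely.
df = (sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="lebara_diameter_codes", table="nl_lebara_diameter_codes")
      .load())

df.groupBy("errorcode2001").count().show()

If that works, it points to the version mismatch between the pyspark-cassandra jar and the connector rather than to your cluster.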

On Thu, Jun 30, 2016 at 4:34 AM, Joaquin Alzola <joaquin.alz...@lebara.com>
wrote:

> Hi List,
>
>
>
> I am launching this spark-submit job:
>
>
>
> hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --jars /mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py
>
>
>
> spark_v2.py is:
>
> from pyspark_cassandra import CassandraSparkContext, Row
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SQLContext
>
> conf = SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host", "192.168.23.31")
> sc = CassandraSparkContext(conf=conf)
> table = sc.cassandraTable("lebara_diameter_codes", "nl_lebara_diameter_codes")
> food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
> food_count.collect()
>
>
>
>
>
> The error I get when running the above command:
>
>
>
> [Stage 0:>                                                          (0 + 3) / 7]16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>                                                          (0 + 7) / 7]16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>                                                          (0 + 5) / 7]16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
>
> [Stage 0:>                                                          (0 + 4) / 7]16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
>
> 16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4 times; aborting job
>
> Traceback (most recent call last):
>   File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in <module>
>     food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in count
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in fold
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
>   File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
>   File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 14, as4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
>
> Driver stacktrace:
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>         at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>         at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>         at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>         at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
>         at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
>         at py4j.Gateway.invoke(Gateway.java:259)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:209)
>         at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
> From the Spark jobs web UI (stderr log page for app-20160630104030-0086/4):
>
> 16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout writer for python
> java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
>
>               at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
>               at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
>               at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>               at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>               at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
>               at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>               at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:914)
>               at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
>               at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:968)
>               at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
>               at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>               at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>               at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>               at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
>               at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
>               at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
>               at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
>
> 16/06/30 10:44:34 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for python,5,main]
> java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
>               at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
>               at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
>               at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>               at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>               at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
>               at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>               at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:914)
>               at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
>               at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:968)
>               at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
>               at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>               at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>               at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>               at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
>               at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
>               at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
>               at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
> BR
>
>
>
> Joaquin
>



-- 
Best Regards

Jeff Zhang
