Github user squito commented on the issue:

    https://github.com/apache/spark/pull/21977

It fails consistently for me locally too, with your branch, but with this failure:

```
[info]   File "/Users/irashid/github/pub/spark/target/tmp/spark-7c0a388c-1413-4215-9a4d-c590edec929c/test.py", line 15, in <module>
[info]     cnt = rdd.count()
[info]   File "/Users/irashid/github/pub/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1075, in count
[info]   File "/Users/irashid/github/pub/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1066, in sum
[info]   File "/Users/irashid/github/pub/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 937, in fold
[info]   File "/Users/irashid/github/pub/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 836, in collect
[info]   File "/Users/irashid/github/pub/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
[info]   File "/Users/irashid/github/pub/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
[info] py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
[info] : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 192.168.86.45, executor 1): org.apache.spark.SparkException: This RDD lacks a SparkContext. It could happen in the following cases:
[info] (1) RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
[info] (2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, See SPARK-13758.
[info]   at org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$sc(RDD.scala:90)
[info]   at org.apache.spark.rdd.RDD.conf(RDD.scala:107)
[info]   at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:61)
[info]   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
[info]   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
[info]   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info]   at org.apache.spark.scheduler.Task.run(Task.scala:128)
[info]   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:367)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[info]   at java.lang.Thread.run(Thread.java:745)
```

The stack trace you pasted above is a little weird, because one of the lines it mentions doesn't seem right (though maybe you had some other debug code in there?): https://github.com/rdblue/spark/blob/SPARK-25004-add-python-memory-limit/python/pyspark/context.py?utf8=%E2%9C%93#L188
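
For reference, case (1) in that error message is about referencing one RDD inside another RDD's transformation. A minimal PySpark sketch of that pattern (a hypothetical illustration only, not the code in the failing test; the failure in my run surfaces in `PythonRDD.compute` calling `RDD.conf` on the deserialized RDD, per the trace above):

```python
from pyspark import SparkContext

sc = SparkContext(appName="spark-5063-sketch")

rdd1 = sc.parallelize([1, 2, 3])
rdd2 = sc.parallelize([("a", 1), ("b", 2)])

# Invalid: rdd2 is captured in rdd1's map closure, so an RDD would be
# shipped to executors, which have no SparkContext. PySpark normally
# rejects this while serializing the closure (see SPARK-5063); the
# Scala-side variant surfaces as the "This RDD lacks a SparkContext"
# error quoted above.
# broken = rdd1.map(lambda x: rdd2.values().count() * x).collect()

# Valid: compute the dependent value on the driver first, then use it.
n = rdd2.values().count()
print(rdd1.map(lambda x: n * x).collect())

sc.stop()
```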