Github user squito commented on the issue:

    https://github.com/apache/spark/pull/21977
  
    it fails consistently for me locally too, with your branch, but with this failure:
    
    ```
    [info]     File "/Users/irashid/github/pub/spark/target/tmp/spark-7c0a388c-1413-4215-9a4d-c590edec929c/test.py", line 15, in <module>
    [info]       cnt = rdd.count()
    [info]     File "/Users/irashid/github/pub/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1075, in count
    [info]     File "/Users/irashid/github/pub/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1066, in sum
    [info]     File "/Users/irashid/github/pub/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 937, in fold
    [info]     File "/Users/irashid/github/pub/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 836, in collect
    [info]     File "/Users/irashid/github/pub/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    [info]     File "/Users/irashid/github/pub/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    [info]   py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
    [info]   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 192.168.86.45, executor 1): org.apache.spark.SparkException: This RDD lacks a SparkContext. It could happen in the following cases:
    [info]   (1) RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
    [info]   (2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, See SPARK-13758.
    [info]          at org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$sc(RDD.scala:90)
    [info]          at org.apache.spark.rdd.RDD.conf(RDD.scala:107)
    [info]          at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:61)
    [info]          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    [info]          at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    [info]          at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    [info]          at org.apache.spark.scheduler.Task.run(Task.scala:128)
    [info]          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:367)
    [info]          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    [info]          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    [info]          at java.lang.Thread.run(Thread.java:745)
    ```
    
    the stack trace you pasted above is a little weird, because one of the lines it mentions doesn't seem right (though maybe you had some other debug code in there?): https://github.com/rdblue/spark/blob/SPARK-25004-add-python-memory-limit/python/pyspark/context.py?utf8=%E2%9C%93#L188
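
    for anyone who runs into the SPARK-5063 message in that trace: it describes the pattern where one RDD is referenced inside another RDD's transformation, so the nested RDD has no SparkContext on the executor. a rough pyspark sketch of that pattern (purely illustrative, not the code from this PR or from test.py):

    ```
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "spark-5063-sketch")
    rdd1 = sc.parallelize([1, 2, 3])
    rdd2 = sc.parallelize([10, 20, 30])

    # invalid: rdd2 is captured inside rdd1's map, so count() would have to run
    # on an executor where rdd2 lacks a SparkContext (the SPARK-5063 case)
    # rdd1.map(lambda x: rdd2.count() * x).collect()

    # valid: run the nested action on the driver first, then close over the value
    n = rdd2.count()
    print(rdd1.map(lambda x: n * x).collect())
    ```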

