I'm running a Jupyter-Spark setup and I want to benchmark my cluster with different input parameters. To make sure the environment stays the same across runs, I'm trying to reset (restart) the SparkContext between iterations. Here is some code:
```python
import os
import shutil
import time

import pyspark

temp_result_parquet = os.path.normpath('/home/spark_tmp_parquet')
i = 0
while i < max_i:
    i += 1
    if os.path.exists(temp_result_parquet):
        shutil.rmtree(temp_result_parquet)  # I know I could simply overwrite the parquet
    My_DF = do_something(i)
    My_DF.write.parquet(temp_result_parquet)

    sc.stop()
    time.sleep(10)
    sc = pyspark.SparkContext(master='spark://ip:here', appName='PySparkShell')
```

The first iteration runs fine, but in the second I get the following error:

```
Py4JJavaError: An error occurred while calling o1876.parquet.
: org.apache.spark.SparkException: Job aborted.
[...]
Caused by: java.lang.IllegalStateException: SparkContext has been shutdown
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2014)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:188)
```

I tried running the code without the SparkContext restart, but that leads to memory issues, so I restart the context to wipe the slate clean before every iteration. The strange result is that the parquet write still "thinks" the SparkContext is down.
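For reference, one variation I've been considering (not what produced the error above) is to create the SparkContext *and* a matching SparkSession at the top of each iteration and stop them only after the write has finished, so that nothing still points at a dead context. This is just a rough sketch: `max_i`, `do_something(i)`, and the master URL are the same placeholders as above, the per-iteration `spark = SparkSession(sc)` is my own assumption, and it only works if `do_something(i)` actually builds its DataFrame from the session created in that iteration rather than from one cached elsewhere.

```python
import os
import shutil
import time

import pyspark
from pyspark.sql import SparkSession

temp_result_parquet = os.path.normpath('/home/spark_tmp_parquet')

for i in range(1, max_i + 1):
    # Fresh context *and* a fresh SparkSession per iteration, so every
    # DataFrame built below is bound to a context that is still alive.
    sc = pyspark.SparkContext(master='spark://ip:here', appName='PySparkShell')
    spark = SparkSession(sc)

    if os.path.exists(temp_result_parquet):
        shutil.rmtree(temp_result_parquet)

    # Assumption: do_something(i) uses the `spark`/`sc` created in this
    # iteration, not a session cached from a previous one.
    My_DF = do_something(i)
    My_DF.write.parquet(temp_result_parquet)

    # Tear down only after the write has completed, then give the cluster
    # a moment before the next iteration starts.
    sc.stop()
    time.sleep(10)
```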