Given this code block:

import os
from pyspark.sql import SparkSession

def return_pid(_):
    yield os.getpid()

spark = SparkSession.builder.getOrCreate()

# First job: collect the PID of the Python worker that handles each partition.
pids = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
print(pids)

# Second job: with worker reuse, the same PIDs should show up again.
pids = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
print(pids)

I was expecting the same Python process IDs to be printed twice.
Instead, completely different Python process IDs are printed the second time.
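
To make the comparison concrete, the two sets can be kept and intersected; this is just a sketch, reusing return_pid and spark from the block above:

first = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
second = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())

# With worker reuse I would expect a large overlap between the two runs,
# but the intersection comes back empty.
print(first & second)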

spark.python.worker.reuse defaults to true, but this unexpected behavior
still occurs even when spark.python.worker.reuse=true is set explicitly.
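
For reference, this is a minimal sketch of one way to set the property explicitly on the builder (the local[4] master is only an example):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[4]")  # example master; any deployment mode works here
    .config("spark.python.worker.reuse", "true")
    .getOrCreate()
)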
