Given this code block:

import os
from pyspark.sql import SparkSession

def return_pid(_):
    yield os.getpid()

spark = SparkSession.builder.getOrCreate()

# First job: collect the PID of the Python worker that handles each partition.
pids = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
print(pids)

# Second job: with worker reuse, the same PIDs should show up again.
pids = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
print(pids)

I was expecting the same Python process IDs to be printed twice.
Instead, completely different Python process IDs are printed the second time.
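
To make the comparison concrete, the two sets can be kept and intersected; this is just a sketch, reusing return_pid and spark from the block above:

first = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
second = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())

# With worker reuse I would expect a large overlap between the two runs,
# but the intersection comes back empty.
print(first & second)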

spark.python.worker.reuse defaults to true, but this unexpected behavior
still occurs even when spark.python.worker.reuse=true is set explicitly.
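
For reference, this is a minimal sketch of one way to set the property explicitly on the builder (the local[4] master is only an example):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[4]")  # example master; any deployment mode works here
    .config("spark.python.worker.reuse", "true")
    .getOrCreate()
)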
