Given this code block (imports added for completeness):

```python
import os
from pyspark.sql import SparkSession

def return_pid(_):
    # Each partition yields the PID of the Python worker that processed it.
    yield os.getpid()

spark = SparkSession.builder.getOrCreate()

pids = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
print(pids)

pids = set(spark.sparkContext.range(32).mapPartitions(return_pid).collect())
print(pids)
```

I was expecting the same Python process IDs to be printed twice. Instead, completely different Python process IDs are printed.

`spark.python.worker.reuse` defaults to `true`, but this unexpected behaviour still occurs even when I set `spark.python.worker.reuse=true` explicitly.