Re: Setting spark.kubernetes.driver.connectionTimeout, spark.kubernetes.submission.connectionTimeout to default spark.network.timeout

2022-08-01 Thread Dongjoon Hyun
Hi, Pralabh. Could you elaborate on your situation more? I'm interested in your needs. Currently, the default value of spark.network.timeout, 120s, is much larger than the default value of spark.kubernetes.driver.connectionTimeout, 10s. It would be a breaking change if we increase

Setting spark.kubernetes.driver.connectionTimeout, spark.kubernetes.submission.connectionTimeout to default spark.network.timeout

2022-08-01 Thread Pralabh Kumar
Hi Dev team, Since spark.network.timeout is the default for all network transactions, shouldn't spark.kubernetes.driver.connectionTimeout and spark.kubernetes.submission.connectionTimeout default to spark.network.timeout? Users migrating from YARN to K8s are familiar with
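
Until the defaults change, the alignment asked about above can be done explicitly at submission time. The following is only a sketch: the helper names (to_millis, aligned_conf, as_submit_args) are made up for illustration, and it assumes the two Kubernetes settings take plain milliseconds while spark.network.timeout takes a time string such as "120s". Check the configuration docs for your Spark version before relying on the units.

```python
import re

# The two Kubernetes client timeouts named in the thread.
# Assumption: both are expressed as plain milliseconds.
K8S_TIMEOUT_KEYS = (
    "spark.kubernetes.driver.connectionTimeout",
    "spark.kubernetes.submission.connectionTimeout",
)

# Only a small subset of time suffixes is handled in this sketch.
_UNIT_TO_MS = {"ms": 1, "s": 1000, "m": 60_000, "min": 60_000, "h": 3_600_000}


def to_millis(value: str) -> int:
    """Convert a time string such as '120s' to milliseconds.

    A bare number is treated as seconds (assumption for this sketch).
    """
    match = re.fullmatch(r"(\d+)\s*(ms|s|m|min|h)?", value.strip())
    if match is None:
        raise ValueError(f"unsupported time string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_TO_MS[unit or "s"]


def aligned_conf(network_timeout: str = "120s") -> dict:
    """Build a conf dict where the K8s timeouts mirror spark.network.timeout."""
    conf = {"spark.network.timeout": network_timeout}
    millis = str(to_millis(network_timeout))
    for key in K8S_TIMEOUT_KEYS:
        conf[key] = millis
    return conf


def as_submit_args(conf: dict) -> list:
    """Flatten a conf dict into repeated '--conf key=value' arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(as_submit_args(aligned_conf("120s"))))
```

The resulting arguments can be appended to a spark-submit command line. Setting the values per job keeps the current 10s default untouched for everyone else, which sidesteps the breaking-change concern raised in the reply.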

Re: Non-deterministic function duplicated in final Spark plan

2022-08-01 Thread Wenchen Fan
This is a hard one. Spark duplicates the join child plan if it's a self-join because Spark does not support diamond-shaped query plans. It seems the only option here is to write the join child plan to a Parquet table (or use a shuffle) and read it back. On Mon, Aug 1, 2022 at 4:46 PM Enrico Minack