spark UI not showing correct file and line numbers

2021-10-05 Thread Rochel Wasserman
Hi, I am experiencing performance issues in one of my PySpark applications. When I look at the Spark UI, the file and line number of each entry is listed as . I would like to use the information in the Spark UI for debugging, but without knowing the correct file and line number for the

Re: Current state of dataset api

2021-10-05 Thread Koert Kuipers
The encoder API remains a pain point due to its lack of composability. Serialization overhead is also still there, I believe. I don't remember what has happened to the predicate pushdown issues; I think they are mostly resolved? We tend to use the Dataset API on our methods/interfaces where it's fitting

Re: [EXTERNAL] [Marketing Mail] Re: [Spark] Optimize spark join on different keys for same data frame

2021-10-05 Thread Saurabh Gulati
Hi Amit, The only approach I can think of is to create two copies of schema_df1, one partitioned on key1 and the other on key2, and then use these in the joins. From: Amit Joshi Sent: 04 October 2021 19:13 To: spark-user Subject: [EXTERNAL] [Marketing Mail] Re: [Spark]
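
The two-copies idea above could look roughly like the following in PySpark. This is only a sketch under assumptions: the original question is not shown here, so the function name, the `left_df` and `right_df` arguments, and the choice of inner joins are hypothetical. Only `schema_df1`, `key1`, and `key2` come from the message. It uses the standard `DataFrame.repartition` and `DataFrame.join` methods.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for type hints only, so the sketch can be read without a Spark session.
    from pyspark.sql import DataFrame


def join_with_two_partitionings(
    schema_df1: DataFrame,
    left_df: DataFrame,   # hypothetical: the frame joined on key1
    right_df: DataFrame,  # hypothetical: the frame joined on key2
) -> tuple[DataFrame, DataFrame]:
    """Build one copy of schema_df1 per join key and join each copy separately."""
    # Copy 1: hash-partitioned on key1.
    by_key1 = schema_df1.repartition("key1")
    # Copy 2: hash-partitioned on key2.
    by_key2 = schema_df1.repartition("key2")

    # Each join uses the copy whose partitioning matches its join key.
    # Inner joins are an assumption; the thread does not state the join type.
    joined_on_key1 = left_df.join(by_key1, on="key1", how="inner")
    joined_on_key2 = right_df.join(by_key2, on="key2", how="inner")
    return joined_on_key1, joined_on_key2
```

Whether this actually reduces shuffling depends on the data sizes and the physical plan Spark picks, so it is worth comparing the output of `explain()` before and after.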