Re: pyspark dataframe join with two different data type

2024-05-15 Thread Karthick Nk
Thanks Mich, I have tried this solution, but I want all the columns from the dataframe df_1; if I explode df_1, I get only the data column. The result should contain all the columns from df_1, with distinct rows, like below. Results in *df:* +---+ |column1| +---+ |

How to provide a Zstd "training mode" dictionary object

2024-05-15 Thread Saha, Daniel
Hi, I understand that Zstd compression can optionally be given a dictionary object to improve performance. See “training mode” here: https://facebook.github.io/zstd/ Does Spark surface a way to provide this dictionary object when writing/reading data? What about for intermediate shuffle

Query Regarding UDF Support in Spark Connect with Kubernetes as Cluster Manager

2024-05-15 Thread Nagatomi Yasukazu
Hi Spark Community, I have a question regarding the support for User-Defined Functions (UDFs) in Spark Connect, specifically when using Kubernetes as the Cluster Manager. According to the Spark documentation, UDFs are supported by default for the shell and in standalone applications with