Re: Question about bucketing and custom partitioners

2022-04-11 Thread Gourav Sengupta
Hi, have you checked the skew settings in SPARK 3.2? I am also not quite sure why you need a custom partitioner. While RDDs still remain a valid option, you must try to explore the recent ways of thinking and framing better solutions using SPARK. Regards, Gourav Sengupta On Mon, Apr 11, 2022 at 4:47
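For reference, here is a minimal Python sketch of the skew-join settings mentioned above. The configuration keys are real Spark 3.x SQL settings. The values are what we understand the 3.2 defaults to be, so check them against your version's configuration docs. The helper `skewed_partitions` is hypothetical. It only restates the documented rule (a partition counts as skewed when it is larger than `skewedPartitionFactor` times the median partition size and also larger than `skewedPartitionThresholdInBytes`) and is not Spark's implementation.

```python
from statistics import median

MB = 1024 * 1024

# Spark SQL adaptive query execution (AQE) settings related to skew joins.
# Keys are real Spark 3.x config names; values are assumed 3.2 defaults.
SKEW_JOIN_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    "spark.sql.adaptive.skewJoin.skewedPartitionFactor": "5",
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "256MB",
}


def skewed_partitions(sizes_bytes, factor=5, threshold_bytes=256 * MB):
    """Hypothetical helper: return indices of partitions that the documented
    AQE rule would flag as skewed (bigger than factor * median size AND
    bigger than the absolute byte threshold)."""
    med = median(sizes_bytes)
    return [
        i
        for i, size in enumerate(sizes_bytes)
        if size > factor * med and size > threshold_bytes
    ]
```

In a PySpark session these keys would be applied with `spark.conf.set(key, value)`. With AQE on, Spark can split skewed partitions of a sort-merge join at runtime, which may remove the need for a custom partitioner in some cases.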

Re: Question about bucketing and custom partitioners

2022-04-11 Thread Lalwani, Jayesh
[EXTERNAL] Question about bucketing and custom partitioners Hello, I have a few questions related to bucketing and custom pa

Re: Question about bucketing and custom partitioners

2022-04-11 Thread ayan guha
IMHO you should ask this on the dev mailing list for better responses and suggestions. On Tue, 12 Apr 2022 at 1:47 am, David Diebold wrote: Hello, I have a few questions related to bucketing and custom partitioning in the DataFrame API. I am considering bucketing to perform one-side shuffle-free

Question about bucketing and custom partitioners

2022-04-11 Thread David Diebold
Hello, I have a few questions related to bucketing and custom partitioning in the DataFrame API. I am considering bucketing to perform one-side shuffle-free joins in incremental jobs, but there is one thing that I'm not happy with. Data is likely to grow/skew over time. At some point, I would need to
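To illustrate why bucketing can make one side of a join shuffle-free, here is a small pure-Python sketch (not Spark code). Rows are assigned to buckets by hashing the join key modulo a fixed bucket count, so equal keys always land in the same bucket id, and bucket i of one table only ever needs bucket i of the other. `bucket_id`, `bucketize`, `bucketed_join` and `NUM_BUCKETS` are hypothetical names. `zlib.crc32` is a stand-in: Spark SQL computes bucket ids with its own Murmur3-based hash, which this sketch does not reproduce. In a one-side shuffle-free join, the pre-bucketed table is read as is and only the other table is shuffled into the same layout; `bucketize(right)` plays that role here.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count, fixed when the table is written


def bucket_id(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Stand-in hash (crc32); Spark SQL uses a Murmur3-based hash instead.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def bucketize(rows, num_buckets: int = NUM_BUCKETS):
    """Group (key, value) rows into buckets by the hash of the key."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    return buckets


def bucketed_join(left_buckets, right_buckets):
    """Inner join that pairs bucket i of the left only with bucket i of the right."""
    out = []
    for left_bucket, right_bucket in zip(left_buckets, right_buckets):
        # Build a small hash index over the right-hand bucket.
        index = {}
        for key, value in right_bucket:
            index.setdefault(key, []).append(value)
        # Probe it with every left-hand row in the same bucket.
        for key, value in left_bucket:
            for right_value in index.get(key, []):
                out.append((key, value, right_value))
    return out
```

Because the bucket count is fixed when the table is written, a single hot key stays in one bucket no matter how much data arrives, and changing the bucket count later generally means rewriting the bucketed data. That is the growth/skew concern raised above.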