Hi,
Have you checked the skew settings in Spark 3.2?
I am also not quite sure why you need a custom partitioner. While RDDs
remain a valid option, you should try to explore the more recent ways of
framing solutions in Spark.
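For reference, the skew settings mentioned above presumably mean the adaptive query execution (AQE) skew-join options. The sketch below lists the property names as given in the Spark SQL configuration documentation and turns them into spark-submit arguments. The values are illustrative (they match the documented defaults as far as I recall), and AQE_SKEW_CONF and to_submit_args are hypothetical names used only for this example.

```python
# AQE skew-join properties from the Spark SQL configuration docs.
# Values are illustrative assumptions; tune them for your own data.
AQE_SKEW_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    "spark.sql.adaptive.skewJoin.skewedPartitionFactor": "5",
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "256MB",
}


def to_submit_args(conf):
    """Render a dict of Spark properties as a flat list of --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the arguments so they can be pasted after `spark-submit`.
    print(" ".join(to_submit_args(AQE_SKEW_CONF)))
```

The skewJoin properties only take effect when spark.sql.adaptive.enabled is true.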
Regards,
Gourav Sengupta
On Mon, Apr 11, 2022 at 4:47
Question about bucketing and custom partitioners
Hello,
I have a few questions related to bucketing and custom pa
IMHO you should ask this on the dev mailing list for a better response and suggestions.
On Tue, 12 Apr 2022 at 1:47 am, David Diebold wrote:
> Hello,
>
> I have a few questions related to bucketing and custom partitioning in
> the DataFrame API.
>
> I am considering bucketing to perform a one-side shuffle-free
Hello,
I have a few questions related to bucketing and custom partitioning in
the DataFrame API.
I am considering bucketing to perform a one-side shuffle-free join in
incremental jobs, but there is one thing that I'm not happy with.
Data is likely to grow/skew over time. At some point, I would need to