Hi! Have you tried spark.sql.files.maxRecordsPerFile?
As a workaround, you could check how many rows add up to roughly 128 MB and then set that number in that property. A rough sketch of the idea is below the quoted message.

Best,

On Thu, 1 Apr 2021 at 00:38, mhawes <hawes.i...@gmail.com> wrote:
> Okay, from looking closer at some of the code, I'm not sure that what I'm
> asking for in terms of adaptive execution makes much sense, as it can only
> happen between stages, i.e. optimising future /stages/ based on the results
> of previous stages. Thus an "on-demand" adaptive coalesce doesn't make much
> sense, as it wouldn't necessarily occur at a stage boundary.
>
> However, I think my original question still stands:
> - How to /dynamically/ deal with poorly partitioned data without incurring
>   a shuffle or extra computation.
>
> I think the only thing that's changed is that I no longer have any good
> ideas on how to do it :/
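For what it's worth, here is a minimal sketch of that workaround (assuming Parquet input/output and that the output compresses roughly like the input; the paths, the DataFrame name, and the 128 MB target below are illustrative, not from this thread):

    import org.apache.hadoop.fs.Path
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("maxRecordsPerFileSketch").getOrCreate()

    val inputPath = "/path/to/input"   // hypothetical path
    val df = spark.read.parquet(inputPath)

    // Total on-disk size of the input, via the Hadoop FileSystem API.
    val fs = new Path(inputPath).getFileSystem(spark.sparkContext.hadoopConfiguration)
    val totalBytes = fs.getContentSummary(new Path(inputPath)).getLength

    // Rough bytes-per-row estimate, then how many rows fit in ~128 MB.
    val rowCount = df.count()
    val bytesPerRow = totalBytes.toDouble / math.max(rowCount, 1L)
    val rowsPer128MB = math.max((128L * 1024 * 1024 / bytesPerRow).toLong, 1L)

    // Cap each output file at roughly that many records; no shuffle needed.
    spark.conf.set("spark.sql.files.maxRecordsPerFile", rowsPer128MB)
    df.write.parquet("/path/to/output")   // hypothetical path

Note that maxRecordsPerFile only limits file size from above; it won't merge small partitions, so heavily skewed input can still produce many small files.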