Hi! Have you tried spark.sql.files.maxRecordsPerFile?
As a workaround, you could check how many rows add up to roughly 128 MB and then set that number in that property. A rough sketch of the idea is below the quoted message.

Best,

On Thu, 1 Apr 2021 at 00:38, mhawes <hawes.i...@gmail.com> wrote:
> Okay, from looking closer at some of the code, I'm not sure that what I'm
> asking for in terms of adaptive execution makes much sense, as it can only
> happen between stages, i.e. optimising future /stages/ based on the results
> of previous stages. Thus an "on-demand" adaptive coalesce doesn't make much
> sense, as it wouldn't necessarily occur at a stage boundary.
>
> However, I think my original question still stands:
> - How to /dynamically/ deal with poorly partitioned data without incurring
>   a shuffle or extra computation.
>
> I think the only thing that's changed is that I no longer have any good
> ideas on how to do it :/
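For what it's worth, here is a minimal sketch of that workaround (assuming Parquet input/output and that the output compresses roughly like the input; the paths, the DataFrame name, and the 128 MB target below are illustrative, not from this thread):

    import org.apache.hadoop.fs.Path
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("maxRecordsPerFileSketch").getOrCreate()

    val inputPath = "/path/to/input"   // hypothetical path
    val df = spark.read.parquet(inputPath)

    // Total on-disk size of the input, via the Hadoop FileSystem API.
    val fs = new Path(inputPath).getFileSystem(spark.sparkContext.hadoopConfiguration)
    val totalBytes = fs.getContentSummary(new Path(inputPath)).getLength

    // Rough bytes-per-row estimate, then how many rows fit in ~128 MB.
    val rowCount = df.count()
    val bytesPerRow = totalBytes.toDouble / math.max(rowCount, 1L)
    val rowsPer128MB = math.max((128L * 1024 * 1024 / bytesPerRow).toLong, 1L)

    // Cap each output file at roughly that many records; no shuffle needed.
    spark.conf.set("spark.sql.files.maxRecordsPerFile", rowsPer128MB)
    df.write.parquet("/path/to/output")   // hypothetical path

Note that maxRecordsPerFile only limits file size from above; it won't merge small partitions, so heavily skewed input can still produce many small files.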