I'm not quite sure this hint is useful. People usually keep a buffer and
flush it when it's full, so that they can control the write batch size no
matter how many inputs they get. For example, if Spark hints to you that
there will be 1 GB of data, are you going to allocate a 1 GB buffer
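
To make the buffer-and-flush pattern above concrete, here is a minimal sketch in
plain Python. It is not Spark's DataWriter API and not code from Laurelin; the
names BatchingWriter, sink, and write_all are hypothetical and exist only for
illustration. The point it shows is that memory use is bounded by batch_size,
so the writer never needs to know the total row count in advance.

```python
from typing import Callable, Iterable, List


class BatchingWriter:
    """Buffers rows and hands them to a sink in fixed-size batches.

    Hypothetical helper for illustration; not part of any Spark API.
    """

    def __init__(self, sink: Callable[[List[object]], None], batch_size: int) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._sink = sink
        self._batch_size = batch_size
        self._buffer: List[object] = []

    def write(self, row: object) -> None:
        # Accumulate the row; flush as soon as the buffer is full.
        self._buffer.append(row)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Hand the current batch to the sink and start a fresh buffer.
        if self._buffer:
            self._sink(self._buffer)
            self._buffer = []

    def close(self) -> None:
        # Flush whatever is left when the input ends.
        self.flush()


def write_all(rows: Iterable[object], batch_size: int) -> List[List[object]]:
    """Write every row through a BatchingWriter and return the batches produced."""
    batches: List[List[object]] = []
    writer = BatchingWriter(batches.append, batch_size)
    for row in rows:
        writer.write(row)
    writer.close()
    return batches
```

With this shape, the number of incoming rows only affects how many batches are
flushed, not how large any single buffer gets.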
Hello Wenchen,
On Wed, Aug 16, 2023 at 23:33 Wenchen Fan wrote:
> > is there a way to hint to the downstream users on the number of rows
> > expected to write?
>
> It will be very hard to do. Spark pipelines the execution (within shuffle
> boundaries) and we can't predict the number of final
> is there a way to hint to the downstream users on the number of rows
> expected to write?
It will be very hard to do. Spark pipelines the execution (within shuffle
boundaries) and we can't predict the number of final output rows.
On Mon, Aug 7, 2023 at 8:27 PM Steve Loughran wrote:
>
>
> On
On Thu, 1 Jun 2023 at 00:58, Andrew Melo wrote:
> Hi all
>
> I've been developing for some time a Spark DSv2 plugin "Laurelin"
> (https://github.com/spark-root/laurelin) to read the ROOT (https://root.cern)
> file format (which is used in high energy physics). I've recently presented
> my work
Hello Spark Devs
Could anyone help me with this?
Thanks,
Andrew
On Wed, May 31, 2023 at 20:57 Andrew Melo wrote:
> Hi all
>
> I've been developing for some time a Spark DSv2 plugin "Laurelin"
> (https://github.com/spark-root/laurelin) to read the ROOT (https://root.cern)
> file format
Hi all
I've been developing for some time a Spark DSv2 plugin "Laurelin"
(https://github.com/spark-root/laurelin) to read the ROOT (https://root.cern)
file format (which is used in high energy physics). I've recently presented
my work at a conference (