Re: Spark writing API

2023-08-17 Thread Wenchen Fan
I'm not quite sure if this hint is useful. People usually keep a buffer and flush it when it's full, so that they can control the batch size of writes no matter how many inputs they get. E.g. if Spark hints to you that there will be 1 GB of data, are you going to allocate a 1 GB buffer
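
For illustration, the buffer-and-flush pattern described above can be sketched in a few lines of Python. This is only a minimal sketch, not Spark's DSv2 writer API: `BufferedWriter`, `write_all`, the `sink` callback and the `batch_size` parameter are all hypothetical names chosen for this example.

```python
from typing import Callable, Iterable, List


class BufferedWriter:
    """Hypothetical writer: collects rows and flushes when the buffer is full."""

    def __init__(self, sink: Callable[[List[dict]], None], batch_size: int = 1000):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._sink = sink              # receives one full batch per call
        self._batch_size = batch_size  # upper bound on rows held in memory
        self._buffer: List[dict] = []

    def write(self, row: dict) -> None:
        self._buffer.append(row)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._sink(self._buffer)
            self._buffer = []          # start a fresh list; the sink keeps the old one

    def close(self) -> None:
        # Flush whatever is left once the input is exhausted.
        self.flush()


def write_all(rows: Iterable[dict],
              sink: Callable[[List[dict]], None],
              batch_size: int) -> List[int]:
    """Writes every row through a BufferedWriter and returns the flushed batch sizes."""
    sizes: List[int] = []

    def recording_sink(batch: List[dict]) -> None:
        sizes.append(len(batch))
        sink(batch)

    writer = BufferedWriter(recording_sink, batch_size)
    for row in rows:
        writer.write(row)
    writer.close()
    return sizes
```

The writer never needs to know the total row count up front: memory use is bounded by `batch_size` whether the input holds ten rows or ten billion.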

Re: Spark writing API

2023-08-16 Thread Andrew Melo
Hello Wenchen,
On Wed, Aug 16, 2023 at 23:33 Wenchen Fan wrote:
> > is there a way to hint to the downstream users on the number of rows expected to write?
>
> It will be very hard to do. Spark pipelines the execution (within shuffle boundaries) and we can't predict the number of final

Re: Spark writing API

2023-08-16 Thread Wenchen Fan
> is there a way to hint to the downstream users on the number of rows expected to write?
It will be very hard to do. Spark pipelines the execution (within shuffle boundaries) and we can't predict the number of final output rows.
On Mon, Aug 7, 2023 at 8:27 PM Steve Loughran wrote:
> > > On
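
As a loose analogy for the point about pipelined execution (a toy sketch with Python generators, not Spark's execution engine; `scan`, `keep_even` and `count_written` are hypothetical names), rows stream through each stage one at a time, so the consumer at the end only learns the final row count after the input is exhausted:

```python
from typing import Iterable, Iterator


def scan(n: int) -> Iterator[int]:
    """Stands in for a source that produces rows lazily."""
    yield from range(n)


def keep_even(rows: Iterable[int]) -> Iterator[int]:
    """Stands in for a pipelined filter; how many rows survive depends on the data."""
    return (r for r in rows if r % 2 == 0)


def count_written(rows: Iterable[int]) -> int:
    """Stands in for the writer: it sees one row at a time and can only count as it goes."""
    written = 0
    for _ in rows:
        written += 1
    return written
```

Calling `count_written(keep_even(scan(10)))` consumes the whole pipeline before the total is known, which is why a row-count hint would have to be computed by running the stages first.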

Re: Spark writing API

2023-08-07 Thread Steve Loughran
On Thu, 1 Jun 2023 at 00:58, Andrew Melo wrote:
> Hi all
>
> I've been developing for some time a Spark DSv2 plugin "Laurelin" (https://github.com/spark-root/laurelin) to read the ROOT (https://root.cern) file format (which is used in high energy physics). I've recently presented my work

Re: Spark writing API

2023-08-02 Thread Andrew Melo
Hello Spark Devs
Could anyone help me with this?
Thanks, Andrew
On Wed, May 31, 2023 at 20:57 Andrew Melo wrote:
> Hi all
>
> I've been developing for some time a Spark DSv2 plugin "Laurelin" (https://github.com/spark-root/laurelin) to read the ROOT (https://root.cern) file format

Spark writing API

2023-05-31 Thread Andrew Melo
Hi all
I've been developing for some time a Spark DSv2 plugin "Laurelin" (https://github.com/spark-root/laurelin) to read the ROOT (https://root.cern) file format (which is used in high energy physics). I've recently presented my work at a conference (