On 05/06/2020 at 17:09, Rémi Dettai wrote:
> I looked into the details of why the decoder could not estimate the target
> Arrow array size for my Parquet column. It's because I am decoding from
> Parquet-Dictionary to Arrow-Plain (which is the default when loading
> Parquet). In this case the size prediction is impossible :-(

But we can probably make up a heuristic.  For example
   avg(dictionary value size) * number of non-null values

It would avoid a number of resizes, even though there may still be a
couple of them at the end.  It may oversize in some cases, but much less
than your proposed strategy of reserving a huge chunk of virtual memory :-)
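For illustration, a minimal sketch of that heuristic (hypothetical helper,
not the actual reader API):

    /// Estimate the byte size of the plain-encoded output when decoding a
    /// dictionary-encoded Parquet column, so the target buffer can be
    /// reserved up front instead of being resized repeatedly.
    fn estimate_plain_size(dict_value_sizes: &[usize], non_null_count: usize) -> usize {
        if dict_value_sizes.is_empty() || non_null_count == 0 {
            return 0;
        }
        let total: usize = dict_value_sizes.iter().sum();
        // avg(dictionary value size) * number of non-null values,
        // rounded up so we slightly oversize rather than undersize.
        (total * non_null_count + dict_value_sizes.len() - 1) / dict_value_sizes.len()
    }

    fn main() {
        // e.g. a dictionary of three string values of 4, 7 and 10 bytes,
        // referenced by 1000 non-null rows in the column chunk.
        let estimate = estimate_plain_size(&[4, 7, 10], 1000);
        println!("reserve ~{} bytes for the value buffer", estimate); // ~7000 bytes
    }

The reserve based on this estimate may still be followed by one or two
resizes if the column is skewed toward the larger dictionary values, but
it bounds the allocation to something proportional to the actual data.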

Regards

Antoine.
