On 05/06/2020 at 17:09, Rémi Dettai wrote:
> I looked into the details of why the decoder could not estimate the target
> Arrow array size for my Parquet column. It's because I am decoding from
> Parquet-Dictionary to Arrow-Plain (which is the default when loading
> Parquet). In this case the size prediction is impossible :-(
But we can probably make up a heuristic, for example:

    avg(dictionary value size) * number of non-null values

It would avoid a number of resizes, even though there may still be a couple of them at the end. It may oversize in some cases, but much less than your proposed strategy of reserving a huge chunk of virtual memory :-)

Regards

Antoine.
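
To illustrate, here is a minimal sketch of that heuristic in Python. The function name and signature are hypothetical, not part of Arrow or Parquet; it just shows the arithmetic: average the byte sizes of the dictionary values, then multiply by the number of non-null rows to pre-size the plain buffer.

```python
def estimate_plain_size(dictionary_value_sizes, num_non_null):
    """Heuristic estimate (in bytes) of a plain-decoded buffer
    for a dictionary-encoded column.

    dictionary_value_sizes: byte length of each distinct dictionary value
    num_non_null: number of non-null rows referencing the dictionary
    """
    if not dictionary_value_sizes:
        # Empty dictionary: nothing to decode, reserve nothing.
        return 0
    avg_size = sum(dictionary_value_sizes) / len(dictionary_value_sizes)
    return int(avg_size * num_non_null)

# Dictionary of 3 strings with byte lengths 4, 8 and 6,
# referenced by 1000 non-null rows: avg 6 bytes * 1000 rows.
print(estimate_plain_size([4, 8, 6], 1000))  # → 6000
```

As noted above, this can oversize when short values dominate the actual data, and it can undersize when long values dominate, so a final resize may still be needed.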