On Thu, 4 Jun 2020 17:48:16 +0200
Rémi Dettai <rdet...@gmail.com> wrote:
> When creating large arrays, Arrow uses realloc quite intensively.
>
> I have an example where I read a gzipped parquet column (strings) that
> expands from 8MB to 100+MB when loaded into Arrow. Of course Jemalloc
> cannot anticipate this and every reallocate call above 1MB (the most
> critical ones) ends up being a copy.

Ideally, we should be able to presize the array to a good enough
estimate. I don't know if Parquet gives us enough information for that,
though.

Regards

Antoine.
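
To put a rough number on the copying discussed in this thread, here is a minimal sketch of a toy model. It is not Arrow or jemalloc code. The function name, the doubling growth policy, the 64 KiB starting capacity, and the rule that every realloc to more than 1 MiB copies the whole old buffer are all assumptions made for illustration; only the 1 MB threshold and the 100 MB target size come from the example quoted above.

```python
MIB = 1024 * 1024


def bytes_copied_while_growing(final_size, initial_capacity, copy_threshold=MIB):
    """Return the bytes copied by realloc while a buffer grows to final_size.

    Hypothetical model, not a measurement: capacity doubles on each growth
    step, the buffer is full when it grows, and any realloc whose new
    capacity exceeds copy_threshold copies the old contents instead of
    extending the allocation in place.
    """
    capacity = initial_capacity
    copied = 0
    while capacity < final_size:
        new_capacity = capacity * 2
        if new_capacity > copy_threshold:
            # The old buffer is copied into the freshly allocated block.
            copied += capacity
        capacity = new_capacity
    return copied


# Growing incrementally from a small buffer up to a 100 MiB column.
grown = bytes_copied_while_growing(100 * MIB, initial_capacity=64 * 1024)

# Presizing to the final size up front: the growth loop never runs.
presized = bytes_copied_while_growing(100 * MIB, initial_capacity=100 * MIB)

print(f"incremental growth copies {grown / MIB:.0f} MiB")
print(f"presized buffer copies    {presized / MIB:.0f} MiB")
```

Under these assumptions the incremental path copies more data than the final array holds, while a presized buffer copies nothing. An estimate that is merely close would still remove most of the large copies, since only the last one or two growth steps would remain.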