I looked into the details of why the decoder could not estimate the target
Arrow array size for my Parquet column. It's because I am decoding from
Parquet-Dictionary to Arrow-Plain (which is the default when loading
Parquet). In this case the size prediction is impossible :-(
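
To illustrate what I mean, here is a minimal sketch in plain Python (not the
actual Arrow decoder). It assumes a variable-length column such as strings,
which is my assumption for this example; the names `decoded_data_size`,
`page_a` and `page_b` are made up. Two pages hold the same number of
dictionary indices, so they look identical before decoding, yet the plain
output size is only known after looking up every index:

```python
def decoded_data_size(dictionary, indices):
    """Bytes needed for the plain (non-dictionary) values buffer.

    Every index has to be resolved against the dictionary, so the result
    is only known once the whole page has effectively been decoded.
    """
    return sum(len(dictionary[i]) for i in indices)


# Hypothetical dictionary page with one short and one long entry.
dictionary = [b"a", b"a much longer string value"]

# Two data pages with the same number of indices (same encoded size).
page_a = [0, 0, 0, 0]
page_b = [1, 1, 1, 1]

size_a = decoded_data_size(dictionary, page_a)
size_b = decoded_data_size(dictionary, page_b)
```

The value count alone is not enough to size the target buffer, since
`size_a` and `size_b` differ by a large factor for the same number of values.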

> This would actually be the most interesting thing. In general, getting
> access to the pages mapped into RAM would help in many more
> situations, not just reallocation. For example, when you take a small slice
> of a large array and only pass this on, but don't keep an explicit reference to
> the array, you will still indirectly hold on to the larger memory size. Having
> an allocator that would understand the mapping between pages and memory
> block would allow us to free the pages that are not part of the view.

Not sure I'm following you on this one. From my understanding, the subject
here is mremap, which lets you keep your physical memory but change the
virtual address range that points to it. According to this (
https://stackoverflow.com/questions/11621606/faster-way-to-move-memory-page-than-mremap)
it seems to be mainly efficient for growing large allocations.
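
A rough way to poke at this from Python, as a sketch only: as far as I know,
CPython's `mmap.resize()` goes through `mremap()` on systems that provide it,
so this assumes Linux. The private anonymous mapping and the helper name
`grow_mapping` are my own choices for the example, not something from Arrow
or jemalloc.

```python
import mmap

PAGE = mmap.PAGESIZE


def grow_mapping(old_pages, new_pages):
    """Grow a private anonymous mapping and check the old content survives.

    Returns the first five bytes after the resize and the new length.
    """
    m = mmap.mmap(-1, old_pages * PAGE,
                  flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        m[:5] = b"arrow"            # touch the first page so it is backed
        m.resize(new_pages * PAGE)  # the kernel may move the virtual range
        return m[:5], len(m)
    finally:
        m.close()


head, new_size = grow_mapping(old_pages=4, new_pages=1024)
```

The existing bytes are still there after the resize without any explicit
copy in user code, which is the property that would matter when growing
large allocations.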

On Fri, Jun 5, 2020 at 16:25, Uwe L. Korn <uw...@xhochy.com> wrote:

>
> On Fri, Jun 5, 2020, at 3:13 PM, Rémi Dettai wrote:
> > Hi Antoine !
> > > I would indeed have expected jemalloc to do that (remap the pages)
> > I have no idea about the performance gain this would provide (if any).
> > Could be interesting to explore.
>
> This would actually be the most interesting thing. In general, getting
> access to the pages mapped into RAM would help in many more
> situations, not just reallocation. For example, when you take a small slice
> of a large array and only pass this on, but don't keep an explicit reference to
> the array, you will still indirectly hold on to the larger memory size. Having
> an allocator that would understand the mapping between pages and memory
> block would allow us to free the pages that are not part of the view.
>
> Also, yes: For CSV and JSON, we don't have size estimates beforehand.
> There this would be a great performance improvement.
>
> Best
> Uwe
>
