On 05/06/2020 at 16:25, Uwe L. Korn wrote:
> 
> On Fri, Jun 5, 2020, at 3:13 PM, Rémi Dettai wrote:
>> Hi Antoine !
>>> I would indeed have expected jemalloc to do that (remap the pages)
>> I have no idea about the performance gain this would provide (if any).
>> Could be interesting to explore.
> 
> This would actually be the most interesting thing. In general, having access
> to the pages mapped into RAM would help in many more situations, not
> just reallocation. For example, when you take a small slice of a large array
> and only pass that on, without keeping an explicit reference to the array, you
> still indirectly hold on to the larger memory block. An allocator that
> understood the mapping between pages and memory blocks would allow us to
> free the pages that are not part of the view.
> 
> Also, yes: for CSV and JSON we don't have size estimates beforehand, so
> there this would be a great performance improvement.
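To make the slicing problem concrete: in Python, a `memoryview` slice is zero-copy and keeps the entire exporting object alive, so a 16-byte view can pin a 1 MiB buffer even after the explicit reference is dropped. This is only an analogy using the standard library, not Arrow's own buffer classes; `make_slice` and `retained_vs_used` are hypothetical helpers.

```python
def make_slice(buf, offset, length):
    """Return a zero-copy view of buf[offset:offset + length] (hypothetical helper)."""
    return memoryview(buf)[offset:offset + length]


def retained_vs_used(view):
    """Return (bytes kept alive, bytes actually visible) for a view."""
    # view.obj is the object that exported the buffer, i.e. the *whole* parent.
    return len(view.obj), view.nbytes


big = bytearray(1 << 20)           # 1 MiB "large array"
small = make_slice(big, 1024, 16)  # keep only 16 bytes of interest
del big                            # drop our explicit reference to the parent
print(retained_vs_used(small))     # (1048576, 16)
```

An allocator that knew which pages back the 16 visible bytes could, in principle, release the rest; with a plain heap allocation the whole block stays resident until the last view goes away.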

For CSV we actually know the size after parsing:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/converter.cc#L177-L178
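As a rough illustration of why knowing the final size matters: if a buffer is grown by doubling and each growth copies the old contents (a realloc that cannot simply remap pages), the total bytes copied end up close to the final size, whereas knowing the size after parsing allows one exact allocation with no copies. This is a simplified model assuming doubling growth from a hypothetical 64-byte initial capacity; it does not describe jemalloc's actual behaviour.

```python
def bytes_copied_by_growth(final_size, initial_capacity=64, factor=2):
    """Bytes copied while growing a buffer until it can hold final_size bytes,
    assuming every growth step copies the full old buffer to a new block."""
    capacity, copied = initial_capacity, 0
    while capacity < final_size:
        copied += capacity   # old (full) contents move to the new block
        capacity *= factor
    return copied


print(bytes_copied_by_growth(1024))  # 960
```

With an exact size known up front, the same 1024 bytes would be written once and never copied.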

It would be a shame if this were possible in CSV but not in Parquet, a
storage format dedicated to big columnar data.

Regards

Antoine.
