Hi Uwe!

> As your suggestions don't seem to be specific to Arrow, why not
> contribute them directly to jemalloc? They are much better at reviewing
> allocator code than we are.
I mentioned this idea in the jemalloc Gitter. The first response was that
it should work, but that realloc-heavy workloads aren't very common and that
these huge allocations can have some impact on core dumps. It's true that
jemalloc is meant to be a general-purpose allocator, so it has to make
compromises in that respect. Let's wait and see whether other answers come
in. You can follow the conversation here if you are interested:
https://gitter.im/jemalloc/jemalloc. While investigating jemalloc's
internals, it seemed to me that most of the design effort is concentrated
on small and medium allocations. This makes sense, as they probably
represent 99% of workloads. But Arrow is really about structuring larger
chunks of data in memory. I agree with Antoine that in normal
circumstances, if you start to blame the allocator, you have likely got
things wrong elsewhere. But maybe the Arrow workload is specific enough to
earn a seat in the realm of exceptions (right next to video game engines)! ;-)
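
A minimal sketch of the reservation strategy quoted further down in this
thread, assuming a Unix system and Python's standard mmap module. The class
ReservedBuffer and its methods are hypothetical names made up for this
illustration; they are not part of Arrow or jemalloc. The reservation here
is deliberately small (64 MiB) rather than close to physical memory, and a
single object tracks only its own used size instead of a map of all
allocations.

```python
import mmap

MIB = 1024 * 1024


class ReservedBuffer:
    """Hypothetical grow-in-place buffer backed by one anonymous mapping."""

    def __init__(self, reserve_bytes):
        # Private anonymous mapping: the address range is reserved up front,
        # but pages only consume physical memory once they are touched.
        self._map = mmap.mmap(-1, reserve_bytes, flags=mmap.MAP_PRIVATE)
        self.reserved = reserve_bytes
        self.used = 0

    def resize(self, new_size):
        """Grow or shrink the used part without moving existing data."""
        if new_size < 0 or new_size > self.reserved:
            raise MemoryError("request does not fit in the reserved range")
        if new_size < self.used:
            self._release(new_size, self.used)
        self.used = new_size

    def _release(self, start, end):
        # Hand whole pages in [start, end) back to the OS where madvise is
        # available (Python 3.8+ on platforms that define MADV_DONTNEED).
        if not (hasattr(self._map, "madvise") and hasattr(mmap, "MADV_DONTNEED")):
            return
        page = mmap.PAGESIZE
        aligned = -(-start // page) * page  # round start up to a page boundary
        if aligned < end:
            self._map.madvise(mmap.MADV_DONTNEED, aligned, end - aligned)

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > self.used:
            raise ValueError("write outside the used range")
        self._map[offset:offset + len(data)] = data

    def read(self, offset, length):
        if offset < 0 or offset + length > self.used:
            raise ValueError("read outside the used range")
        return self._map[offset:offset + length]


buf = ReservedBuffer(64 * MIB)
buf.resize(1 * MIB)
buf.write(0, b"arrow")
buf.resize(32 * MIB)   # grows in place, nothing is copied
buf.resize(4096)       # shrinks, pages past the new end may be released
```

The point of the design is that growing never needs a copy, because the
whole range was reserved at creation time. Whether this is portable beyond
Linux-like systems is an open question here, as in the original proposal.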

> Still, when we read a column, we should be able to determine its final
> size from the Parquet metadata. Maybe we're not passing some information
> along there?
I'm going to take a look at this. But even if you can get a good estimate
for Parquet, what about CSV or JSON? How can you estimate the size of a
column of strings beforehand, especially with a compressed payload where
you don't even know the number of rows?
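
To put a rough number on the cost of not knowing the final size, here is a
small back-of-the-envelope sketch. It assumes the buffer doubles its
capacity on each growth step and that every realloc copies the full old
buffer (the worst case described in this thread for allocations above
1MB). The growth factor of 2 and the function name copied_bytes are
assumptions for illustration, not Arrow's actual growth policy.

```python
MIB = 1024 * 1024


def copied_bytes(final_size, start_capacity=MIB, growth=2):
    """Return (number of reallocs, total bytes copied) to reach final_size.

    Assumes the buffer is full each time it grows and that each realloc
    moves the whole old buffer to a new address.
    """
    capacity = start_capacity
    copied = 0
    reallocs = 0
    while capacity < final_size:
        copied += capacity      # the old contents are copied to the new block
        capacity *= growth
        reallocs += 1
    return reallocs, copied


reallocs, copied = copied_bytes(100 * MIB)
print(reallocs, copied // MIB)  # 7 reallocs, 127 MiB copied
```

Under these assumptions, building a 100MB column from a 1MB starting
buffer copies more data than the column finally holds.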

On Fri, Jun 5, 2020 at 12:24, Uwe L. Korn <uw...@xhochy.com> wrote:

> Hello Rémi,
>
> Under the hood, jemalloc does things quite similar to what you describe.
> I'm not sure what the threshold is in the current version, but in earlier
> releases it used a different allocation strategy for objects above 4MB.
> For the initial large allocation, you will see quite a few copies, as mmap
> returns a new base address and cannot reuse an existing space.
> This could probably be circumvented by a single large allocation which is
> freed again.
>
> As your suggestions don't seem to be specific to Arrow, why not contribute
> them directly to jemalloc? They are much better at reviewing allocator code
> than we are.
>
> Still, when we read a column, we should be able to determine its final
> size from the Parquet metadata. Maybe we're not passing some information
> along there?
>
> Best,
> Uwe
>
> On Thu, Jun 4, 2020, at 5:48 PM, Rémi Dettai wrote:
> > When creating large arrays, Arrow uses realloc quite intensively.
> >
> > I have an example where I read a gzipped Parquet column (strings) that
> > expands from 8MB to 100+MB when loaded into Arrow. Of course Jemalloc
> > cannot anticipate this and every realloc call above 1MB (the most
> > critical ones) ends up being a copy.
> >
> > I think that, knowing that we like using realloc in Arrow, we could come up
> > with an allocator for large objects that would behave a lot better than
> > Jemalloc. For smaller objects, this allocator could just let the memory
> > requests be handled by Jemalloc. Not trying to outsmart the brilliant
> > guys from Facebook and co ;-) But for larger objects, we could adopt a
> > custom strategy:
> > - if an allocation or a re-allocation larger than 1MB (or maybe even 512K)
> > is made on our memory pool, call mmap with size X GB (X being slightly
> > smaller than the total physical memory on the system). This is ok because
> > mmap will not physically allocate this memory as long as it is not touched.
> > - we keep track of all allocations that we created this way, by storing the
> > pointer + the actual used size inside our X GB alloc in a map.
> > - when growing an alloc mmapped this way, we will always have contiguous
> > memory available (otherwise we would already have OOMed, because X is the
> > physical memory size).
> > - when reducing the alloc size, we can free with madvise (optional: if the
> > alloc becomes small enough, we might copy it back into a Jemalloc
> > allocation).
> >
> > I am not an expert in these matters, and I just learned what an allocator
> > really is, so my approach might be naive. In this case feel free to
> > enlighten me!
> >
> > Please note that I'm not sure about the level of portability of this
> > solution.
> >
> > Have a nice day!
> >
> > Remi
> >
>
