I believe the formal Parquet standard already allows a file per column; at least I remember it being discussed when the spec was first implemented. If you look at the Thrift spec, it actually allows for this:
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L771
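
For reference, the relevant piece is the optional file_path field on ColumnChunk. Roughly (quoted from memory; check the link above for the exact current definition):

    struct ColumnChunk {
      /** File where column data is stored. If not set, assumed to be
       *  the same file as the metadata. This path is relative to the
       *  current file. */
      1: optional string file_path

      /** Byte offset in file_path to the ColumnMetaData */
      2: required i64 file_offset
      ...
    }

So in principle a writer can point every column chunk at its own file and keep only the footer in the "main" file.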
That being said, I'm not sure which readers support this read pattern. If it is part of the spec, doing it as a Parquet writing mode makes sense to me.

On Mon, Jul 13, 2020 at 11:08 PM Roman Karlstetter <roman.karlstet...@gmail.com> wrote:

>> I'd suggest a new write pattern. Write the columns page at a time to
>> separate files, then use a second process to concatenate the columns
>> and append the footer. Odds are you would do better than OS swapping
>> and take memory requirements down to page size times field count.
>
> This is exactly what a student of ours implemented, quite successfully:
> writing to one file per column (non-Parquet, binary, memory-mapped), and
> once enough data has been put into those "cache/buffer files", flushing
> the data to a Parquet row group.
>
> My question targeted the integration of these ideas into the Arrow
> Parquet writer. I wanted to know whether it makes sense to integrate
> them or whether it is better to keep that functionality outside of
> arrow/parquet. Having it inside would have the benefit of reduced
> storage space because of encoding/compression, and thus smaller overhead
> in the final copy phase (less data to copy, and the data is already
> encoded/compressed). On the other hand, having one memory-mapped file
> per column is not something that seems to fit well with the current
> design of Arrow.
>
> Thanks for the feedback,
> Roman
>
> On Sun, Jul 12, 2020 at 3:05 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> This is an interesting idea. For S3 multipart uploads one might run
>> into limitations pretty quickly: only 10,000 parts appear to be
>> supported, and all but the last are expected to be at least 5 MB, if I
>> read their docs correctly [1].
>>
>> [1] https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html
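
To put numbers on those limits: with one upload part per column chunk, a 1,000-column schema would exhaust the 10,000-part cap after only 10 row groups, and every column chunk except the last would have to come out at 5 MB or more.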

>> On Saturday, July 11, 2020, Jacques Nadeau <jacq...@apache.org> wrote:
>>
>>> I'd suggest a new write pattern. Write the columns page at a time to
>>> separate files, then use a second process to concatenate the columns
>>> and append the footer. Odds are you would do better than OS swapping
>>> and take memory requirements down to page size times field count.
>>>
>>> In S3 I believe you could do this via a multipart upload and entirely
>>> skip the second step. I don't know of any implementations that
>>> actually do this yet.
>>>
>>> On Thu, Jul 9, 2020, 11:58 PM Roman Karlstetter
>>> <roman.karlstet...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I wasn't aware of the fact that jemalloc uses mmap automatically for
>>>> larger allocations, and I haven't tested this yet.
>>>>
>>>> The approach could be different in that we would know which parts of
>>>> the buffers are going to be used next (the buffers are append-only)
>>>> and which parts won't be needed until the row group is actually
>>>> flushed (and when flushing, we also know the order). But I'm not
>>>> sure whether that knowledge helps much in a) saving memory compared
>>>> to a generic allocator or b) improving performance. In addition,
>>>> communicating this knowledge to the implementation would be tricky
>>>> in the general case, I guess.
>>>>
>>>> Regarding setting the allocator to another memory pool: I was unsure
>>>> whether the memory pool is also used for further allocations where
>>>> the default memory pool would be more appropriate. If not, then
>>>> setting the memory pool in the writer properties should actually
>>>> work well.
>>>>
>>>> Maybe I should just play a bit with the different memory pool
>>>> options and see how they behave. It makes more sense to discuss
>>>> further ideas once I have some performance numbers.
>>>>
>>>> Thanks,
>>>> Roman
>>>>
>>>> On Fri, Jul 10, 2020 at 6:47 AM Micah Kornfield
>>>> <emkornfi...@gmail.com> wrote:
>>>>
>>>>> +parquet-dev, as this seems more concerned with the non-Arrow
>>>>> pieces of Parquet.
>>>>>
>>>>> Hi Roman,
>>>>> Answers inline.
>>>>>
>>>>>> One way to solve that problem would be to use memory-mapped files
>>>>>> instead of plain memory buffers. That way, the amount of required
>>>>>> memory can be limited to the number of columns times the OS page
>>>>>> size, which would be independent of the row-group size.
>>>>>> Consequently, large row-group sizes pose no problem with respect
>>>>>> to RAM consumption.
>>>>>
>>>>> I was under the impression that modern allocators (e.g. jemalloc)
>>>>> already use mmap for large allocations. How would this approach
>>>>> differ from the way allocators use it? Have you prototyped this
>>>>> approach to see if it allows for better scalability?
>>>>>
>>>>>> After a quick look at how the buffers are managed inside Arrow
>>>>>> (allocated from a default memory pool), I have the impression
>>>>>> that an implementation of this idea could be a rather huge
>>>>>> change. I still wanted to know whether that is something you
>>>>>> could see being integrated or whether it is out of scope for
>>>>>> Arrow.
>>>>>
>>>>> A huge change probably isn't a great idea unless we've validated
>>>>> the approach along with alternatives. Is there currently code that
>>>>> doesn't make use of the MemoryPool [1] provided by
>>>>> WriterProperties? If so, we should probably fix it. Otherwise, is
>>>>> there a reason you can't substitute a customized memory pool on
>>>>> WriterProperties?
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> [1] https://github.com/apache/arrow/blob/5602c459eb8773b6be8059b1b118175e9f16b7a3/cpp/src/parquet/properties.h#L447
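
For concreteness, substituting a pool is just a matter of passing it through the writer properties. A minimal sketch against the current parquet-cpp API (the mmap-backed pool is hypothetical; any implementation of the arrow::MemoryPool interface would do):

    #include <arrow/memory_pool.h>
    #include <parquet/properties.h>

    std::shared_ptr<parquet::WriterProperties> MakeProps(
        arrow::MemoryPool* pool /* e.g. a hypothetical mmap-backed pool */) {
      parquet::WriterProperties::Builder builder;
      // Buffer allocations made by the Parquet writer are meant to go
      // through the pool configured here; whether every allocation path
      // honors it is exactly the question raised above.
      return builder.memory_pool(pool)->build();
    }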

>>>>> On Thu, Jul 9, 2020 at 8:35 AM Roman Karlstetter
>>>>> <roman.karlstet...@gmail.com> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> For some time now, parquet::ParquetFileWriter has had the option
>>>>>> to create buffered row groups with AppendBufferedRowGroup(), which
>>>>>> basically gives you the possibility to write to columns in any
>>>>>> order you like (in contrast to the formerly only possible way of
>>>>>> writing one column after the other). This is nice because it saves
>>>>>> the caller from having to build an in-memory columnar
>>>>>> representation of its data.
>>>>>>
>>>>>> However, when the data size is huge compared to the available
>>>>>> system memory (due to a wide schema or a large row-group size),
>>>>>> this is problematic, as the buffers allocated internally can take
>>>>>> up a large portion of the RAM of the machine the conversion is
>>>>>> running on.
>>>>>>
>>>>>> One way to solve that problem would be to use memory-mapped files
>>>>>> instead of plain memory buffers. That way, the amount of required
>>>>>> memory can be limited to the number of columns times the OS page
>>>>>> size, which would be independent of the row-group size.
>>>>>> Consequently, large row-group sizes pose no problem with respect
>>>>>> to RAM consumption.
>>>>>>
>>>>>> I wonder what you generally think about the idea of integrating an
>>>>>> AppendFileBufferedRowGroup() (or similarly named) option that
>>>>>> gives the user the possibility to have the internal buffers be
>>>>>> memory-mapped files.
>>>>>>
>>>>>> After a quick look at how the buffers are managed inside Arrow
>>>>>> (allocated from a default memory pool), I have the impression that
>>>>>> an implementation of this idea could be a rather huge change. I
>>>>>> still wanted to know whether that is something you could see being
>>>>>> integrated or whether it is out of scope for Arrow.
>>>>>>
>>>>>> Thanks in advance and kind regards,
>>>>>> Roman
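
For readers who haven't used the buffered mode, here is a rough sketch of the pattern under discussion, against the current parquet-cpp API (two int64 columns; error handling mostly omitted):

    #include <arrow/io/file.h>
    #include <parquet/api/writer.h>

    int main() {
      using parquet::schema::GroupNode;
      using parquet::schema::PrimitiveNode;

      // Schema with two required int64 columns, "a" and "b".
      parquet::schema::NodeVector fields;
      fields.push_back(PrimitiveNode::Make(
          "a", parquet::Repetition::REQUIRED, parquet::Type::INT64));
      fields.push_back(PrimitiveNode::Make(
          "b", parquet::Repetition::REQUIRED, parquet::Type::INT64));
      auto schema = std::static_pointer_cast<GroupNode>(
          GroupNode::Make("schema", parquet::Repetition::REQUIRED, fields));

      auto sink = arrow::io::FileOutputStream::Open("/tmp/buffered.parquet")
                      .ValueOrDie();
      auto writer = parquet::ParquetFileWriter::Open(sink, schema);

      // Buffered row group: columns may be written in any order and
      // interleaved; everything stays in memory until Close().
      parquet::RowGroupWriter* rg = writer->AppendBufferedRowGroup();
      int64_t b_val = 2;
      static_cast<parquet::Int64Writer*>(rg->column(1))
          ->WriteBatch(1, nullptr, nullptr, &b_val);  // column "b" first
      int64_t a_val = 1;
      static_cast<parquet::Int64Writer*>(rg->column(0))
          ->WriteBatch(1, nullptr, nullptr, &a_val);  // then column "a"
      rg->Close();
      writer->Close();
      return 0;
    }

This in-memory buffering is exactly what becomes expensive for wide schemas and large row groups, and what the proposal would redirect into memory-mapped files.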