In vaex I always write the data to hdf5 as one large contiguous chunk (per column). The reason is that this allows each memory-mapped column to be exposed as a single numpy array (talking about numerical data only for now), which many people are quite comfortable with.
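For a numerical column that looks roughly like the sketch below (simplified; the file layout and names are just illustrative, and it only works for contiguous, uncompressed datasets):

import h5py
import numpy as np

def column_as_numpy(path, dataset_name):
    # Expose a contiguous hdf5 dataset as one numpy array backed by an mmap.
    with h5py.File(path, "r") as f:
        ds = f[dataset_name]
        # Only contiguous (non-chunked, uncompressed) datasets have a single
        # offset in the file; chunked or compressed ones do not.
        offset = ds.id.get_offset()
        dtype, shape = ds.dtype, ds.shape
    return np.memmap(path, dtype=dtype, mode="r", offset=offset, shape=shape)

x = column_as_numpy("data.hdf5", "/table/columns/x/data")
print(x.mean())  # behaves like a plain numpy array, data stays memory mapped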
The strategy vaex uses to write unchunked data is to first create an 'empty' hdf5 file (filled with zeros), mmap those huge arrays, and write to them in chunks (a simplified sketch is in the PS below). This means that in vaex I need to support mutable data (only used internally; vaex's default is immutable data, like Arrow), since I need to write to the memory-mapped data. It also keeps the exporting code relatively simple.

I could not find a way to get something similar done in Arrow, at least not in a way that ends up with a single pa.array instance per column. I think Arrow's mindset is that you should just use chunks, right? Or is this also something that could be considered for Arrow? An alternative would be to implement Arrow on top of hdf5, which is basically what I do now in vaex (with limited support). Again, I'm wondering whether there is interest from the Arrow community in storing Arrow data in hdf5?

cheers,
Maarten
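PS: a simplified sketch of the zero-fill + mmap export strategy described above (not the actual vaex code; n, chunk_size and compute_chunk are just placeholders):

import h5py
import numpy as np

n = 10_000_000          # number of rows (placeholder)
chunk_size = 1_000_000  # rows written per step (placeholder)

def compute_chunk(i1, i2):
    # stand-in for whatever actually produces a chunk of column data
    return np.arange(i1, i2, dtype="f8")

# 1) create the 'empty' hdf5 file with a contiguous, zero-filled dataset
with h5py.File("output.hdf5", "w") as f:
    ds = f.create_dataset("/table/columns/x/data", shape=(n,), dtype="f8")
    ds[0] = 0.0  # touch it so hdf5 actually allocates the contiguous space
    offset = ds.id.get_offset()

# 2) mmap that huge array and fill it chunk by chunk
out = np.memmap("output.hdf5", dtype="f8", mode="r+", offset=offset, shape=(n,))
for i1 in range(0, n, chunk_size):
    i2 = min(i1 + chunk_size, n)
    out[i1:i2] = compute_chunk(i1, i2)
out.flush()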