Hello, glad to learn of this project. Good work!
If I allocate a single chunk of memory and start building Arrow-format data within it, does this chunk save any state regarding my progress?

For example, suppose I allocate a fixed-width floating-point column and a variable-width string column. I start building the floating-point column at offset X into my single buffer, the string “pointer” column at offset Y into the same buffer, and the string data elements at offset Z. I write one floating-point number and one string, then go away. When I come back to this buffer to append another value, does the buffer itself know where I should begin? That is, does the column (or blob) data itself distinguish between the used space and the available space?

Suppose I write a lot of large variable-width strings and “run out” of space for them before running out of space for floating-point numbers or string pointers (because I guessed badly when doing the original allocation). I consider this to be OK, since I can always “copy” the data to “compress out” the unused floating-point/pointer slots; the choice is up to me.

The above, applied to a (Feather?) file, is how I anticipate appending data to disk: pre-allocate a memory-mapped file and gradually fill it up. The efficiency of file utilization will depend on my projections regarding variable-width data types, but as I said above, I can always rewrite the file if and when this bothers me.

Is this the recommended and supported approach for incremental appends? I’m really hoping to use Arrow instead of rolling my own, but functionality like this is absolutely key! I am hoping not to need a side-car file (or memory chunk) to store “append progress” information.

I am brand new to this project, so please forgive me if I have overlooked something obvious. And again, it looks like great work so far!

Thanks!
-John
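
P.S. To make the layout concrete, here is a rough Python sketch of what I have in mind. It does not use the Arrow library at all: the names `CAPACITY`, `STR_BYTES`, `X`, `Y`, `Z` and `append` are my own hypothetical placeholders, and I am ignoring validity bitmaps and any alignment or padding the real format may want.

```python
import struct

# Hypothetical layout constants (my own names, not part of Arrow).
CAPACITY = 4                  # rows pre-allocated up front
STR_BYTES = 32                # bytes reserved for string data (my guess)
X = 0                         # float64 values start here
Y = X + 8 * CAPACITY          # int32 string "pointers": CAPACITY + 1 entries
Z = Y + 4 * (CAPACITY + 1)    # string bytes start here

buf = bytearray(Z + STR_BYTES)  # the single pre-allocated chunk, zero-filled


def append(buf, row, value, text):
    """Write one (float, string) pair at index `row` and return the next row.

    The caller has to remember `row`; nothing in `buf` records it.
    """
    if row >= CAPACITY:
        raise ValueError("out of fixed-width slots")
    data = text.encode("utf-8")
    # The previous entry's end position is where this string starts.
    start = struct.unpack_from("<i", buf, Y + 4 * row)[0]
    end = start + len(data)
    if end > STR_BYTES:
        raise ValueError("out of string space")
    struct.pack_into("<d", buf, X + 8 * row, value)      # fixed-width value
    buf[Z + start:Z + end] = data                        # string bytes
    struct.pack_into("<i", buf, Y + 4 * (row + 1), end)  # next "pointer"
    return row + 1


rows = append(buf, 0, 1.5, "hello")
rows = append(buf, rows, 2.5, "arrow")
```

Note that `rows` lives outside the buffer here, which is exactly the “append progress” bookkeeping I am hoping not to keep in a side-car.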