Re: Stored state of incremental writes to fixed size Arrow buffer?

Jacques Nadeau Mon, 06 May 2019 06:18:56 -0700

This is more a question of implementation than of specification. An Arrow
buffer is generally built and then sealed. The building process works
differently in each language (it is a concern of the language implementation
rather than of the memory specification). We don't currently allow a
half-built vector to be moved to another language and then built further.
So the question is really more concrete: what language are you looking at,
and what specific pattern are you trying to use for building?
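
To make "built and then sealed" concrete, here is a minimal sketch in plain
Python. It is not any Arrow library's API: the class StringColumnBuilder and
its methods are hypothetical names chosen for illustration. It assumes only
the variable-width layout (an int32 offsets buffer with n + 1 entries, plus a
data buffer) and leaves out the validity bitmap. Note that the value count is
returned next to the buffers and is not stored inside them.

```python
import struct


class StringColumnBuilder:
    """Hypothetical builder: mutable while building, immutable once sealed."""

    def __init__(self):
        self._offsets = [0]        # n + 1 entries for n values
        self._data = bytearray()   # concatenated UTF-8 bytes
        self._sealed = False

    def append(self, value: str) -> None:
        if self._sealed:
            raise RuntimeError("builder is sealed")
        self._data += value.encode("utf-8")
        self._offsets.append(len(self._data))

    def finish(self):
        """Seal the builder and return (length, offsets_buffer, data_buffer)."""
        self._sealed = True
        n = len(self._offsets) - 1
        # Little-endian int32 offsets, one more entry than there are values.
        offsets = struct.pack(f"<{n + 1}i", *self._offsets)
        return n, offsets, bytes(self._data)
```

While building, the progress lives in the builder object. After finish() the
buffers are immutable, and the length travels alongside them as metadata.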


If you're trying to go across independent processes (whether the same
process restarted or two separate processes active simultaneously), you'll
need to build up your own data structures to help with this.
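
As one possible shape for such a structure, the sketch below keeps an
"append progress" counter in a small header at the start of a pre-allocated,
memory-mapped file of float64 values. The header layout and the function
names create, append and read_all are assumptions made for illustration. They
are not part of the Arrow format or of any Arrow library, and the sketch makes
no attempt at crash safety or at coordinating concurrent writers.

```python
import mmap
import struct

HEADER = struct.Struct("<q")   # hypothetical header: count of values written
VALUE = struct.Struct("<d")    # one little-endian float64 per slot


def create(path: str, capacity: int) -> None:
    """Pre-allocate a zero-filled file with room for `capacity` values."""
    with open(path, "wb") as f:
        f.truncate(HEADER.size + capacity * VALUE.size)


def append(path: str, value: float) -> int:
    """Write `value` into the next free slot and return the new count."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        (count,) = HEADER.unpack_from(m, 0)
        offset = HEADER.size + count * VALUE.size
        if offset + VALUE.size > len(m):
            raise BufferError("pre-allocated region is full")
        VALUE.pack_into(m, offset, value)   # write the value first
        HEADER.pack_into(m, 0, count + 1)   # then record the new count
        m.flush()
        return count + 1


def read_all(path: str) -> list:
    """Return only the slots the header says have been written."""
    with open(path, "rb") as f:
        buf = f.read()
    (count,) = HEADER.unpack_from(buf, 0)
    return [
        VALUE.unpack_from(buf, HEADER.size + i * VALUE.size)[0]
        for i in range(count)
    ]
```

A process that comes back later reads the count from the header to find where
the next value goes. The unused tail of the file is just zero bytes, and
nothing in the value region itself marks it as free.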

On Mon, May 6, 2019 at 6:28 PM John Muehlhausen <j...@jgm.org> wrote:

> Hello,
>
> Glad to learn of this project— good work!
>
> If I allocate a single chunk of memory and start building Arrow format
> within it, does this chunk save any state regarding my progress?
>
> For example, suppose I allocate a column for floating point (fixed width)
> and a column for string (variable width).  Suppose I start building the
> floating point column at offset X into my single buffer, and the string
> “pointer” column at offset Y into the same single buffer, and the string
> data elements at offset Z.
>
> I write one floating point number and one string, then go away.  When I
> come back to this buffer to append another value, does the buffer itself
> know where I would begin?  I.e. is there a differentiation in the column
> (or blob) data itself between the available space and the used space?
>
> Suppose I write a lot of large variable width strings and “run out” of
> space for them before running out of space for floating point numbers or
> string pointers.  (I guessed badly when doing the original allocation.)  I
> consider this to be Ok since I can always “copy” the data to “compress out”
> the unused fp/pointer buckets... the choice is up to me.
>
> The above applied to a (feather?) file is how I anticipate appending data
> to disk... pre-allocate a mem-mapped file and gradually fill it up.  The
> efficiency of file utilization will depend on my projections regarding
> variable-width data types, but as I said above, I can always re-write the
> file if/when this bothers me.
>
> Is this the recommended and supported approach for incremental appends?
> I’m really hoping to use Arrow instead of rolling my own, but functionality
> like this is absolutely key!  Hoping not to use a side-car file (or memory
> chunk) to store “append progress” information.
>
> I am brand new to this project so please forgive me if I have overlooked
> something obvious.  And again, looks like great work so far!
>
> Thanks!
> -John
>
