Hi all,

We are thinking of providing varchar/varbinary vectors with a different
memory layout, one that is used by a wide range of existing systems. This
layout differs from that of VarCharVector in the following ways:


   1. Instead of storing (start offset, end offset), the new layout stores
      (start offset, length).
   2. The content of the varchars may not be in a consecutive memory region.
      Instead, it can be at arbitrary memory addresses.


Because of these differences in memory layout, converting data between
existing systems and VarCharVectors incurs performance overhead.
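
To make the conversion cost concrete, here is a minimal plain-Java sketch
(it does not use the Arrow API). The names ValueRef, Contiguous and
toContiguous are hypothetical and only illustrate the idea: values described
by (start offset, length) in scattered segments must be copied byte by byte
into one consecutive buffer, and an offsets array must be rebuilt.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class SegmentedVarcharSketch {

    /** Hypothetical reference to one value: (segment, start offset, length). */
    static final class ValueRef {
        final byte[] segment;
        final int start;
        final int length;

        ValueRef(byte[] segment, int start, int length) {
            this.segment = segment;
            this.start = start;
            this.length = length;
        }
    }

    /** Contiguous layout: value i occupies data[offsets[i] .. offsets[i + 1]). */
    static final class Contiguous {
        final int[] offsets;
        final byte[] data;

        Contiguous(int[] offsets, byte[] data) {
            this.offsets = offsets;
            this.data = data;
        }

        String get(int i) {
            int len = offsets[i + 1] - offsets[i];
            return new String(data, offsets[i], len, StandardCharsets.UTF_8);
        }
    }

    /** Copies every value into one buffer; this copy is the conversion overhead. */
    static Contiguous toContiguous(List<ValueRef> refs) {
        int total = 0;
        for (ValueRef r : refs) {
            total += r.length;
        }
        int[] offsets = new int[refs.size() + 1];
        byte[] data = new byte[total];
        int pos = 0;
        for (int i = 0; i < refs.size(); i++) {
            ValueRef r = refs.get(i);
            offsets[i] = pos;
            System.arraycopy(r.segment, r.start, data, pos, r.length);
            pos += r.length;
        }
        offsets[refs.size()] = pos;
        return new Contiguous(offsets, data);
    }

    public static void main(String[] args) {
        // Two independent segments standing in for non-consecutive memory.
        byte[] seg1 = "helloworld".getBytes(StandardCharsets.UTF_8);
        byte[] seg2 = "arrow".getBytes(StandardCharsets.UTF_8);
        List<ValueRef> refs = List.of(
                new ValueRef(seg1, 0, 5),
                new ValueRef(seg1, 5, 5),
                new ValueRef(seg2, 0, 5));
        Contiguous c = toContiguous(refs);
        for (int i = 0; i < refs.size(); i++) {
            System.out.println(c.get(i));
        }
    }
}
```

Turning (start offset, length) into offsets is cheap arithmetic; the
per-value copy is what a layout with native support for scattered segments
would avoid.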

Difference 1 above seems minor, while difference 2 is difficult to
overcome. However, the scenario behind difference 2 is prevalent in
practice: for example, we store strings in a series of memory segments.
Whenever a segment is full, we request a new one. These memory segments
may not be consecutive, because other processes/threads are also
requesting/releasing memory segments in the meantime.

So we are wondering whether it is possible to support such a memory layout
in Arrow. I think there are more systems that are trying to adopt Arrow
but are hindered by this difficulty.

Would you please give your valuable feedback?


Best,

Liya Fan
