Hi all,
We are thinking of providing varchar/varbinary vectors with a different memory layout, one that is used by a wide range of existing systems. It differs from the layout of VarCharVector in two ways:

1. Instead of storing (start offset, end offset), the new layout stores (start offset, length).
2. The content of the varchars need not be in one consecutive memory region. Instead, each value can be at an arbitrary memory address.

Because of these differences, converting data between existing systems and VarCharVectors incurs performance overhead. Difference 1 seems insignificant, but difference 2 is difficult to overcome.

The scenario in difference 2 is prevalent in practice. For example, we store strings in a series of memory segments. Whenever a segment is full, we request a new one. These segments may not be consecutive, because other processes/threads are also requesting and releasing memory segments in the meantime.

So we are wondering whether it is possible to support such a memory layout in Arrow. I think there are more systems that are trying to adopt Arrow but are hindered by this difficulty.

Would you please give your valuable feedback?

Best,
Liya Fan
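
P.S. To make the two layouts concrete, here is a rough Python sketch. It is an illustration only: all function and variable names are hypothetical and not part of any Arrow API, and it assumes that an "arbitrary memory address" can be modeled as a (segment index, start offset) pair and that a value never spans two segments.

```python
from typing import List, Tuple

# Hypothetical entry type: (segment index, start offset, length).
Entry = Tuple[int, int, int]


def read_contiguous(data: bytes, offsets: List[int], i: int) -> bytes:
    """Offset-based layout: value i lies between two consecutive offsets
    in a single contiguous data buffer."""
    return data[offsets[i]:offsets[i + 1]]


def read_segmented(segments: List[bytes], entries: List[Entry], i: int) -> bytes:
    """(start, length) layout: value i lives in whichever segment
    its entry points to."""
    seg, start, length = entries[i]
    return segments[seg][start:start + length]


def to_contiguous(segments: List[bytes], entries: List[Entry]) -> Tuple[bytes, List[int]]:
    """Convert the segmented layout to the contiguous one.
    Every value has to be copied, which is the overhead in question."""
    data = bytearray()
    offsets = [0]
    for seg, start, length in entries:
        data += segments[seg][start:start + length]
        offsets.append(len(data))
    return bytes(data), offsets


# Two non-adjacent segments holding three strings.
segments = [b"applebanana", b"cherry"]
entries = [(0, 0, 5), (0, 5, 6), (1, 0, 6)]

data, offsets = to_contiguous(segments, entries)
```

Changing (start, length) into (start, end) is only arithmetic per value, while gathering the bytes from scattered segments into one buffer costs a copy of all the data.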