ARROW-4810 [1] and ARROW-750 [2] discuss adding variants of the List,
String and Binary data types that use 64-bit offsets.

Philipp started an implementation for the large list type [3] and I hacked
together a potentially viable Java implementation [4].

I'd like to kick off the discussion for getting these types voted on.  I'm
coupling them together because I think there are design considerations for
how we evolve Schema.fbs.

There are two proposed options:
1.  The current PR proposal which adds a new type LargeList:
  // List with 64-bit offsets
  table LargeList {}

2.  As François suggested, it might be cleaner to parameterize List with
offset width.  I suppose something like:

table List {
  // Only 32-bit and 64-bit offsets are supported.
  bitWidth: int = 32;
}
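
To make the offset-width parameter concrete, here is a rough Python sketch
of what it would control on the data side: the same list lengths encoded
as an offsets buffer at either width.  The helper name encode_offsets is
made up for illustration and is not part of any Arrow library; the only
thing taken from the proposal is the idea of a 32/64 bitWidth setting.

```python
import struct

# Hypothetical helper, not an Arrow API: build a little-endian offsets
# buffer for a list column at the requested offset width.
_FORMATS = {32: "i", 64: "q"}  # signed int32 / signed int64


def encode_offsets(lengths, bit_width=32):
    if bit_width not in _FORMATS:
        raise ValueError(f"unsupported offset width: {bit_width}")
    limit = (1 << (bit_width - 1)) - 1  # largest signed offset at this width
    offsets = [0]
    for n in lengths:
        total = offsets[-1] + n
        if total > limit:
            raise OverflowError(
                f"offset {total} does not fit in {bit_width} bits"
            )
        offsets.append(total)
    # One offset per list plus a trailing end offset.
    return struct.pack(f"<{len(offsets)}{_FORMATS[bit_width]}", *offsets)
```

With 32-bit offsets the buffer is half the size, but the cumulative child
length is capped at 2^31 - 1 elements; the 64-bit width lifts that cap at
the cost of 8 bytes per offset.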

I think Option 2 is cleaner and potentially better long-term, but I think
it breaks forward compatibility of the existing Arrow libraries.  If we
proceed with Option 2, I would advocate making the change to Schema.fbs all
at once for all types (assuming we think that 64-bit offsets are desirable
for all types) along with forward compatibility checks to avoid multiple
releases where forward compatibility is broken (by broken I mean the
inability to detect that an implementation is receiving data it can't
read).  What are people's thoughts on this?
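
As a rough illustration of the kind of compatibility check I have in mind,
here is a minimal Python sketch.  Everything in it is an assumption made
for the example: the dict stands in for a decoded List table from the
schema metadata, and the names SUPPORTED_OFFSET_WIDTHS,
UnsupportedOffsetWidth and check_list_offset_width are hypothetical, not
anything in the existing libraries.

```python
# Offset widths this (hypothetical) reader knows how to decode.
SUPPORTED_OFFSET_WIDTHS = {32}


class UnsupportedOffsetWidth(Exception):
    """Raised when the metadata asks for an offset width we cannot read."""


def check_list_offset_width(list_metadata):
    # A missing field falls back to the schema default of 32, which is
    # what metadata written before the change would look like.
    width = list_metadata.get("bitWidth", 32)
    if width not in SUPPORTED_OFFSET_WIDTHS:
        # Fail loudly instead of misreading 64-bit offsets as 32-bit ones.
        raise UnsupportedOffsetWidth(
            f"List offsets are {width}-bit; this reader supports "
            f"{sorted(SUPPORTED_OFFSET_WIDTHS)}"
        )
    return width
```

A reader with a check like this rejects data it cannot handle rather than
silently producing garbage, which is the property I would want in place
before the new widths start appearing in the wild.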

Also, any other concerns with adding these types?

Thanks,
Micah

[1] https://issues.apache.org/jira/browse/ARROW-4810
[2] https://issues.apache.org/jira/browse/ARROW-750
[3] https://github.com/apache/arrow/pull/3848
[4] https://github.com/apache/arrow/commit/03956cac2202139e43404d7a994508080dc2cdd1
