On 11/04/2019 at 10:52, Micah Kornfield wrote:
> ARROW-4810 [1] and ARROW-750 [2] discuss adding types with 64-bit offsets
> to Lists, Strings and binary data types.
>
> Philipp started an implementation for the large list type [3] and I hacked
> together a potentially viable Java implementation [4].
>
> I'd like to kick off the discussion for getting these types voted on. I'm
> coupling them together because I think there are design considerations for
> how we evolve Schema.fbs.
>
> There are two proposed options:
> 1. The current PR proposal, which adds a new type LargeList:
>      // List with 64-bit offsets
>      table LargeList {}
>
> 2. As François suggested, it might be cleaner to parameterize List with
>    offset width. I suppose something like:
>
>      table List {
>        // only 32 bit and 64 bit is supported.
>        bitWidth: int = 32;
>      }
>
> I think Option 2 is cleaner and potentially better long-term, but I think
> it breaks forward compatibility of the existing Arrow libraries. If we
> proceed with Option 2, I would advocate making the change to Schema.fbs all
> at once for all types (assuming we think that 64-bit offsets are desirable
> for all types) along with future compatibility checks to avoid multiple
> releases where future compatibility is broken (by broken I mean the
> inability to detect that an implementation is receiving data it can't
> read). What are people's thoughts on this?
I think Option 1 is ok. Making List / String / Binary parameterizable
doesn't bring any *concrete* benefit, since the 32-bit and 64-bit variants
will not be physically interchangeable. The cost of breaking compatibility
should be offset by a compelling benefit, which doesn't seem to exist here.

Of course, implementations are free to refactor their internals to avoid
code duplication (for example the C++ ListBuilder and LargeListBuilder
classes could be instances of a BaseListBuilder<IndexType> generic type)...

Regards

Antoine.
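To make the refactoring idea above a bit more concrete, here is a minimal sketch of what a shared generic builder could look like. It is only an illustration under stated assumptions, not the actual Arrow C++ API: the members `Append`, `offsets`, `child_values` and `offset_type` are hypothetical, and the child values are simplified to a plain vector of 64-bit integers. It also shows why the two types are not physically interchangeable: the offsets buffers have different element widths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical sketch, NOT the real Arrow API: one generic builder
// parameterized on the offset type, plus two concrete aliases.
template <typename IndexType>
class BaseListBuilder {
  static_assert(std::is_same<IndexType, std::int32_t>::value ||
                    std::is_same<IndexType, std::int64_t>::value,
                "only 32-bit and 64-bit offsets are supported");

 public:
  using offset_type = IndexType;

  // Append one list made of `values` and record its end offset.
  // Throws if the new end offset does not fit in IndexType.
  void Append(const std::vector<std::int64_t>& values) {
    const std::uint64_t new_size =
        static_cast<std::uint64_t>(child_values_.size()) + values.size();
    if (new_size >
        static_cast<std::uint64_t>(std::numeric_limits<IndexType>::max())) {
      throw std::overflow_error("list offsets overflow the offset type");
    }
    child_values_.insert(child_values_.end(), values.begin(), values.end());
    offsets_.push_back(static_cast<IndexType>(new_size));
  }

  // Number of lists appended so far.
  std::size_t length() const {
    assert(!offsets_.empty());
    return offsets_.size() - 1;
  }

  const std::vector<IndexType>& offsets() const { return offsets_; }
  const std::vector<std::int64_t>& child_values() const {
    return child_values_;
  }

 private:
  // offsets_[i] and offsets_[i + 1] delimit list i in child_values_.
  std::vector<IndexType> offsets_{0};
  std::vector<std::int64_t> child_values_;
};

// Same code, two distinct physical layouts (4-byte vs 8-byte offsets).
using ListBuilder = BaseListBuilder<std::int32_t>;
using LargeListBuilder = BaseListBuilder<std::int64_t>;
```

With this shape, the schema keeps two separate types (Option 1) while the implementation shares a single code path; only the width of the offsets buffer, and therefore the maximum total number of child elements, differs between the two aliases.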