Hi Antoine,
On Sun, Jun 24, 2018 at 1:06 PM, Antoine Pitrou <[email protected]> wrote:
>
> Hi Wes,
>
> Le 24/06/2018 à 08:24, Wes McKinney a écrit :
>>
>> If this sounds interesting to the community, I could help to kickstart
>> a design process which would likely take a significant amount of time.
>> The requirements could be complex (i.e. we might want to support
>> variable-size record fields while also providing random access
>> guarantees).
>
> What do you call "variable-sized" here? A scheme where the length of a
> record's field is determined by the value of another field in the same
> record?
As an example, here is a fixed-size record:
record foo {
  a: int32;
  b: float64;
  c: uint8;
}
With padding, suppose this is 16 bytes per record; if we have a column
of these, then random access to any value in any record is simple:
record i starts at byte offset 16 * i.
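To make the fixed-size case concrete, here is a minimal Python sketch. The 16-byte layout (int32, then float64, then uint8, then 3 trailing pad bytes) is only one assumed way of reaching 16 bytes, and the names FOO, pack_column and get_record are hypothetical, not part of Arrow.

```python
import struct

# Assumed 16-byte layout for `foo`: int32 a, float64 b, uint8 c, 3 pad bytes.
# "<" disables native alignment, so the padding is spelled out explicitly.
FOO = struct.Struct("<idB3x")


def pack_column(records):
    """Concatenate fixed-size records into one contiguous buffer."""
    return b"".join(FOO.pack(a, b, c) for a, b, c in records)


def get_record(buf, i):
    """Random access: record i starts at byte offset i * FOO.size."""
    return FOO.unpack_from(buf, i * FOO.size)


column = pack_column([(1, 1.5, 7), (2, 2.5, 8), (3, 3.5, 9)])
third = get_record(column, 2)  # reads bytes 32..47 only
```

No index structure is needed here: the record width alone locates every record and every field within it.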
Here's a variable-length record:
record bar {
  a: string;
  b: list<int32>;
}
What I've seen done to represent this in memory is to have a fixed-size
record followed by a sidecar containing the variable-length data,
so the fixed-size portion might look something like:
  a_offset: int32;
  a_length: int32;
  b_offset: int32;
  b_length: int32;
So from this, you can do random access into the record. If you wanted
to do random access on a _column_ of such records, it is similar to
our current variable-length Binary type. So it might be that the
underlying Arrow memory layout would be FixedSizeBinary for fixed-size
records and variable Binary for variable-size records.
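Here is a rough Python sketch of the variable-length case, under several assumptions that would need to be settled in a real design: offsets are relative to the start of the record, a_length counts UTF-8 bytes, b_length counts int32 elements, and the column keeps a list of n + 1 record offsets in the style of the Binary type. All function names are hypothetical.

```python
import struct

# Fixed-size portion: a_offset, a_length, b_offset, b_length (all int32).
HEADER = struct.Struct("<4i")


def encode_bar(a, b):
    """Encode one `bar` record as a fixed header followed by its sidecar."""
    a_bytes = a.encode("utf-8")
    a_offset = HEADER.size              # sidecar starts right after the header
    b_offset = a_offset + len(a_bytes)
    header = HEADER.pack(a_offset, len(a_bytes), b_offset, len(b))
    return header + a_bytes + struct.pack(f"<{len(b)}i", *b)


def decode_bar(record):
    """Read both fields of one record using only its header."""
    a_offset, a_length, b_offset, b_length = HEADER.unpack_from(record, 0)
    a = bytes(record[a_offset:a_offset + a_length]).decode("utf-8")
    b = list(struct.unpack_from(f"<{b_length}i", record, b_offset))
    return a, b


def build_column(rows):
    """Binary-style column: n + 1 offsets plus one contiguous data buffer."""
    offsets, chunks, pos = [0], [], 0
    for a, b in rows:
        rec = encode_bar(a, b)
        chunks.append(rec)
        pos += len(rec)
        offsets.append(pos)
    return offsets, b"".join(chunks)


def get_bar(offsets, data, i):
    """Random access on the column: slice record i, then decode it."""
    return decode_bar(memoryview(data)[offsets[i]:offsets[i + 1]])
```

Looking up record i costs one offsets lookup plus one header read, regardless of how long the earlier records are.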
- Wes
>
>
>
> Regards
>
> Antoine.