Hi John,
The pyarrow.array documentation says:

    array : pyarrow.Array or pyarrow.ChunkedArray
        A ChunkedArray instead of an Array is returned if:
        - the object data overflowed binary storage.
        - the object's ``__arrow_array__`` protocol method returned a chunked
          array.
Overflowing binary storage means exceeding the 2^31 - 1 byte limit on
the data of a BinaryType or StringType/UTF8 array. We thought returning
a ChunkedArray was better than failing, since the output of
pyarrow.array is often used to instantiate a pyarrow.Table, which
accepts a ChunkedArray column just as readily as an Array.
Depending on your input data, you may be able to guess whether the
overflow will occur, but the answer is application-dependent.
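For string/binary input, one way to make that guess is to add up the
bytes the values would occupy in the data buffer and compare the total
with the 2^31 - 1 limit above. The sketch below assumes that limit is
the only trigger (pyarrow's exact internal cutoff is an implementation
detail and may differ slightly or change between versions). The helper
names are hypothetical and are not part of the pyarrow API.

```python
# Assumption: chunking is triggered when the data buffer of a
# BinaryType/StringType array would exceed 2**31 - 1 bytes. The exact
# internal cutoff in pyarrow may differ slightly or change over time.
BINARY_LIMIT = 2**31 - 1


def estimated_data_bytes(values):
    """Hypothetical helper: total bytes the values would need in a
    binary/string data buffer (str is counted as UTF-8)."""
    total = 0
    for v in values:
        if v is None:
            continue  # nulls add no bytes to the data buffer
        if isinstance(v, str):
            total += len(v.encode("utf-8"))
        elif isinstance(v, (bytes, bytearray)):
            total += len(v)
        else:
            raise TypeError(f"unsupported value type: {type(v).__name__}")
    return total


def may_chunk(values, limit=BINARY_LIMIT):
    """Hypothetical helper: exact-size guess, scans every value."""
    return estimated_data_bytes(values) > limit


def may_chunk_conservative(length, max_value_bytes, limit=BINARY_LIMIT):
    """Hypothetical helper: upper-bound guess from the input length and
    the maximum per-value size, without scanning the data."""
    return length * max_value_bytes > limit
```

The conservative variant can report a possible overflow that never
happens; the exact variant costs a full pass over the data. Either way,
checking isinstance(result, pyarrow.ChunkedArray) after calling
pyarrow.array remains the definitive answer.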
- Wes
On Tue, Dec 3, 2019 at 10:51 AM John Muehlhausen wrote:
>
> Given input data and a type, how do we predict whether array() will produce
> ChunkedArray?
>
> I figure the formula involves:
> - the length of input
> - the type, and max length (to be conservative) for variable-length types
> - some constant(s) that Arrow knows internally... that may change in the
> future?
>
> Should there be an API to make this easy? Am I missing one that already
> exists?
>
> Thanks,
> John