predict whether pa.array() will produce ChunkedArray

2019-12-03 Thread John Muehlhausen
Given input data and a type, how do we predict whether array() will produce
ChunkedArray?

I figure the formula involves:
- the length of input
- the type, and max length (to be conservative) for variable length types
- some constant(s) that Arrow knows internally... that may change in the
future?

Should there be an API to make this easy?  Am I missing one that already
exists?

Thanks,
John


Re: predict whether pa.array() will produce ChunkedArray

2019-12-03 Thread Wes McKinney
hi John,

The documentation says

array : pyarrow.Array or pyarrow.ChunkedArray
A ChunkedArray instead of an Array is returned if:

- the object data overflowed binary storage.
- the object's ``__arrow_array__`` protocol method returned a chunked
  array.

Overflowing binary storage means exceeding the 2^31 - 1 byte limit
for BinaryType or StringType/UTF8. We thought this was better than
failing, since the output of pyarrow.array is often used to instantiate
a pyarrow.Table, which will not argue with the ChunkedArray.

Depending on your input data you might hazard a guess as to whether the
overflow will occur, but it will be application-dependent.
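
One way such a guess might look, as a rough sketch only: add up the payload bytes of the input and compare against the 2^31 - 1 limit mentioned above. The function names and the constant name are hypothetical, not pyarrow API, and the result is a heuristic; it does not reproduce whatever pyarrow does internally when deciding to split into chunks.

```python
# Limit quoted above for BinaryType / StringType storage.
BINARY_LIMIT = 2**31 - 1


def payload_bytes(values):
    """Sum the stored byte lengths of str/bytes values, skipping None."""
    total = 0
    for v in values:
        if v is None:
            continue
        if isinstance(v, str):
            # Strings are stored as UTF-8, so count encoded bytes.
            total += len(v.encode("utf-8"))
        else:
            # Assumed to be bytes-like.
            total += len(v)
    return total


def may_overflow(values):
    """Heuristic: True if the total payload exceeds the limit."""
    return payload_bytes(values) > BINARY_LIMIT


def may_overflow_upper_bound(count, max_len):
    """Conservative check from only a row count and a maximum
    per-value byte length, without scanning the data."""
    return count * max_len > BINARY_LIMIT
```

The upper-bound variant corresponds to the "max length (to be conservative)" idea in the original question; it can report True for inputs that would in fact fit.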

- Wes

On Tue, Dec 3, 2019 at 10:51 AM John Muehlhausen wrote:
>
> Given input data and a type, how do we predict whether array() will produce
> ChunkedArray?
>
> I figure the formula involves:
> - the length of input
> - the type, and max length (to be conservative) for variable length types
> - some constant(s) that Arrow knows internally... that may change in the
> future?
>
> Should there be an API to make this easy?  Am I missing one that already
> exists?
>
> Thanks,
> John