I think the current behavior of the `from` functions on IntVector and
FloatVector can be quite confusing for new Arrow users. The current
behavior can be summarized as:
- if the argument is any type of TypedArray (including one of a mismatched
type), create a new vector backed by that array's buffer.
- otherwise, treat it as an iterable of numbers, and convert them as needed
- ... unless we're making an Int64Vector, then treat each input as a 32-bit
number and pack pairs together
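
To illustrate the last bullet, here is a minimal sketch of what packing
pairs of 32-bit inputs into 64-bit values looks like. It uses only a plain
DataView, not Arrow itself, and it assumes a little-endian layout with the
low word first. That ordering is my assumption for the illustration.

```javascript
// Four 32-bit inputs, written as little-endian words into one buffer.
const inputs = [1, 0, 2, 0];
const view = new DataView(new ArrayBuffer(inputs.length * 4));
inputs.forEach((value, i) => view.setInt32(i * 4, value, true));

// Reading the same bytes back 8 at a time merges each pair of inputs
// into a single 64-bit value (assumed order: low word, then high word).
const packed = [];
for (let offset = 0; offset < view.byteLength; offset += 8) {
  packed.push(view.getBigInt64(offset, true));
}

console.log(packed); // [1n, 2n]
```

So four inputs produce two 64-bit values rather than four.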

This can give users very unexpected results. For example, you might expect
arrow.Int32Vector.from(Float32Array.from([1.0,2.0,3.0])) to yield a vector
with the values [1,2,3] - but it doesn't, it gives you the integers that
result from re-interpreting that buffer of floating point numbers as
integers.
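
The difference between converting values and re-interpreting a buffer can
be reproduced with plain TypedArrays, without Arrow. This is only a sketch
of the two interpretations, not of Arrow's implementation:

```javascript
const floats = Float32Array.from([1.0, 2.0, 3.0]);

// Value conversion: each element is converted to a 32-bit integer.
const converted = Int32Array.from(floats);

// Re-interpretation: the same bytes are viewed as 32-bit integers.
const reinterpreted = new Int32Array(floats.buffer);

console.log(Array.from(converted));     // [1, 2, 3]
console.log(Array.from(reinterpreted)); // [1065353216, 1073741824, 1077936128]
```

The second result is the IEEE 754 bit pattern of each float (for example,
1.0 is 0x3F800000) read back as an integer.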

I put together a notebook with some more examples of this confusing
behavior, compared to TypedArray.from:
https://observablehq.com/d/6aa80e43b5a97361

I'd like to propose that we rewrite these `from` functions with the
following behavior:
- iff the argument is an ArrayBuffer or a TypedArray of the same numeric
type, create a new vector backed by that array's buffer.
- otherwise, treat it as an iterable of numbers and convert to the
appropriate type.
- no exceptions for Int64
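
Roughly, the proposed rules could be sketched like this for the 32-bit
integer case. The function name int32ValuesFrom is hypothetical and it
returns a plain Int32Array rather than an Arrow vector, so this only
models the dispatch logic, not the actual change:

```javascript
// Hypothetical helper (not part of Arrow): models the proposed dispatch.
function int32ValuesFrom(input) {
  // A raw ArrayBuffer is used directly, without copying or converting.
  if (input instanceof ArrayBuffer) {
    return new Int32Array(input);
  }
  // A TypedArray of the same numeric type shares its underlying buffer.
  if (input instanceof Int32Array) {
    return new Int32Array(input.buffer, input.byteOffset, input.length);
  }
  // Anything else is treated as an iterable of numbers and converted.
  return Int32Array.from(input);
}

const floats = Float32Array.from([1.0, 2.0, 3.0]);
console.log(Array.from(int32ValuesFrom(floats)));        // [1, 2, 3]
console.log(Array.from(int32ValuesFrom(floats.buffer))); // [1065353216, 1073741824, 1077936128]
```

A mismatched TypedArray falls through to the conversion branch, while
passing its .buffer explicitly still gets the zero-copy path.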

If users really want to preserve the current behavior and use a
TypedArray's memory directly without converting, even when the types are
mismatched, they can still just access the underlying ArrayBuffer and pass
that in. So arrow.Int32Vector.from(Float32Array.from([1.0,2.0,3.0])) would
yield a vector with [1,2,3], but you could still use
arrow.Int32Vector.from(Float32Array.from([1.0,2.0,3.0]).buffer) to
replicate the current behavior.

Removing the special case for Int64 does make it a little easier to shoot
yourself in the foot by exceeding JS numbers' 53-bit precision, so maybe we
should mitigate that somehow, but I don't think combining pairs of numbers
is the right way to do that. Maybe a warning?
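
For reference, here is the precision problem and one possible shape for
such a warning. findUnsafeIntegers is a hypothetical helper I made up for
illustration. It is not an existing Arrow function, and using
console.warn is just one option:

```javascript
// JS numbers represent integers exactly only up to 2^53 - 1.
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991
console.log(2 ** 53 === 2 ** 53 + 1); // true: the two values are indistinguishable

// Hypothetical helper: report which inputs cannot be represented exactly.
function findUnsafeIntegers(values) {
  const unsafe = [];
  let index = 0;
  for (const value of values) {
    if (!Number.isSafeInteger(value)) {
      unsafe.push(index);
    }
    index++;
  }
  if (unsafe.length > 0) {
    console.warn(`Possible precision loss at indices: ${unsafe.join(", ")}`);
  }
  return unsafe;
}

console.log(findUnsafeIntegers([1, 2 ** 53, 3])); // [1]
```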

What do you all think? If there's consensus on this I'd like to make the
change prior to 0.14 to minimize the number of releases with the current
behavior.

Brian
