On Mon, 2021-03-01 at 01:30 +0100, Michal Radwanski wrote:
> Hello,
> 
> I'm not sure if it's expected behaviour or a bug, so I decided to write
> here. First an example:
> In [4]: array([2**63]) 
> Out[4]: array([9223372036854775808], dtype=uint64)
> 
> In [5]: array([2**63-1, 2**63]) 
> Out[5]: array([9.22337204e+18, 9.22337204e+18])
> 
> 

Thanks, this is a known issue, e.g.:
https://github.com/numpy/numpy/issues/14883
and https://github.com/numpy/numpy/issues/16287

Currently, my view is that trying to "fix" it so that the result is
truly minimal is probably doomed to introduce unnecessary complexity
and/or will just make the oddities slightly harder to find.

Instead, my stance is that we should refuse to guess anything besides
the "default integer" when users pass in integers.  That would
probably mean you get an error that `2**63` cannot be represented by
`int64`, forcing you to be explicit about the dtype you expect.
(In the long run, it might also return an `object` array. [1]) 
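
For illustration, here is a minimal sketch of what being explicit looks
like today, using the values from the example in the original message.
The variable names are mine, and the last line uses plain Python floats
to show why the float64 result is lossy:

```python
import numpy as np

values = [2**63 - 1, 2**63]

# Passing the dtype skips the guessing step, so both values stay exact.
arr = np.array(values, dtype=np.uint64)
print(arr.dtype)                         # uint64
print([int(x) for x in arr] == values)   # True

# A float64 cannot tell these two integers apart.
print(float(values[0]) == float(values[1]))  # True
```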


With regard to the documentation... `np.array` promotes inputs as they
come in (currently depth first), i.e. in a "left-to-right" fashion.
That basically means that you are right and "minimal" will not always
be true, due to our promotion rules.
But the bigger confusion is that Python integers are mapped to NumPy
dtypes by finding the first one in the following list which can
represent the value:

  * C long: int64 on 64bit linux/mac, otherwise (all windows!) int32
  * C long long: int64 on all relevant platforms AFAIK
  * C unsigned long long: uint64 on all relevant platforms AFAIK
  * object

Which is an attempt at "minimal" of course.  If someone knows how to
capture this integer behaviour in the docs, adding it may be a good
idea.  (The way the promotion is done also breaks the "minimal"
claim, but that is much more subtle.)
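
To make the lookup and the left-to-right promotion above concrete, here
is a toy model in plain Python.  It is only a sketch, not NumPy's actual
implementation: the function names are made up, it assumes a platform
where C long is int64 (64bit linux/mac; on windows the first candidate
would be int32, which is not modelled), and the promotion table only
covers the handful of types that appear here.

```python
from functools import reduce

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
UINT64_MAX = 2**64 - 1


def dtype_for_python_int(value):
    """Return the first candidate type that can represent `value`."""
    if INT64_MIN <= value <= INT64_MAX:
        return "int64"    # C long (and C long long) on this platform
    if 0 <= value <= UINT64_MAX:
        return "uint64"   # C unsigned long long
    return "object"       # arbitrary-size Python integer


def promote(a, b):
    """Simplified pairwise promotion for the types used in this sketch."""
    if a == b:
        return a
    if "object" in (a, b):
        return "object"
    # int64 and uint64 (or either with float64) have no common
    # integer type, so the result is float64.
    return "float64"


def discover_dtype(values):
    """Map each integer to a type, then promote left to right."""
    return reduce(promote, (dtype_for_python_int(v) for v in values))


print(discover_dtype([2**63]))             # uint64
print(discover_dtype([2**63 - 1, 2**63]))  # float64
print(discover_dtype([2**64]))             # object
```

The second call shows the case from the original message: each value
gets its own "minimal" type first (int64, then uint64), and only then
are the two promoted, which is how float64 comes out even though uint64
could hold both values.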

Cheers,

Sebastian


[1] However, before that happens, we may also consider an API where you
have to explicitly allow the `np.array` call to fall back to `object`
in cases where promotion fails – including this case. I.e. with
something like:

    np.array(..., dtype="allow-object-fallback")  # of course shorter

(I can't find the issue about it right now, but there is at least one
where this was discussed.)


> The docs for `numpy.array` mention that:
> 
> dtype : data-type, optional
>  The desired data-type for the array. If not given, then the type 
>  will be determined as the minimum type required to hold 
>  the objects in the sequence.
> 
> I understand the type promotions here, but I believe that the
> documentation is wrong in this case. Indeed, the minimum type in the
> latter case would be 'uint64'.
> 
> Is it a bug worth submitting/fixing?
> 
> 

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
