On Thu, 2020-04-30 at 18:47 +0100, Eric Wieser wrote:
> > Another larger visible change will be code such as:
> > 
> >     np.concatenate([np.array(["string"]), np.array([2])])
> > 
> > will result in an error instead of returning a string array. (Users
> > will have to cast manually here.)
> 
> I wonder if we can lessen the blow by allowing
> `np.concatenate([np.array(["string"]), np.array([2])],
> casting='unsafe',
> dtype=str)` or similar in its place.
> It seems a little unfortunate that with this change, we lose the
> ability to
> concatenate numbers to strings without making intermediate copies.
> 

I agree we can do that for concatenate and am happy to just add it.
Adding the dtype argument (maybe for now only force-casting is fine?)
to `np.concatenate` seems like a reasonable extension of concatenate
even without the loss of this potential use-case.
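
To make the two spellings concrete, here is a rough sketch (an
illustration only, not the code from the PR). The first call is the manual
cast that works today. The second is the form Eric suggests, and it assumes
a NumPy version in which `np.concatenate` accepts `dtype` and `casting`,
so it is guarded in case those keywords are unavailable. The variable
names are made up for the example.

```python
import numpy as np

strings = np.array(["string"])
numbers = np.array([2])

# Manual cast: convert the integers to strings first. This creates an
# intermediate string copy of `numbers` before concatenating.
manual = np.concatenate([strings, numbers.astype(str)])

# Suggested spelling (assumption: this NumPy's `np.concatenate` accepts
# the `dtype` and `casting` keyword arguments).
try:
    direct = np.concatenate([strings, numbers], dtype=str, casting="unsafe")
except TypeError:
    direct = None  # keywords not supported by the installed NumPy
```

With the second form the cast would happen while filling the output
array, which is what avoids the intermediate copy.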

- Sebastian


> Eric
> 
> 
> 
> On Thu, 30 Apr 2020 at 18:32, Sebastian Berg <
> sebast...@sipsolutions.net>
> wrote:
> 
> > Hi all,
> > 
> > in https://github.com/numpy/numpy/pull/15925 I propose to deprecate
> > promotion of strings and numbers. I have to double check whether
> > this
> > has a large effect on pandas, but it currently seems to me that it
> > will
> > be reasonable.
> > 
> > This means that `np.promote_types("S", "int8")`, etc. will lead to
> > an
> > error instead of returning `"S4"`.  For the user, I believe the two
> > main visible changes are that:
> > 
> >     np.array(["string", 0])
> > 
> > will stop creating a string array and return either an `object`
> > array
> > or give an error (object array would be the default currently).
> > 
> > Another larger visible change will be code such as:
> > 
> >     np.concatenate([np.array(["string"]), np.array([2])])
> > 
> > will result in an error instead of returning a string array. (Users
> > will have to cast manually here.)
> > 
> > The alternative is to return an object array also for the
> > concatenate
> > example.  I somewhat dislike that because `object` is not
> > homogeneously
> > typed and we thus lose type information.  This also affects
> > functions
> > that wish to cast inputs to a common type (ufuncs also do this
> > sometimes).
> > A further example of this and discussion is at the end of the mail
> > [1].
> > 
> > 
> > So the first question is whether we can form an agreement that an
> > error
> > is the better choice for `concatenate` and `np.promote_types()`.
> > I.e. there is no one dtype that can faithfully represent both
> > strings
> > and integers. (This is currently the case e.g. for datetime64 and
> > float64.)
> > 
> > 
> > The second question is what to do for:
> > 
> >     np.array(["string", 0])
> > 
> > which currently always returns strings.  Arguably, it must also
> > either
> > return an `object` array, or raise an error (requiring the user to
> > pick
> > string or object using `dtype=object`).
> > 
> > The default would be to create a FutureWarning that an `object`
> > array
> > will be returned for `np.asarray(["string", 0])` in the future.
> > But if we know already that we prefer an error, it would be better
> > to
> > give a DeprecationWarning right away. (It just does not seem nice
> > to
> > change the same thing twice even if the workaround is identical.)
> > 
> > Cheers,
> > 
> > Sebastian
> > 
> > 
> > [1]
> > 
> > A second more in-depth point is that code such as:
> > 
> >     common_dtype = np.result_type(arr1, arr2)  # or promote_types
> >     arr1 = arr1.astype(common_dtype, copy=False)
> >     arr2 = arr2.astype(common_dtype, copy=False)
> > 
> > will currently use `string` in this case while it would error in
> > the
> > future. This already fails with other type combinations such as
> > `datetime64` and `float64` at the moment.
> > 
> > The main alternative to this proposal is to return `object` for the
> > common dtype: since an object array is not homogeneously typed, it
> > arguably can represent both inputs.  I do not quite like this
> > choice
> > personally because in the above example, it may be that the next
> > line
> > is something like:
> > 
> >     return arr1 * arr2
> > 
> > in which case, the preferred return may be `str` and not `object`.
> > We currently never promote to `object` unless one of the arrays is
> > already an `object` array, and that seems like the right choice to
> > me.
> > 
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> 