> Another larger visible change will be code such as:
>
>     np.concatenate([np.array(["string"]), np.array([2])])
>
> will result in an error instead of returning a string array. (Users
> will have to cast manually here.)
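Below is a minimal sketch of the manual cast that the quoted text refers to. It also shows the explicit `dtype=` choice for the `np.array(["string", 0])` case discussed further down. It uses only standard NumPy calls (`ndarray.astype`, `np.concatenate`, `np.array(..., dtype=...)`). Note that `numbers.astype(str)` allocates a temporary string array before the concatenation happens.

```python
import numpy as np

strings = np.array(["string"])
numbers = np.array([2])

# Manual cast: convert the numbers to a string dtype first. This allocates
# a temporary string array (numbers.astype(str)) before concatenating.
joined = np.concatenate([strings, numbers.astype(str)])

# For np.array with mixed Python objects, state the intended dtype
# instead of relying on string/number promotion.
as_str = np.array(["string", 0], dtype=str)        # 0 becomes the string '0'
as_object = np.array(["string", 0], dtype=object)  # 0 stays a Python int
```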

I wonder if we can lessen the blow by allowing `np.concatenate([np.array(["string"]), np.array([2])], casting='unsafe', dtype=str)` or similar in its place. It seems a little unfortunate that with this change, we lose the ability to concatenate numbers to strings without making intermediate copies.

Eric

On Thu, 30 Apr 2020 at 18:32, Sebastian Berg <sebast...@sipsolutions.net> wrote:

> Hi all,
>
> in https://github.com/numpy/numpy/pull/15925 I propose to deprecate
> promotion of strings and numbers. I have to double check whether this
> has a large effect on pandas, but it currently seems to me that it will
> be reasonable.
>
> This means that `np.promote_types("S", "int8")`, etc. will lead to an
> error instead of returning `"S4"`. For the user, I believe the two
> main visible changes are that:
>
>     np.array(["string", 0])
>
> will stop creating a string array and return either an `object` array
> or give an error (object array would be the default currently).
>
> Another larger visible change will be code such as:
>
>     np.concatenate([np.array(["string"]), np.array([2])])
>
> will result in an error instead of returning a string array. (Users
> will have to cast manually here.)
>
> The alternative is to return an object array also for the concatenate
> example. I somewhat dislike that because `object` is not homogeneously
> typed and we thus lose type information. This also affects functions
> that wish to cast inputs to a common type (ufuncs also do this
> sometimes).
> A further example of this and discussion is at the end of the mail [1].
>
>
> So the first question is whether we can form an agreement that an error
> is the better choice for `concatenate` and `np.promote_types()`.
> I.e. there is no one dtype that can faithfully represent both strings
> and integers. (This is currently the case e.g. for datetime64 and
> float64.)
>
>
> The second question is what to do for:
>
>     np.array(["string", 0])
>
> which currently always returns strings. Arguably, it must also either
> return an `object` array, or raise an error (requiring the user to pick
> string or object using `dtype=object`).
>
> The default would be to create a FutureWarning that an `object` array
> will be returned for `np.asarray(["string", 0])` in the future.
> But if we know already that we prefer an error, it would be better to
> give a DeprecationWarning right away. (It just does not seem nice to
> change the same thing twice even if the workaround is identical.)
>
> Cheers,
>
> Sebastian
>
>
> [1]
>
> A second more in-depth point is that code such as:
>
>     common_dtype = np.result_type(arr1, arr2)  # or promote_types
>     arr1 = arr1.astype(common_dtype, copy=False)
>     arr2 = arr2.astype(common_dtype, copy=False)
>
> will currently use `string` in this case while it would error in the
> future. This already fails with other type combinations such as
> `datetime64` and `float64` at the moment.
>
> The main alternative to this proposal is to return `object` for the
> common dtype, since an object array is not homogeneously typed, it
> arguably can represent both inputs. I do not quite like this choice
> personally because in the above example, it may be that the next line
> is something like:
>
>     return arr1 * arr2
>
> in which case, the preferred return may be `str` and not `object`.
> We currently never promote to `object` unless one of the arrays is
> already an `object` array, and that seems like the right choice to me.
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
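The keyword form suggested at the top of this reply exists in NumPy 1.20 and later, where `np.concatenate` accepts `dtype=` and `casting=` arguments. The sketch below wraps it in a hypothetical helper, `concat_as_str`. On older versions the helper falls back to casting each input first. That fallback is one possible way to support both versions, not a NumPy API. With the keywords, the result array is allocated with the requested dtype and each input should be cast as it is copied in, without a separate cast copy per input.

```python
import numpy as np


def concat_as_str(arrays):
    """Concatenate arrays into one string array (hypothetical helper)."""
    try:
        # NumPy >= 1.20: allocate the result as a string array and cast
        # each input while copying it in.
        return np.concatenate(arrays, dtype=str, casting="unsafe")
    except TypeError:
        # Older NumPy has no dtype/casting keywords: cast each input
        # first, which makes a temporary copy per input.
        return np.concatenate([np.asarray(a).astype(str) for a in arrays])


strings = np.array(["string"])
numbers = np.array([2])
joined = concat_as_str([strings, numbers])
```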
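The pattern from footnote [1] of the quoted message can be written out as a small helper. The name `cast_to_common` is hypothetical. The example shows numeric inputs getting a common dtype, and the `datetime64`/`float64` combination already raising. The proposal would extend that raising behavior to strings and numbers. NumPy 2.x raises `DTypePromotionError` here, which is a subclass of `TypeError`, so catching `TypeError` should cover older versions as well.

```python
import numpy as np


def cast_to_common(arr1, arr2):
    """Find a common dtype for both arrays, then cast both to it."""
    common_dtype = np.result_type(arr1, arr2)  # or np.promote_types on the dtypes
    arr1 = arr1.astype(common_dtype, copy=False)
    arr2 = arr2.astype(common_dtype, copy=False)
    return arr1, arr2


# Numeric inputs have a common dtype: int8 and float64 promote to float64.
a, b = cast_to_common(np.array([1], dtype=np.int8), np.array([1.5]))

# datetime64 and float64 have no common dtype, so result_type raises.
dates = np.array(["2020-04-30"], dtype="datetime64[D]")
try:
    cast_to_common(dates, np.array([1.0]))
    promotion_failed = False
except TypeError:
    promotion_failed = True
```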