On Thu, 2020-04-30 at 18:47 +0100, Eric Wieser wrote:
> > Another larger visible change will be code such as:
> >
> >     np.concatenate([np.array(["string"]), np.array([2])])
> >
> > will result in an error instead of returning a string array. (Users
> > will have to cast manually here.)
>
> I wonder if we can lessen the blow by allowing
> `np.concatenate([np.array(["string"]), np.array([2])],
> casting='unsafe', dtype=str)` or similar in its place.
> It seems a little unfortunate that with this change, we lose the
> ability to concatenate numbers to strings without making
> intermediate copies.
>

I agree we can do that for concatenate and am happy to just add it.
Adding the dtype argument (maybe for now only force-casting is fine?)
to `np.concatenate` seems like a reasonable extension of concatenate
even without the loss of this potential use-case.

- Sebastian


> Eric
>
> On Thu, 30 Apr 2020 at 18:32, Sebastian Berg <sebast...@sipsolutions.net>
> wrote:
>
> > Hi all,
> >
> > in https://github.com/numpy/numpy/pull/15925 I propose to deprecate
> > promotion of strings and numbers. I have to double check whether this
> > has a large effect on pandas, but it currently seems to me that it will
> > be reasonable.
> >
> > This means that `np.promote_types("S", "int8")`, etc. will lead to an
> > error instead of returning `"S4"`. For the user, I believe the two
> > main visible changes are that:
> >
> >     np.array(["string", 0])
> >
> > will stop creating a string array and return either an `object` array
> > or give an error (object array would be the default currently).
> >
> > Another larger visible change will be code such as:
> >
> >     np.concatenate([np.array(["string"]), np.array([2])])
> >
> > will result in an error instead of returning a string array. (Users
> > will have to cast manually here.)
> >
> > The alternative is to return an object array also for the concatenate
> > example. I somewhat dislike that because `object` is not homogeneously
> > typed and we thus lose type information. This also affects functions
> > that wish to cast inputs to a common type (ufuncs also do this
> > sometimes).
> > A further example of this and discussion is at the end of the mail [1].
> >
> >
> > So the first question is whether we can form an agreement that an error
> > is the better choice for `concatenate` and `np.promote_types()`.
> > I.e. there is no one dtype that can faithfully represent both strings
> > and integers. (This is currently the case e.g. for datetime64 and
> > float64.)
> >
> >
> > The second question is what to do for:
> >
> >     np.array(["string", 0])
> >
> > which currently always returns strings. Arguably, it must also either
> > return an `object` array, or raise an error (requiring the user to pick
> > string or object using `dtype=object`).
> >
> > The default would be to create a FutureWarning that an `object` array
> > will be returned for `np.asarray(["string", 0])` in the future.
> > But if we know already that we prefer an error, it would be better to
> > give a DeprecationWarning right away. (It just does not seem nice to
> > change the same thing twice even if the workaround is identical.)
> >
> > Cheers,
> >
> > Sebastian
> >
> >
> > [1]
> >
> > A second more in-depth point is that code such as:
> >
> >     common_dtype = np.result_type(arr1, arr2)  # or promote_types
> >     arr1 = arr1.astype(common_dtype, copy=False)
> >     arr2 = arr2.astype(common_dtype, copy=False)
> >
> > will currently use `string` in this case while it would error in the
> > future. This already fails with other type combinations such as
> > `datetime64` and `float64` at the moment.
> >
> > The main alternative to this proposal is to return `object` for the
> > common dtype, since an object array is not homogeneously typed, it
> > arguably can represent both inputs. I do not quite like this choice
> > personally because in the above example, it may be that the next line
> > is something like:
> >
> >     return arr1 * arr2
> >
> > in which case, the preferred return may be `str` and not `object`.
> > We currently never promote to `object` unless one of the arrays is
> > already an `object` array, and that seems like the right choice to me.
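
For reference, a minimal sketch of what "cast manually" could look like
for the two examples discussed in this thread. The variable names
(`strings`, `numbers`, `joined`, `mixed`) are only illustrative, and this
is one possible workaround, not necessarily what the PR will recommend:

```python
import numpy as np

strings = np.array(["string"])
numbers = np.array([2])

# Explicit cast: convert the numbers to strings first, so that both
# inputs to concatenate already share a string dtype and no
# string/number promotion is needed.
joined = np.concatenate([strings, numbers.astype(str)])
print(joined)

# For mixed Python lists, the dtype can be requested explicitly instead
# of relying on promotion; `object` keeps the original Python values.
mixed = np.array(["string", 0], dtype=object)
print(mixed)
```

Note that the explicit `astype(str)` makes an intermediate copy of
`numbers`, which is the cost Eric points out above.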
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion