Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)

Ralf Gommers Tue, 18 Sep 2012 14:04:28 -0700

On Tue, Sep 18, 2012 at 10:52 PM, Benjamin Root <ben.r...@ou.edu> wrote:


>
>
> On Tue, Sep 18, 2012 at 4:42 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Tue, Sep 18, 2012 at 2:33 PM, Travis Oliphant <tra...@continuum.io>wrote:
>>
>>>
>>> On Sep 18, 2012, at 2:44 PM, Charles R Harris wrote:
>>>
>>>
>>>
>>> On Tue, Sep 18, 2012 at 1:35 PM, Benjamin Root <ben.r...@ou.edu> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Sep 18, 2012 at 3:25 PM, Charles R Harris <
>>>> charlesr.har...@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root <ben.r...@ou.edu>wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris <
>>>>>> charlesr.har...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root <ben.r...@ou.edu>wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris <
>>>>>>>> charlesr.har...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant <
>>>>>>>>> tra...@continuum.io> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote:
>>>>>>>>>>
>>>>>>>>>> > Consider the following code:
>>>>>>>>>> >
>>>>>>>>>> > import numpy as np
>>>>>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>>>>>> > a *= float(255) / 15
>>>>>>>>>> >
>>>>>>>>>> > In v1.6.x, this yields:
>>>>>>>>>> > array([17, 34, 51, 68, 85], dtype=int16)
>>>>>>>>>> >
>>>>>>>>>> > But in master, this throws an exception about failing to cast
>>>>>>>>>> via same_kind.
>>>>>>>>>> >
>>>>>>>>>> > Note that numpy was smart about this operation before, consider:
>>>>>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>>>>>> > a *= float(128) / 256
>>>>>>>>>>
>>>>>>>>>> > yields:
>>>>>>>>>> > array([0, 1, 1, 2, 2], dtype=int16)
>>>>>>>>>> >
>>>>>>>>>> > Of course, this is different than if one does it in a
>>>>>>>>>> non-in-place manner:
>>>>>>>>>> > np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5
>>>>>>>>>> >
>>>>>>>>>> > which yields an array with floating point dtype in both
>>>>>>>>>> versions.  I can appreciate the arguments for preventing this kind of
>>>>>>>>>> implicit casting between non-same_kind dtypes, but I argue that 
>>>>>>>>>> because the
>>>>>>>>>> operation is in-place, then I (as the programmer) am explicitly 
>>>>>>>>>> stating
>>>>>>>>>> that I desire to utilize the current array to store the results of 
>>>>>>>>>> the
>>>>>>>>>> operation, dtype and all.  Obviously, we can't completely turn off 
>>>>>>>>>> this
>>>>>>>>>> rule (for example, an in-place addition between integer array and a
>>>>>>>>>> datetime64 makes no sense), but surely there is some sort of happy 
>>>>>>>>>> medium
>>>>>>>>>> that would allow these sort of operations to take place?
>>>>>>>>>> >
>>>>>>>>>> > Lastly, if it is determined that it is desirable to allow
>>>>>>>>>> in-place operations to continue working like they have before, I 
>>>>>>>>>> would like
>>>>>>>>>> to see such a fix in v1.7 because if it isn't in 1.7, then other 
>>>>>>>>>> libraries
>>>>>>>>>> (such as matplotlib, where this issue was first found) would have to 
>>>>>>>>>> change
>>>>>>>>>> their code anyway just to be compatible with numpy.
>>>>>>>>>>
>>>>>>>>>> I agree that in-place operations should allow different casting
>>>>>>>>>> rules.  There are different opinions on this, of course, but 
>>>>>>>>>> generally this
>>>>>>>>>> is how NumPy has worked in the past.
>>>>>>>>>>
>>>>>>>>>> We did decide to change the default casting rule to "same_kind"
>>>>>>>>>> but making an exception for in-place seems reasonable.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think that in these cases same_kind will flag what are most
>>>>>>>>> likely programming errors and sloppy code. It is easy to be explicit 
>>>>>>>>> and
>>>>>>>>> doing so will make the code more readable because it will be 
>>>>>>>>> immediately
>>>>>>>>> obvious what the multiplicand is without the need to recall what the 
>>>>>>>>> numpy
>>>>>>>>> casting rules are in this exceptional case. IISTR several mentions of 
>>>>>>>>> this
>>>>>>>>> before (Gael?), and in some of those cases it turned out that bugs 
>>>>>>>>> were
>>>>>>>>> being turned up. Catching bugs with minimal effort is a good thing.
>>>>>>>>>
>>>>>>>>> Chuck
>>>>>>>>>
>>>>>>>>>
>>>>>>>> True, it is quite likely to be a programming error, but then again,
>>>>>>>> there are many cases where it isn't.  Is the problem strictly that we 
>>>>>>>> are
>>>>>>>> trying to downcast the float to an int, or is it that we are trying to
>>>>>>>> downcast to a lower precision?  Is there a way for one to explicitly 
>>>>>>>> relax
>>>>>>>> the same_kind restriction?
>>>>>>>>
>>>>>>>
>>>>>>> I think the problem is down casting across kinds, with the result
>>>>>>> that floats are truncated and the imaginary parts of imaginaries might 
>>>>>>> be
>>>>>>> discarded. That is, the value, not just the precision, of the rhs 
>>>>>>> changes.
>>>>>>> So I'd favor an explicit cast in code like this, i.e., cast the rhs to 
>>>>>>> an
>>>>>>> integer.
>>>>>>>
>>>>>>> It is true that this forces downstream to code up to a higher
>>>>>>> standard, but I don't see that as a bad thing, especially if it exposes
>>>>>>> bugs. And it isn't difficult to fix.
>>>>>>>
>>>>>>> Chuck
>>>>>>>
>>>>>>>
>>>>>> Mind you, in my case, casting the rhs as an integer before doing the
>>>>>> multiplication would be a bug, since our value for the rhs is usually
>>>>>> between zero and one.  Multiplying first by the integer numerator before
>>>>>> dividing by the integer denominator would likely cause issues with
>>>>>> overflowing the 16 bit integer.
>>>>>>
>>>>>>
>>>>> For the case in point I'd do
>>>>>
>>>>> In [1]: a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>
>>>>> In [2]: a //= 2
>>>>>
>>>>> In [3]: a
>>>>> Out[3]: array([0, 1, 1, 2, 2], dtype=int16)
>>>>>
>>>>> Although I expect you would want something different in practice. But
>>>>> the current code already looks fragile to me and I think it is a good 
>>>>> thing
>>>>> you are taking a closer look at it. If you really intend going through a
>>>>> float, then it should be something like
>>>>>
>>>>> a = (a*(float(128)/256)).astype(int16)
>>>>>
>>>>> Chuck
>>>>>
>>>>>
>>>> And thereby losing the memory benefit of an in-place multiplication?
>>>>
>>>
>>> What makes you think you are getting that? I'd have to check the numpy
>>> C source, but I expect the multiplication is handled just as I wrote it
>>> out. I don't recall any loops that handle mixed types likes that. I'd like
>>> to see some, though, scaling integers is a common problem.
>>>
>>>
>>>
>>>
>>>> That is sort of the point of all this.  We are using 16 bit integers
>>>> because we wanted to be as efficient as possible and didn't need anything
>>>> larger.  Note, that is what we changed the code to, I am just wondering if
>>>> we are being too cautious.  The casting kwarg looks to be what I might
>>>> want, though it isn't as clean as just writing an "*=" statement.
>>>>
>>>>
>>> I think even there you will have an intermediate float array followed by
>>> a cast.
>>>
>>>
>>> This is true, but it is done in chunks of a fixed size (controllable by
>>> a thread-local variable or keyword argument to the ufunc).
>>>
>>> How difficult would it be to change in-place operations back to the
>>> "unsafe" default?
>>>
>>
>> Probably not too difficult, but I think it would be a mistake. What
>> keyword argument are you referring to? In the current case, I think what is
>> wanted is a scaling function that will actually do things in place. The
>> matplotlib folks would probably be happier with the result if they simply
>> coded up a couple of small Cython routines to do that.
>>
>> Chuck
>>
>>
> As far as matplotlib is concerned, the problem was solved when we reverted
> a change.  The issue that I am raising is that it was such an innocuous,
> and frankly, obvious change to do an in-place operation in the first
> place.  I have to wonder if we are being overly cautious with "same_kind".
> You are right, we probably would benefit greatly from creating some CXX
> scaling functions (contrary to popular belief, we don't use Cython),
> however, I would imagine that such general-purpose function would fare
> better within NumPy.  But, ultimately, Python is about there being one
> right way of doing something, and so I think the goal should be to have a
> somewhat more restrictive casting rule than "unsafe" for in-place
> operations, but restrictive enough to catch the sort of errors "same_kind"
> was catching.
>

That sentence doesn't parse. ("more restrictive" & "restrictive enough") ==
"same_kind" ?

Ralf

> This way, I have one way of doing an inplace operation, regardless of the
> types of my operands.
>

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Regression: in-place operations (possibly intentional)

Reply via email to