On Mon, Feb 13, 2012 at 8:04 PM, Travis Oliphant <tra...@continuum.io>wrote:
> I disagree with your assessment of the subscript operator, but I'm sure we > will have plenty of time to discuss that. I don't think it's correct to > compare the corner cases of the fancy indexing and regular indexing to the > corner cases of type coercion system. If you recall, I was quite nervous > about all the changes you made to the coercion rules because I didn't > believe you fully understood what had been done before and I knew there was > not complete test coverage. > > It is true that both systems have emerged from a long history and could > definitely use fresh perspectives which we all appreciate you and others > bringing. It is also true that few are aware of the details of how things > are actually implemented and that there are corner cases that are basically > defined by the algorithm used (this is more true of the type-coercion > system than fancy-indexing, however). > Likely the only way we will be able to know for certain the extent to which our opinions are accurate is to actually dig into the code. I think we can agree, however, that at the very least it could use some performance improvement. :) > I think it would have been wise to write those extensive tests prior to > writing new code. I'm curious if what you were expecting for the output > was derived from what earlier versions of NumPy produced. NumPy has > never been in a state where you could just re-factor at will and assume > that tests will catch all intended use cases. Numeric before it was not > in that state either. This is a good goal, and we always welcome new > tests. It just takes a lot of time and a lot of tedious work that the > volunteer labor to this point have not had the time to do. > I did put quite a bit of effort into maintaining compatibility, and was incredibly careful about the change we're discussing. I used something I suspect you created, the "can cast safely" table here: http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules I extended it to more cases including scalar/array combinations of type promotion, and validated that 1.5 and 1.6 produced the same outputs. The script I used is here: https://github.com/numpy/numpy/blob/master/numpy/testing/print_coercion_tables.py I definitely didn't jump into the change blind, but I did approach it from a clean perspective with the willingness to try and make things better. I understand this is a delicate balance to walk, and I'd like to stress that I didn't take any of the changes I made here lightly. Very few of us have ever been paid to work on NumPy directly and have often > been trying to fit in improvements to the code base between other jobs we > are supposed to be doing. Of course, you and I are hoping to change that > this year and look forward to the code quality improving commensurately. > Well, everything I did for 1.6 that we're discussing here was volunteer work too. :) You and Enthought have all the credit for the later bit where I did get paid a little bit to do the datetime64 and NA stuff! Thanks for all you are doing. I also agree that Rolf and Charles > have-been and are invaluable in the maintenance and progress of NumPy and > SciPy. They deserve as much praise and kudos as anyone can give them. > It's great to have you back and active in the community again too. I'm sure this is improving the moods of many NumPy and SciPy users. -Mark > > -Travis > > > > On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: > > I believe the main lessons to draw from this are just how incredibly > important a complete test suite and staying on top of code reviews are. I'm > of the opinion that any explicit design choice of this nature should be > reflected in the test suite, so that if someone changes it years later, > they get immediate feedback that they're breaking something important. > NumPy has gradually increased its test suite coverage, and when I dealt > with the type promotion subsystem, I added fairly extensive tests: > > > https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 > > Another subsystem which is in a similar state as what the type promotion > subsystem was, is the subscript operator and how regular/fancy indexing > work. What this means is that any attempt to improve it that doesn't > coincide with the original intent years ago can easily break things that > were originally intended without them being caught by a test. I believe > this subsystem needs improvement, and the transition to new/improved code > will probably be trickier to manage than for the dtype promotion case. > > Let's try to learn from the type promotion case as best we can, and use it > to improve NumPy's process. I believe Charles and Ralph have been doing a > great job of enforcing high standards in new NumPy code, and managing the > release process in a way that has resulted in very few bugs and regressions > in the release. Most of these quality standards are still informal, > however, and it's probably a good idea to write them down in a canonical > location. It will be especially helpful for newcomers, who can treat the > standards as a checklist before submitting pull requests. > > Thanks, > -Mark > > On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant <tra...@continuum.io>wrote: > >> The problem is that these sorts of things take a while to emerge. The >> original system was more consistent than I think you give it credit. What >> you are seeing is that most people get NumPy from distributions and are >> relying on us to keep things consistent. >> >> The scalar coercion rules were deterministic and based on the idea that a >> scalar does not determine the output dtype unless it is of a different >> kind. The new code changes that unfortunately. >> >> Another thing I noticed is that I thought that int16 <op> scalar float >> would produce float32 originally. This seems to have changed, but I need >> to check on an older version of NumPy. >> >> Changing the scalar coercion rules is an unfortunate substantial change >> in semantics and should not have happened in the 1.X series. >> >> I understand you did not get a lot of feedback and spent a lot of time on >> the code which we all appreciate. I worked to stay true to the Numeric >> casting rules incorporating the changes to prevent scalar upcasting due to >> the absence of single precision Numeric literals in Python. >> >> We will need to look in detail at what has changed. I will write a test >> to do that. >> >> Thanks, >> >> Travis >> >> -- >> Travis Oliphant >> (on a mobile) >> 512-826-7480 >> >> >> On Feb 13, 2012, at 7:58 PM, Mark Wiebe <mwwi...@gmail.com> wrote: >> >> On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant <tra...@continuum.io>wrote: >> >>> Hmmm. This seems like a regression. The scalar casting API was fairly >>> intentional. >>> >>> What is the reason for the change? >>> >> >> In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite >> this subsystem. There were virtually no tests in the test suite specifying >> what the expected behavior should be, and there were clear inconsistencies >> where for example "a+b" could result in a different type than "b+a". I >> recall there being some bugs in the tracker related to this as well, but I >> don't remember those details. >> >> This change felt like an obvious extension of an existing behavior for >> eliminating overflow, where the promotion changed unsigned -> signed based >> on the value of the scalar. This change introduced minimal upcasting only >> in a set of cases where an overflow was guaranteed to happen without that >> upcasting. >> >> During the 1.6 beta period, I signaled that this subsystem had changed, >> as the bullet point starting "The ufunc uses a more consistent algorithm >> for loop selection.": >> >> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >> >> The behavior Matthew has observed is a direct result of how I designed >> the minimization function mentioned in that bullet point, and the algorithm >> for it is documented in the 'Notes' section of the result_type page: >> >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >> >> Hopefully that explains it well enough. I made the change intentionally >> and carefully, tested its impact on SciPy and other projects, and advocated >> for it during the release cycle. >> >> Cheers, >> Mark >> >> -- >>> Travis Oliphant >>> (on a mobile) >>> 512-826-7480 >>> >>> >>> On Feb 13, 2012, at 6:25 PM, Matthew Brett <matthew.br...@gmail.com> >>> wrote: >>> >>> > Hi, >>> > >>> > I recently noticed a change in the upcasting rules in numpy 1.6.0 / >>> > 1.6.1 and I just wanted to check it was intentional. >>> > >>> > For all versions of numpy I've tested, we have: >>> > >>> >>>> import numpy as np >>> >>>> Adata = np.array([127], dtype=np.int8) >>> >>>> Bdata = np.int16(127) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - adding an integer scalar of a larger dtype does not result >>> > in upcasting of the output dtype, if the data in the scalar type fits >>> > in the smaller. >>> > >>> > For numpy < 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - even if the data in the scalar does not fit in the dtype of >>> > the array to which it is being added, there is no upcasting. >>> > >>> > For numpy >= 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int16') >>> > >>> > There is upcasting... >>> > >>> > I can see why the numpy 1.6.0 way might be preferable but it is an API >>> > change I suppose. >>> > >>> > Best, >>> > >>> > Matthew >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion@scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion@scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion