Hi,

On Mon, Jan 14, 2013 at 9:02 AM, Dave Hirschfeld
<dave.hirschf...@gmail.com> wrote:
> Robert Kern <robert.kern <at> gmail.com> writes:
>
>>
>> >>> >
>> >>> > One alternative that does not expand the API with two-liners is to let
>> >>> > the ndarray.fill() method return self:
>> >>> >
>> >>> >   a = np.empty(...).fill(20.0)
>> >>>
>> >>> This violates the convention that in-place operations never return
>> >>> self, to avoid confusion with out-of-place operations. E.g.
>> >>> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus
>> >>> np.sort(), and in the broader Python world, list.sort() versus
>> >>> sorted(), list.reverse() versus reversed(). (This was an explicit
>> >>> reason given for list.sort to not return self, even.)
>> >>>
>> >>> Maybe enabling this idiom is a good enough reason to break the
>> >>> convention ("Special cases aren't special enough to break the rules. /
>> >>> Although practicality beats purity"), but it at least makes me -0 on
>> >>> this...
>> >>>
>> >>
>> >> I tend to agree with the notion that inplace operations shouldn't return
>> >> self, but I don't know if it's just because I've been conditioned this
>> >> way.
>> >> Not returning self breaks the fluent interface pattern [1], as noted in a
>> >> similar discussion on pandas [2], FWIW, though there's likely some way to
>> >> have both worlds.
>> >
>> > Ah-hah, here's the email where Guido officially proclaims that there
>> > shall be no "fluent interface" nonsense applied to in-place operators
>> > in Python, because it hurts readability (at least for Dutch people):
>> > http://mail.python.org/pipermail/python-dev/2003-October/038855.html
>>
>> That's a statement about the policy for the stdlib, and just one
>> person's opinion. You, and numpy, are permitted to have a different
>> opinion.
>>
>> In any case, I'm not strongly advocating for it. Its violation of
>> principle ("no fluent interfaces") is roughly in the same ballpark as
>> np.filled() ("not every two-liner needs its own function"), so I
>> thought I would toss it out there for consideration.
>>
>> --
>> Robert Kern
>>
>
> FWIW I'm +1 on the idea. Perhaps because I just don't see many practical
> downsides to breaking the convention but I regularly see a big issue with
> there being no way to instantiate an array with a particular value.
>
> The one obvious way to do it is use ones and multiply by the value you want.
> I work with a lot of inexperienced programmers and I see this idiom all the
> time. It takes a fair amount of numpy knowledge to know that you should do
> it in two lines by using empty and setting a slice.
>
> In [1]: %timeit NaN*ones(10000)
> 1000 loops, best of 3: 1.74 ms per loop
>
> In [2]: %%timeit
>    ...: x = empty(10000, dtype=float)
>    ...: x[:] = NaN
>    ...:
> 10000 loops, best of 3: 28 us per loop
>
> In [3]: 1.74e-3/28e-6
> Out[3]: 62.142857142857146
>
>
> Even when not in the mythical "tight loop" setting an array to one and then
> multiplying uses up a lot of cycles - it's nearly 2 orders of magnitude
> slower than what we know they *should* be doing.
>
> I'm agnostic as to whether fill should be modified or new functions provided
> but I think numpy is currently missing this functionality and that providing
> it would save a lot of new users from shooting themselves in the foot
> performance-wise.
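
To make the idioms under discussion concrete, here is a rough sketch of them side by side. The helper name `filled` just follows the np.filled() mentioned above; it is hypothetical, not an existing NumPy function, and its `dtype=float` default is my own assumption.

```python
import numpy as np


def filled(shape, value, dtype=float):
    """Hypothetical helper (NOT part of NumPy): new array set to `value`."""
    out = np.empty(shape, dtype=dtype)  # allocate without initializing
    out.fill(value)                     # in-place fill; returns None
    return out


# Idiom often seen from new users: build an array of ones, then multiply.
# This initializes the data once and then makes a second pass over it.
slow = np.nan * np.ones(10000)

# Two-line idiom: allocate uninitialized memory, then assign in place.
fast = np.empty(10000, dtype=float)
fast[:] = np.nan

# The same thing in one call, using the hypothetical helper above.
helper = filled(10000, np.nan)

# Chaining does not work today, because ndarray.fill() returns None.
chained = np.empty(3).fill(20.0)
```

The last line is the case Robert's suggestion would change: with fill() returning self, `chained` would be the filled array rather than None.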

Is this a fair summary?

=> fill(shape, val), fill_like(arr, val) - new functions, as proposed
For: readable, seems to fit a pattern often used, presence in namespace
may clue people into using 'fill' rather than * val or + val
Con: a very simple alias for a = empty(shape) ; a.fill(val), maybe
cluttering an already full namespace.

=> empty(shape).fill(val) - by allowing return value from arr.fill(val)
For: readable
Con: breaks guideline not to return anything from in-place operations,
no presence in namespace means users may not find this pattern.

=> no new API
For: easy maintenance
Con: harder for users to discover fill pattern, filling a new array
requires two lines instead of one.

So maybe the decision rests on:

How important is it that users see these function names in the
namespace in order to discover the pattern
"a = empty(shape) ; a.fill(val)"?

How important is it to obey guidelines for no-return-from-in-place?

How important is it to avoid expanding the namespace?

How common is this pattern?

On the last, I'd say that the only common use I have for this pattern
is to fill an array with NaN.

Cheers,

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion