Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 6:38 PM, Robert Kern wrote:
> On Fri, Dec 11, 2009 at 18:38, Keith Goodman wrote:
>> That seems to work. To avoid changing the input
>>
>> >> x = np.array(1)
>> >> x.shape
>> ()
>> >> y = nan_to_num(x)
>> >> x.shape
>> (1,)
>>
>> I moved y = x.copy() further up and switched x's to y's. Here's what
>> it looks like:
>>
>> def nan_to_num(x):
>>     is_scalar = False
>>     if not isinstance(x, _nx.ndarray):
>>         x = asarray(x)
>>         if x.shape == ():
>>             # Must return this as a scalar later.
>>             is_scalar = True
>>     y = x.copy()
>>     old_shape = y.shape
>>     if y.shape == ():
>>         # We need element access.
>>         y.shape = (1,)
>>     t = y.dtype.type
>>     if issubclass(t, _nx.complexfloating):
>>         return nan_to_num(y.real) + 1j * nan_to_num(y.imag)
>
> Almost! You need to handle the shape restoration in this branch, too.
>
> In [9]: nan_to_num(array(1+1j))
> Out[9]: array([ 1.+1.j])

Taking care of my imaginary bug has the nice side effect of leaving us
with only one return statement. I changed

    return nan_to_num(y.real) + 1j * nan_to_num(y.imag)

to

    y = nan_to_num(y.real) + 1j * nan_to_num(y.imag)

and changed the if on the next line to elif:

def nan_to_num(x):
    is_scalar = False
    if not isinstance(x, _nx.ndarray):
        x = asarray(x)
        if x.shape == ():
            # Must return this as a scalar later.
            is_scalar = True
    y = x.copy()
    old_shape = y.shape
    if y.shape == ():
        # We need element access.
        y.shape = (1,)
    t = y.dtype.type
    if issubclass(t, _nx.complexfloating):
        y = nan_to_num(y.real) + 1j * nan_to_num(y.imag)
    elif issubclass(t, _nx.inexact):
        are_inf = isposinf(y)
        are_neg_inf = isneginf(y)
        are_nan = isnan(y)
        maxf, minf = _getmaxmin(y.dtype.type)
        y[are_nan] = 0
        y[are_inf] = maxf
        y[are_neg_inf] = minf
    if is_scalar:
        y = y[0]
    else:
        y.shape = old_shape
    return y
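For anyone who wants to sanity-check the patched version, here is a minimal,
hypothetical spot-check of the behaviour discussed in this thread. It assumes
the patched nan_to_num above (and its helpers from numpy/lib/type_check.py)
is importable in the same namespace; the checks and their names are mine, not
part of any patch.

import numpy as np

def _check_patched_nan_to_num():
    # bool and int inputs should come back untouched
    assert nan_to_num(np.array([True, False])).dtype == np.bool_
    assert nan_to_num(1) == 1
    # a list input should be coerced to a float array and cleaned up
    assert nan_to_num([np.inf])[0] == np.finfo(np.float64).max
    # a 0-d complex array should come back 0-d, not shape (1,)
    assert nan_to_num(np.array(1 + 1j)).shape == ()
    # and the input itself must not be modified
    x = np.array(1.0)
    nan_to_num(x)
    assert x.shape == ()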
Re: [Numpy-discussion] Slicing slower than matrix multiplication?
> One thing to note is that dot uses optimized atlas if available, which
> makes it quite faster than equivalent operations you would do using
> purely numpy. I doubt that's the reason here, since the arrays are
> small, but that's something to keep in mind when performances matter:
> use dot wherever possible, it is generally faster than prod/sum,

This is quite true; I once had a very large matrix (600 x 200,000) that
I needed to normalize. Using .sum() and /= took about 30 minutes. When I
switched to using dot() to do the same operation (matrix multiplication
with a vector of 1's, then turning that into a diagonal matrix and using
dot() again to normalize), the computation time dropped to about 2
minutes. Most of the gain was likely due to ATLAS using all the cores
while numpy used only one, but I was still impressed.

--Hoyt

+ Hoyt Koepke
+ University of Washington Department of Statistics
+ http://www.stat.washington.edu/~hoytak/
+ hoy...@gmail.com
++
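For readers who want to see the trick concretely, here is a small sketch of
the kind of rewrite being described. The dimensions are shrunk so it runs
quickly, column-normalization is assumed (the post doesn't say which axis),
and none of the names come from Hoyt's actual code.

import numpy as np

A = np.abs(np.random.standard_normal((600, 2000)))

# Elementwise route: per-column sums via sum(), then divide.
col_sums = A.sum(axis=0)
B = A / col_sums

# BLAS route: the column sums are a vector-matrix product with a vector of
# ones, and the scaling can be phrased as multiplication by a diagonal
# matrix, so both steps go through dot() (and hence ATLAS, if available).
ones = np.ones(A.shape[0])
col_sums_dot = np.dot(ones, A)
C = np.dot(A, np.diag(1.0 / col_sums_dot))

assert np.allclose(B, C)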
Re: [Numpy-discussion] Slicing slower than matrix multiplication?
On Fri, Dec 11, 2009 at 10:06 PM, Bruce Southey wrote: > > Having said that, the more you can vectorize your function, the more > efficient it will likely be especially with Atlas etc. One thing to note is that dot uses optimized atlas if available, which makes it quite faster than equivalent operations you would do using purely numpy. I doubt that's the reason here, since the arrays are small, but that's something to keep in mind when performances matter: use dot wherever possible, it is generally faster than prod/sum, cheers, David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] non-standard standard deviation
Anne Archibald wrote:
> 2009/11/29 Dr. Phillip M. Feldman :
>> All of the statistical packages that I am currently using and have
>> used in the past (Matlab, Minitab, R, S-plus) calculate standard
>> deviation using the sqrt(1/(n-1)) normalization, which gives a result
>> that is unbiased when sampling from a normally-distributed population.
>> NumPy uses the sqrt(1/n) normalization. I'm currently using the
>> following code to calculate standard deviations, but would much prefer
>> if this could be fixed in NumPy itself:
>
> This issue was the subject of lengthy discussions on the mailing list,
> the upshot of which is that in current versions of scipy, std and var
> take an optional argument "ddof", into which you can supply 1 to get
> the normalization you want.
>
> Anne

You are right that I can get the result that I want by setting ddof.
Thanks! I still feel that the default value for ddof should be 1 rather
than 0; new users are unlikely to read the documentation for a command
like std, because it is reasonable to expect standard behavior across
all statistical packages.

Phillip
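For reference, a small illustration of the two normalizations being debated
(plain numpy behaviour, nothing specific to any version):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
n = x.size

# numpy's default: sqrt(1/n) normalization, i.e. ddof=0.
assert np.isclose(np.std(x), np.sqrt(((x - x.mean())**2).sum() / n))

# Matlab/R/S-plus-style sample standard deviation: sqrt(1/(n-1)), i.e. ddof=1.
assert np.isclose(np.std(x, ddof=1),
                  np.sqrt(((x - x.mean())**2).sum() / (n - 1)))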
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 18:38, Keith Goodman wrote:
> That seems to work. To avoid changing the input
>
> >> x = np.array(1)
> >> x.shape
> ()
> >> y = nan_to_num(x)
> >> x.shape
> (1,)
>
> I moved y = x.copy() further up and switched x's to y's. Here's what
> it looks like:
>
> def nan_to_num(x):
>     is_scalar = False
>     if not isinstance(x, _nx.ndarray):
>         x = asarray(x)
>         if x.shape == ():
>             # Must return this as a scalar later.
>             is_scalar = True
>     y = x.copy()
>     old_shape = y.shape
>     if y.shape == ():
>         # We need element access.
>         y.shape = (1,)
>     t = y.dtype.type
>     if issubclass(t, _nx.complexfloating):
>         return nan_to_num(y.real) + 1j * nan_to_num(y.imag)

Almost! You need to handle the shape restoration in this branch, too.

In [9]: nan_to_num(array(1+1j))
Out[9]: array([ 1.+1.j])

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 4:06 PM, Robert Kern wrote: > On Fri, Dec 11, 2009 at 17:44, Keith Goodman wrote: >> On Fri, Dec 11, 2009 at 2:22 PM, Robert Kern wrote: >>> On Fri, Dec 11, 2009 at 16:09, Keith Goodman wrote: On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern wrote: > On Fri, Dec 11, 2009 at 14:41, Keith Goodman wrote: >> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey >> wrote: > >>> So I agree that it should leave the input untouched when a non-float >>> dtype is used for some array-like input. >> >> Would only one line need to be changed? Would changing >> >> if not issubclass(t, _nx.integer): >> >> to >> >> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_): >> >> do the trick? > > That still leaves strings, voids, and objects. I recommend: > > if issubclass(t, _nx.inexact): > > Arguably, one should handle nan float objects in object arrays and > float columns in structured arrays, but the current code does not > handle either of those anyways. Without your change both >> np.nan_to_num(np.array([True, False])) >> np.nan_to_num([1]) raise exceptions. With your change: >> np.nan_to_num(np.array([True, False])) array([ True, False], dtype=bool) >> np.nan_to_num([1]) array([1]) >>> >>> I think this is correct, though the latter one happens by accident. >>> Lists don't have a .dtype attribute so obj2sctype(type([1])) is >>> checked and happens to be object_. The latter line is intended to >>> handle scalars, not sequences. I think that sequences should be >>> coerced to arrays for output and this check should be more explicit >>> about what it handles. [1.0] will have a problem if you don't. >> >> That makes sense. But I'm not smart enough to implement it. > > Something like the following at the top should help distinguish the > various cases.: > > is_scalar = False > if not isinstance(x, _nx.ndarray): > x = np.asarray(x) > if x.shape == (): > # Must return this as a scalar later. > is_scalar = True > old_shape = x.shape > if x.shape == (): > # We need element access. > x.shape = (1,) > t = x.dtype.type > > This should allow one to pass in [np.inf] and have it correctly get > interpreted as a float array rather than an object scalar. That seems to work. To avoid changing the input >> x = np.array(1) >> x.shape () >> y = nan_to_num(x) >> x.shape (1,) I moved y = x.copy() further up and switched x's to y's. Here's what it looks like: def nan_to_num(x): is_scalar = False if not isinstance(x, _nx.ndarray): x = asarray(x) if x.shape == (): # Must return this as a scalar later. is_scalar = True y = x.copy() old_shape = y.shape if y.shape == (): # We need element access. y.shape = (1,) t = y.dtype.type if issubclass(t, _nx.complexfloating): return nan_to_num(y.real) + 1j * nan_to_num(y.imag) if issubclass(t, _nx.inexact): are_inf = isposinf(y) are_neg_inf = isneginf(y) are_nan = isnan(y) maxf, minf = _getmaxmin(y.dtype.type) y[are_nan] = 0 y[are_inf] = maxf y[are_neg_inf] = minf if is_scalar: y = y[0] else: y.shape = old_shape return y ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 18:03, Keith Goodman wrote: > Ack! The "if issubclass(t, _nx.inexact)" fix doesn't work. It solves > the bool problem but it introduces its own problem since numpy.object_ > is not a subclass of inexact: > >>> nan_to_num([np.inf]) > array([ Inf]) Right. This is the problem I was referring to: "I think that sequences should be coerced to arrays for output and this check should be more explicit about what it handles. [1.0] will have a problem if you don't." -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 17:44, Keith Goodman wrote:
> On Fri, Dec 11, 2009 at 2:22 PM, Robert Kern wrote:
>> On Fri, Dec 11, 2009 at 16:09, Keith Goodman wrote:
>>> On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern wrote:
>>>> On Fri, Dec 11, 2009 at 14:41, Keith Goodman wrote:
>>>>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey wrote:
>>>>>> So I agree that it should leave the input untouched when a non-float
>>>>>> dtype is used for some array-like input.
>>>>>
>>>>> Would only one line need to be changed? Would changing
>>>>>
>>>>> if not issubclass(t, _nx.integer):
>>>>>
>>>>> to
>>>>>
>>>>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>>>>>
>>>>> do the trick?
>>>>
>>>> That still leaves strings, voids, and objects. I recommend:
>>>>
>>>> if issubclass(t, _nx.inexact):
>>>>
>>>> Arguably, one should handle nan float objects in object arrays and
>>>> float columns in structured arrays, but the current code does not
>>>> handle either of those anyways.
>>>
>>> Without your change both
>>>
>>> >> np.nan_to_num(np.array([True, False]))
>>> >> np.nan_to_num([1])
>>>
>>> raise exceptions. With your change:
>>>
>>> >> np.nan_to_num(np.array([True, False]))
>>> array([ True, False], dtype=bool)
>>> >> np.nan_to_num([1])
>>> array([1])
>>
>> I think this is correct, though the latter one happens by accident.
>> Lists don't have a .dtype attribute so obj2sctype(type([1])) is
>> checked and happens to be object_. The latter line is intended to
>> handle scalars, not sequences. I think that sequences should be
>> coerced to arrays for output and this check should be more explicit
>> about what it handles. [1.0] will have a problem if you don't.
>
> That makes sense. But I'm not smart enough to implement it.

Something like the following at the top should help distinguish the
various cases:

is_scalar = False
if not isinstance(x, _nx.ndarray):
    x = np.asarray(x)
    if x.shape == ():
        # Must return this as a scalar later.
        is_scalar = True
old_shape = x.shape
if x.shape == ():
    # We need element access.
    x.shape = (1,)
t = x.dtype.type

This should allow one to pass in [np.inf] and have it correctly get
interpreted as a float array rather than an object scalar.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco
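To make the distinction concrete, a couple of checks of what asarray does
with the inputs being discussed (this is plain numpy behaviour, not part of
Robert's snippet):

import numpy as np

# A list of floats coerces to a 1-d float64 array, so the inexact branch
# can clean it up and an array comes back out.
assert np.asarray([np.inf]).dtype == np.float64
assert np.asarray([np.inf]).shape == (1,)

# A bare Python scalar coerces to a 0-d array, which is what lets the
# snippet flag it with is_scalar and hand a scalar back at the end.
assert np.asarray(1.0).shape == ()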
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 3:44 PM, Keith Goodman wrote: > On Fri, Dec 11, 2009 at 2:22 PM, Robert Kern wrote: >> On Fri, Dec 11, 2009 at 16:09, Keith Goodman wrote: >>> On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern wrote: On Fri, Dec 11, 2009 at 14:41, Keith Goodman wrote: > On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey > wrote: >> So I agree that it should leave the input untouched when a non-float >> dtype is used for some array-like input. > > Would only one line need to be changed? Would changing > > if not issubclass(t, _nx.integer): > > to > > if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_): > > do the trick? That still leaves strings, voids, and objects. I recommend: if issubclass(t, _nx.inexact): Arguably, one should handle nan float objects in object arrays and float columns in structured arrays, but the current code does not handle either of those anyways. >>> >>> Without your change both >>> > np.nan_to_num(np.array([True, False])) > np.nan_to_num([1]) >>> >>> raise exceptions. With your change: >>> > np.nan_to_num(np.array([True, False])) >>> array([ True, False], dtype=bool) > np.nan_to_num([1]) >>> array([1]) >> >> I think this is correct, though the latter one happens by accident. >> Lists don't have a .dtype attribute so obj2sctype(type([1])) is >> checked and happens to be object_. The latter line is intended to >> handle scalars, not sequences. I think that sequences should be >> coerced to arrays for output and this check should be more explicit >> about what it handles. [1.0] will have a problem if you don't. > > That makes sense. But I'm not smart enough to implement it. > >>> On a separate note, this seems a little awkward: >>> > np.nan_to_num(1.0) >>> 1.0 > np.nan_to_num(1) >>> array(1) > x = np.ones(1, dtype=np.int) > np.nan_to_num(x[0]) >>> 1 >> >> Worth fixing. > > Would this work? > > def nan_to_num(x): > try: > t = x.dtype.type > except AttributeError: > t = obj2sctype(type(x)) > if issubclass(t, _nx.complexfloating): > return nan_to_num(x.real) + 1j * nan_to_num(x.imag) > else: > try: > y = x.copy() > except AttributeError: > y = array(x) > if not y.shape: > y = array([x]) > scalar = True > else: > scalar = False > if issubclass(t, _nx.inexact): > are_inf = isposinf(y) > are_neg_inf = isneginf(y) > are_nan = isnan(y) > maxf, minf = _getmaxmin(y.dtype.type) > y[are_nan] = 0 > y[are_inf] = maxf > y[are_neg_inf] = minf > if scalar: > y = y[0] > return y > > Instead of > >>> nan_to_num(1.0) > 1.0 >>> nan_to_num(1) > array(1) >>> nan_to_num(np.array(1.0)) > 1.0 >>> nan_to_num(np.array(1)) > array(1) > > it gives > >>> nan_to_num(1.0) > 1.0 >>> nan_to_num(1) > 1 >>> nan_to_num(np.array(1.0)) > 1.0 >>> nan_to_num(np.array(1)) > 1 > > I guess a lot of unit tests need to be written before nan_to_num can > be fixed. But for now, your bool fix is an improvement. Ack! The "if issubclass(t, _nx.inexact)" fix doesn't work. It solves the bool problem but it introduces its own problem since numpy.object_ is not a subclass of inexact: >> nan_to_num([np.inf]) array([ Inf]) Yeah, way too many special cases here to do this without full unit test coverage. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 2:22 PM, Robert Kern wrote:
> On Fri, Dec 11, 2009 at 16:09, Keith Goodman wrote:
>> On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern wrote:
>>> On Fri, Dec 11, 2009 at 14:41, Keith Goodman wrote:
>>>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey wrote:
>>>>> So I agree that it should leave the input untouched when a non-float
>>>>> dtype is used for some array-like input.
>>>>
>>>> Would only one line need to be changed? Would changing
>>>>
>>>> if not issubclass(t, _nx.integer):
>>>>
>>>> to
>>>>
>>>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>>>>
>>>> do the trick?
>>>
>>> That still leaves strings, voids, and objects. I recommend:
>>>
>>> if issubclass(t, _nx.inexact):
>>>
>>> Arguably, one should handle nan float objects in object arrays and
>>> float columns in structured arrays, but the current code does not
>>> handle either of those anyways.
>>
>> Without your change both
>>
>> >> np.nan_to_num(np.array([True, False]))
>> >> np.nan_to_num([1])
>>
>> raise exceptions. With your change:
>>
>> >> np.nan_to_num(np.array([True, False]))
>> array([ True, False], dtype=bool)
>> >> np.nan_to_num([1])
>> array([1])
>
> I think this is correct, though the latter one happens by accident.
> Lists don't have a .dtype attribute so obj2sctype(type([1])) is
> checked and happens to be object_. The latter line is intended to
> handle scalars, not sequences. I think that sequences should be
> coerced to arrays for output and this check should be more explicit
> about what it handles. [1.0] will have a problem if you don't.

That makes sense. But I'm not smart enough to implement it.

>> On a separate note, this seems a little awkward:
>>
>> >> np.nan_to_num(1.0)
>> 1.0
>> >> np.nan_to_num(1)
>> array(1)
>> >> x = np.ones(1, dtype=np.int)
>> >> np.nan_to_num(x[0])
>> 1
>
> Worth fixing.

Would this work?

def nan_to_num(x):
    try:
        t = x.dtype.type
    except AttributeError:
        t = obj2sctype(type(x))
    if issubclass(t, _nx.complexfloating):
        return nan_to_num(x.real) + 1j * nan_to_num(x.imag)
    else:
        try:
            y = x.copy()
        except AttributeError:
            y = array(x)
        if not y.shape:
            y = array([x])
            scalar = True
        else:
            scalar = False
        if issubclass(t, _nx.inexact):
            are_inf = isposinf(y)
            are_neg_inf = isneginf(y)
            are_nan = isnan(y)
            maxf, minf = _getmaxmin(y.dtype.type)
            y[are_nan] = 0
            y[are_inf] = maxf
            y[are_neg_inf] = minf
        if scalar:
            y = y[0]
        return y

Instead of

>> nan_to_num(1.0)
1.0
>> nan_to_num(1)
array(1)
>> nan_to_num(np.array(1.0))
1.0
>> nan_to_num(np.array(1))
array(1)

it gives

>> nan_to_num(1.0)
1.0
>> nan_to_num(1)
1
>> nan_to_num(np.array(1.0))
1.0
>> nan_to_num(np.array(1))
1

I guess a lot of unit tests need to be written before nan_to_num can
be fixed. But for now, your bool fix is an improvement.
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 16:09, Keith Goodman wrote: > On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern wrote: >> On Fri, Dec 11, 2009 at 14:41, Keith Goodman wrote: >>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey wrote: >> So I agree that it should leave the input untouched when a non-float dtype is used for some array-like input. >>> >>> Would only one line need to be changed? Would changing >>> >>> if not issubclass(t, _nx.integer): >>> >>> to >>> >>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_): >>> >>> do the trick? >> >> That still leaves strings, voids, and objects. I recommend: >> >> if issubclass(t, _nx.inexact): >> >> Arguably, one should handle nan float objects in object arrays and >> float columns in structured arrays, but the current code does not >> handle either of those anyways. > > Without your change both > >>> np.nan_to_num(np.array([True, False])) >>> np.nan_to_num([1]) > > raise exceptions. With your change: > >>> np.nan_to_num(np.array([True, False])) > array([ True, False], dtype=bool) >>> np.nan_to_num([1]) > array([1]) I think this is correct, though the latter one happens by accident. Lists don't have a .dtype attribute so obj2sctype(type([1])) is checked and happens to be object_. The latter line is intended to handle scalars, not sequences. I think that sequences should be coerced to arrays for output and this check should be more explicit about what it handles. [1.0] will have a problem if you don't. > On a separate note, this seems a little awkward: > >>> np.nan_to_num(1.0) > 1.0 >>> np.nan_to_num(1) > array(1) >>> x = np.ones(1, dtype=np.int) >>> np.nan_to_num(x[0]) > 1 Worth fixing. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern wrote:
> On Fri, Dec 11, 2009 at 14:41, Keith Goodman wrote:
>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey wrote:
>>> So I agree that it should leave the input untouched when a non-float
>>> dtype is used for some array-like input.
>>
>> Would only one line need to be changed? Would changing
>>
>> if not issubclass(t, _nx.integer):
>>
>> to
>>
>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>>
>> do the trick?
>
> That still leaves strings, voids, and objects. I recommend:
>
> if issubclass(t, _nx.inexact):
>
> Arguably, one should handle nan float objects in object arrays and
> float columns in structured arrays, but the current code does not
> handle either of those anyways.

Without your change both

>> np.nan_to_num(np.array([True, False]))
>> np.nan_to_num([1])

raise exceptions. With your change:

>> np.nan_to_num(np.array([True, False]))
array([ True, False], dtype=bool)
>> np.nan_to_num([1])
array([1])

On a separate note, this seems a little awkward:

>> np.nan_to_num(1.0)
1.0
>> np.nan_to_num(1)
array(1)
>> x = np.ones(1, dtype=np.int)
>> np.nan_to_num(x[0])
1
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 14:41, Keith Goodman wrote: > On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey wrote: >> So I agree that it should leave the input untouched when a non-float >> dtype is used for some array-like input. > > Would only one line need to be changed? Would changing > > if not issubclass(t, _nx.integer): > > to > > if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_): > > do the trick? That still leaves strings, voids, and objects. I recommend: if issubclass(t, _nx.inexact): Arguably, one should handle nan float objects in object arrays and float columns in structured arrays, but the current code does not handle either of those anyways. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
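For reference, a few checks of how numpy's abstract scalar types nest, which
is what makes the inexact test attractive here (plain numpy behaviour,
independent of any patch):

import numpy as np

# bool_ and object_ are not integer subclasses, so the old test
# "not issubclass(t, np.integer)" sends them down the float path.
assert not issubclass(np.bool_, np.integer)
assert not issubclass(np.object_, np.integer)

# The positive test picks out exactly the types that can hold nan or inf:
# floating and complex types are inexact, everything else is not.
assert issubclass(np.float64, np.inexact)
assert issubclass(np.complex128, np.inexact)
assert not issubclass(np.int64, np.inexact)
assert not issubclass(np.bool_, np.inexact)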
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey wrote:
> On 12/11/2009 01:33 PM, Robert Kern wrote:
>> On Fri, Dec 11, 2009 at 13:11, Bruce Southey wrote:
>>> As documented, nan_to_num returns a float so it does not return the
>>> input unchanged.
>
> Sorry for my mistake:
> Given an int input, np.nan_to_num returns an int dtype
>
> >>> np.nan_to_num(np.zeros((3,3), dtype=np.int)).dtype
> dtype('int64')
>
>> I think that is describing the current behavior rather than
>> documenting the intent of the function. Given the high level purpose
>> of the function, to "[r]eplace nan with zero and inf with finite
>> numbers," I think it is fairly reasonable to implement it as a no-op
>> for integers and related dtypes. There are no nans or infs for those
>> dtypes so the input can be passed back unchanged.
>
> So I agree that it should leave the input untouched when a non-float
> dtype is used for some array-like input.

Would only one line need to be changed? Would changing

if not issubclass(t, _nx.integer):

to

if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):

do the trick? Here's nan_to_num for reference:

def nan_to_num(x):
    try:
        t = x.dtype.type
    except AttributeError:
        t = obj2sctype(type(x))
    if issubclass(t, _nx.complexfloating):
        return nan_to_num(x.real) + 1j * nan_to_num(x.imag)
    else:
        try:
            y = x.copy()
        except AttributeError:
            y = array(x)
        if not issubclass(t, _nx.integer):
            if not y.shape:
                y = array([x])
                scalar = True
            else:
                scalar = False
            are_inf = isposinf(y)
            are_neg_inf = isneginf(y)
            are_nan = isnan(y)
            maxf, minf = _getmaxmin(y.dtype.type)
            y[are_nan] = 0
            y[are_inf] = maxf
            y[are_neg_inf] = minf
            if scalar:
                y = y[0]
        return y
Re: [Numpy-discussion] nan_to_num and bool arrays
On 12/11/2009 01:33 PM, Robert Kern wrote: > On Fri, Dec 11, 2009 at 13:11, Bruce Southey wrote: > > >> As documented, nan_to_num returns a float so it does not return the >> input unchanged. >> Sorry for my mistake: Given an int input, np.nan_to_num returns an int dtype >>> np.nan_to_num(np.zeros((3,3), dtype=np.int)).dtype dtype('int64') > I think that is describing the current behavior rather than > documenting the intent of the function. Given the high level purpose > of the function, to "[r]eplace nan with zero and inf with finite > numbers," I think it is fairly reasonable to implement it as a no-op > for integers and related dtypes. There are no nans or infs for those > dtypes so the input can be passed back unchanged. > So I agree that it should leave the input untouched when a non-float dtype is used for some array-like input. Bruce ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 13:11, Bruce Southey wrote: > As documented, nan_to_num returns a float so it does not return the > input unchanged. I think that is describing the current behavior rather than documenting the intent of the function. Given the high level purpose of the function, to "[r]eplace nan with zero and inf with finite numbers," I think it is fairly reasonable to implement it as a no-op for integers and related dtypes. There are no nans or infs for those dtypes so the input can be passed back unchanged. > That is the output of np.nan_to_num(np.zeros((3,3))) is a float array > not an int array. This is also why np.finfo() fails because it is not > give a float (that is, it also gives the same output if the argument to > np.finfo() is an int rather than an boolean type). > > I am curious why do you expect this conversion to work given how Python > defines boolean types > (http://docs.python.org/library/stdtypes.html#boolean-values). > > It is ambiguous to convert from boolean to float since anything that is > not zero is 'True' and that NaN is not zero: > >>> bool(np.PINF) > True > >>> bool(np.NINF) > True > >>> bool(np.NaN) > True > >>> bool(np.PZERO) > False > >>> bool(np.NZERO) > False No, that's the other way around, converting floats to bools. Converting bools to floats is trivial: True->1.0, False->0.0. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
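A quick illustration of that point (standard Python/numpy semantics, nothing
specific to nan_to_num):

import numpy as np

# bool -> float is unambiguous: True -> 1.0, False -> 0.0.
assert np.array([True, False]).astype(float).tolist() == [1.0, 0.0]

# float -> bool is the lossy direction: anything non-zero, including
# nan and both infinities, truth-tests as True.
assert bool(np.inf) and bool(-np.inf) and bool(np.nan)
assert not bool(0.0) and not bool(-0.0)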
Re: [Numpy-discussion] nan_to_num and bool arrays
On 12/11/2009 10:21 AM, Keith Goodman wrote: > On Fri, Dec 11, 2009 at 12:50 AM, Nicolas Rougier > wrote: > >> Hello, >> >> Using both numpy 1.3.0 and 1.4.0rc1 I got the following exception using >> nan_to_num on a bool array, is that the expected behavior ? >> >> >> > import numpy > Z = numpy.zeros((3,3),dtype=bool) > numpy.nan_to_num(Z) > >> Traceback (most recent call last): >> File "", line 1, in >> File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line >> 374, in nan_to_num >> maxf, minf = _getmaxmin(y.dtype.type) >> File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line >> 307, in _getmaxmin >> f = getlimits.finfo(t) >> File "/usr/lib/python2.6/dist-packages/numpy/core/getlimits.py", line >> 103, in __new__ >> raise ValueError, "data type %r not inexact" % (dtype) >> ValueError: data type not inexact >> > I guess a check for bool could be added at the top of nan_to_num. If > the input x is a bool then nan_to_num would just return x unchanged. > Or perhaps > > maxf, minf = _getmaxmin(y.dtype.type) > > could return False, True. > > Best bet is probably to file a ticket. And then pray. > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > As documented, nan_to_num returns a float so it does not return the input unchanged. That is the output of np.nan_to_num(np.zeros((3,3))) is a float array not an int array. This is also why np.finfo() fails because it is not give a float (that is, it also gives the same output if the argument to np.finfo() is an int rather than an boolean type). I am curious why do you expect this conversion to work given how Python defines boolean types (http://docs.python.org/library/stdtypes.html#boolean-values). It is ambiguous to convert from boolean to float since anything that is not zero is 'True' and that NaN is not zero: >>> bool(np.PINF) True >>> bool(np.NINF) True >>> bool(np.NaN) True >>> bool(np.PZERO) False >>> bool(np.NZERO) False So what do you behavior do you expect to see? Bruce ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] December Webinar: SciPy India with Travis Oliphant
Next Friday Enthought will be hosting our monthly Scientific Computing with Python Webinar: Summary of SciPy India Friday December 18 1pm CST/ 7pm UTC Register at GoToMeeting Enthought President Travis Oliphant is currently in Kerala, India as the keynote speaker at SciPy India 2009. Due to a training engagement, Travis missed SciPy for the first time this summer, so he’s excited for this additional opportunity to meet and collaborate with the scientific Python community. Speakers at the event include Jarrod Millman, David Cournapeau, Christopher Burns, Prabhu Ramachandran, and Asokan Pichai — a great group. We’re looking forward to hearing Travis’ review of the proceedings. Hope to see you there! Enthought Media ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Slicing slower than matrix multiplication?
On Friday 11 December 2009 17:36:54 Bruce Southey wrote:
> On 12/11/2009 10:03 AM, Francesc Alted wrote:
> > On Friday 11 December 2009 16:44:29 Dag Sverre Seljebotn wrote:
> >> Jasper van de Gronde wrote:
> >>> Dag Sverre Seljebotn wrote:
> >>>> Jasper van de Gronde wrote:
> >>>>> I've attached a test file which shows the problem. It also tries
> >>>>> adding columns instead of rows (in case the memory layout is
> >>>>> playing tricks), but this seems to make no difference. This is the
> >>>>> output I got:
> >>>>>
> >>>>> Dot product: 5.188786
> >>>>> Add a row: 8.032767
> >>>>> Add a column: 8.070953
> >>>>>
> >>>>> Any ideas on why adding a row (or column) of a matrix is slower
> >>>>> than computing a matrix product with a similarly sized matrix...
> >>>>> (Xi has less columns than Xi2, but just as many rows.)
> >>>>
> >>>> I think we need some numbers to put this into context -- how big are
> >>>> the vectors/matrices? How many iterations was the loop run? If the
> >>>> vectors are small and the loop is run many times, how fast the
> >>>> operation "ought" to be is irrelevant as it would drown in Python
> >>>> overhead.
> >>>
> >>> Originally I had attached a Python file demonstrating the problem,
> >>> but apparently this wasn't accepted by the list. In any case, the
> >>> matrices and vectors weren't too big (60x20), so I tried making them
> >>> bigger and indeed the "fast" version was now considerably faster.
> >>
> >> 60x20 is "nothing", so a full matrix multiplication or a single
> >> matrix-vector probably takes the same time (that is, the difference
> >> between them in itself is likely smaller than the error you make
> >> during measuring).
> >>
> >> In this context, the benchmarks will be completely dominated by the
> >> number of Python calls you make (each, especially taking the slice,
> >> means allocating Python objects, calling a bunch of functions in C,
> >> etc. etc). So it's not that strange, taking a slice isn't free, some
> >> Python objects must be created etc. etc.
> >
> > Yeah, I think taking slices here is taking quite a lot of time:
> >
> > In [58]: timeit E + Xi2[P/2,:]
> > 10 loops, best of 3: 3.95 µs per loop
> >
> > In [59]: timeit E + Xi2[P/2]
> > 10 loops, best of 3: 2.17 µs per loop
> >
> > I don't know why the additional ',:' in the slice is taking so much
> > time, but my guess is that passing & analyzing the second argument
> > (slice(None,None,None)) could be responsible for the slowdown (but
> > that is taking too much time). Mmh, perhaps it would be worth studying
> > this more carefully so that an optimization could be done in NumPy.
> >
> >> I think the lesson mostly should be that with so little data,
> >> benchmarking becomes a very difficult art.
> >
> > Well, I think it is not difficult, it is just that you are perhaps
> > benchmarking Python/NumPy machinery instead ;-) I'm curious whether
> > Matlab can do slicing much faster than NumPy. Jasper?
>
> What are you actually trying to test here?

Good question. I don't know for sure :-)

> I do not see any equivalence in the operations or output here.
> - With your slices you need two dot products but ultimately you are only
>   using one for your dot product.
> - There are addition operations on the slices that are not present in
>   the dot product.
> - The final E arrays are not the same for all three operations.

I don't understand the ultimate goal of the OP either, but what caught my
attention was that:

In [74]: timeit Xi2[P/2]
100 loops, best of 3: 278 ns per loop

In [75]: timeit Xi2[P/2,:]
100 loops, best of 3: 1.04 µs per loop

i.e. adding an additional parameter (the ':') to the slice drives the time
to run almost 4x slower. And with this, the problem exposed by the OP is
*partially* explained, i.e.:

In [77]: timeit np.dot(Xi,w)
10 loops, best of 3: 2.91 µs per loop

In [78]: timeit E + Xi2[P/2]
10 loops, best of 3: 2.05 µs per loop

In [79]: timeit E + Xi2[P/2,:]
10 loops, best of 3: 3.81 µs per loop

But again, don't ask me whether the results are okay or not. I'm playing
here the role of a pure computational scientist on a very concrete
problem ;-)

> Having said that, the more you can vectorize your function, the more
> efficient it will likely be especially with Atlas etc.

Except if your arrays are small enough, which is the underlying issue
here IMO.

--
Francesc Alted
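For anyone who wants to reproduce the comparison outside IPython, here is a
small self-contained version of the same measurements. The sizes follow
Jasper's 60x20 setup; it is written for present-day Python 3 (hence // where
the thread uses /), and absolute numbers will of course differ from the ones
quoted above.

import timeit

setup = """
import numpy as np
P, N = 60, 20
Xi = np.random.standard_normal((P, N))
w = np.random.standard_normal(N)
Xi2 = np.dot(Xi, Xi.T)
E = np.dot(Xi, w)
"""

n = 100000
for label, stmt in [("np.dot(Xi, w)   ", "np.dot(Xi, w)"),
                    ("E + Xi2[P//2]   ", "E + Xi2[P//2]"),
                    ("E + Xi2[P//2, :]", "E + Xi2[P//2, :]")]:
    t = timeit.timeit(stmt, setup=setup, number=n)
    print("%s  %8.3f us per loop" % (label, 1e6 * t / n))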
Re: [Numpy-discussion] nan_to_num and bool arrays
I've created a ticket (#1327). Nicolas On Dec 11, 2009, at 17:21 , Keith Goodman wrote: > On Fri, Dec 11, 2009 at 12:50 AM, Nicolas Rougier > wrote: >> >> Hello, >> >> Using both numpy 1.3.0 and 1.4.0rc1 I got the following exception using >> nan_to_num on a bool array, is that the expected behavior ? >> >> > import numpy > Z = numpy.zeros((3,3),dtype=bool) > numpy.nan_to_num(Z) >> Traceback (most recent call last): >> File "", line 1, in >> File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line >> 374, in nan_to_num >>maxf, minf = _getmaxmin(y.dtype.type) >> File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line >> 307, in _getmaxmin >>f = getlimits.finfo(t) >> File "/usr/lib/python2.6/dist-packages/numpy/core/getlimits.py", line >> 103, in __new__ >>raise ValueError, "data type %r not inexact" % (dtype) >> ValueError: data type not inexact > > I guess a check for bool could be added at the top of nan_to_num. If > the input x is a bool then nan_to_num would just return x unchanged. > Or perhaps > > maxf, minf = _getmaxmin(y.dtype.type) > > could return False, True. > > Best bet is probably to file a ticket. And then pray. > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Slicing slower than matrix multiplication?
On 12/11/2009 10:03 AM, Francesc Alted wrote: > A Friday 11 December 2009 16:44:29 Dag Sverre Seljebotn escrigué: > >> Jasper van de Gronde wrote: >> >>> Dag Sverre Seljebotn wrote: >>> Jasper van de Gronde wrote: > I've attached a test file which shows the problem. It also tries adding > columns instead of rows (in case the memory layout is playing tricks), > but this seems to make no difference. This is the output I got: > > Dot product: 5.188786 > Add a row: 8.032767 > Add a column: 8.070953 > > Any ideas on why adding a row (or column) of a matrix is slower than > computing a matrix product with a similarly sized matrix... (Xi has > less columns than Xi2, but just as many rows.) > I think we need some numbers to put this into context -- how big are the vectors/matrices? How many iterations was the loop run? If the vectors are small and the loop is run many times, how fast the operation "ought" to be is irrelevant as it would drown in Python overhead. >>> Originally I had attached a Python file demonstrating the problem, but >>> apparently this wasn't accepted by the list. In any case, the matrices >>> and vectors weren't too big (60x20), so I tried making them bigger and >>> indeed the "fast" version was now considerably faster. >>> >> 60x20 is "nothing", so a full matrix multiplication or a single >> matrix-vector probably takes the same time (that is, the difference >> between them in itself is likely smaller than the error you make during >> measuring). >> >> In this context, the benchmarks will be completely dominated by the >> number of Python calls you make (each, especially taking the slice, >> means allocating Python objects, calling a bunch of functions in C, etc. >> etc). So it's not that strange, taking a slice isn't free, some Python >> objects must be created etc. etc. >> > Yeah, I think taking slices here is taking quite a lot of time: > > In [58]: timeit E + Xi2[P/2,:] > 10 loops, best of 3: 3.95 µs per loop > > In [59]: timeit E + Xi2[P/2] > 10 loops, best of 3: 2.17 µs per loop > > don't know why the additional ',:' in the slice is taking so much time, but my > guess is that passing& analyzing the second argument (slice(None,None,None)) > could be the responsible for the slowdown (but that is taking too much time). > Mmh, perhaps it would be worth to study this more carefully so that an > optimization could be done in NumPy. > > >> I think the lesson mostly should be that with so little data, >> benchmarking becomes a very difficult art. >> > Well, I think it is not difficult, it is just that you are perhaps > benchmarking Python/NumPy machinery instead ;-) I'm curious whether Matlab > can do slicing much more faster than NumPy. Jasper? > > What are using actually trying to test here? I do not see any equivalence in the operations or output here. -With your slices you need two dot products but ultimately you are only using one for your dot product. -There are addition operations on the slices that are not present in the dot product. -The final E arrays are not the same for all three operations. Having said that, the more you can vectorize your function, the more efficient it will likely be especially with Atlas etc. Bruce ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] nan_to_num and bool arrays
On Fri, Dec 11, 2009 at 12:50 AM, Nicolas Rougier wrote:
> Hello,
>
> Using both numpy 1.3.0 and 1.4.0rc1 I got the following exception using
> nan_to_num on a bool array, is that the expected behavior ?
>
> >>> import numpy
> >>> Z = numpy.zeros((3,3),dtype=bool)
> >>> numpy.nan_to_num(Z)
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
> 374, in nan_to_num
>     maxf, minf = _getmaxmin(y.dtype.type)
>   File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
> 307, in _getmaxmin
>     f = getlimits.finfo(t)
>   File "/usr/lib/python2.6/dist-packages/numpy/core/getlimits.py", line
> 103, in __new__
>     raise ValueError, "data type %r not inexact" % (dtype)
> ValueError: data type not inexact

I guess a check for bool could be added at the top of nan_to_num. If
the input x is a bool then nan_to_num would just return x unchanged.
Or perhaps

maxf, minf = _getmaxmin(y.dtype.type)

could return False, True.

Best bet is probably to file a ticket. And then pray.
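A literal sketch of the first suggestion, just to make it concrete; this is
a hypothetical caller-visible wrapper, not a proposed patch to numpy itself:

import numpy as np

def nan_to_num_bool_safe(x):
    # Hand bool input straight back, since bool arrays cannot contain
    # nan or inf, and defer everything else to the existing function.
    if getattr(x, "dtype", None) == np.bool_ or isinstance(x, bool):
        return x
    return np.nan_to_num(x)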
Re: [Numpy-discussion] [Nipy-devel] Impossibility to build nipy on recent numpy?
On Thu, Dec 10, 2009 at 05:19:24PM +0100, Gael Varoquaux wrote: > On Thu, Dec 10, 2009 at 10:17:43AM -0600, Robert Kern wrote: > > > OK, so we need to bug report to ubuntu. Anybody feels like doing it, or > > > do I need to go ahead :). > > It's your problem. :-) > That's kinda what I thought. I was just try to dump work on someone else > :). I'll do it. OK, done: https://bugs.launchpad.net/ubuntu/+source/python-numpy/+bug/495537 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Slicing slower than matrix multiplication?
On Friday 11 December 2009 16:44:29 Dag Sverre Seljebotn wrote:
> Jasper van de Gronde wrote:
> > Dag Sverre Seljebotn wrote:
> >> Jasper van de Gronde wrote:
> >>> I've attached a test file which shows the problem. It also tries
> >>> adding columns instead of rows (in case the memory layout is playing
> >>> tricks), but this seems to make no difference. This is the output I
> >>> got:
> >>>
> >>> Dot product: 5.188786
> >>> Add a row: 8.032767
> >>> Add a column: 8.070953
> >>>
> >>> Any ideas on why adding a row (or column) of a matrix is slower than
> >>> computing a matrix product with a similarly sized matrix... (Xi has
> >>> less columns than Xi2, but just as many rows.)
> >>
> >> I think we need some numbers to put this into context -- how big are
> >> the vectors/matrices? How many iterations was the loop run? If the
> >> vectors are small and the loop is run many times, how fast the
> >> operation "ought" to be is irrelevant as it would drown in Python
> >> overhead.
> >
> > Originally I had attached a Python file demonstrating the problem, but
> > apparently this wasn't accepted by the list. In any case, the matrices
> > and vectors weren't too big (60x20), so I tried making them bigger and
> > indeed the "fast" version was now considerably faster.
>
> 60x20 is "nothing", so a full matrix multiplication or a single
> matrix-vector probably takes the same time (that is, the difference
> between them in itself is likely smaller than the error you make during
> measuring).
>
> In this context, the benchmarks will be completely dominated by the
> number of Python calls you make (each, especially taking the slice,
> means allocating Python objects, calling a bunch of functions in C, etc.
> etc). So it's not that strange, taking a slice isn't free, some Python
> objects must be created etc. etc.

Yeah, I think taking slices here is taking quite a lot of time:

In [58]: timeit E + Xi2[P/2,:]
10 loops, best of 3: 3.95 µs per loop

In [59]: timeit E + Xi2[P/2]
10 loops, best of 3: 2.17 µs per loop

I don't know why the additional ',:' in the slice is taking so much time,
but my guess is that passing & analyzing the second argument
(slice(None,None,None)) could be responsible for the slowdown (but that is
taking too much time). Mmh, perhaps it would be worth studying this more
carefully so that an optimization could be done in NumPy.

> I think the lesson mostly should be that with so little data,
> benchmarking becomes a very difficult art.

Well, I think it is not difficult, it is just that you are perhaps
benchmarking Python/NumPy machinery instead ;-) I'm curious whether Matlab
can do slicing much faster than NumPy. Jasper?

--
Francesc Alted
Re: [Numpy-discussion] Slicing slower than matrix multiplication?
Jasper van de Gronde wrote: > Dag Sverre Seljebotn wrote: > >> Jasper van de Gronde wrote: >> >>> I've attached a test file which shows the problem. It also tries adding >>> columns instead of rows (in case the memory layout is playing tricks), >>> but this seems to make no difference. This is the output I got: >>> >>> Dot product: 5.188786 >>> Add a row: 8.032767 >>> Add a column: 8.070953 >>> >>> Any ideas on why adding a row (or column) of a matrix is slower than >>> computing a matrix product with a similarly sized matrix... (Xi has less >>> columns than Xi2, but just as many rows.) >>> >>> >> I think we need some numbers to put this into context -- how big are the >> vectors/matrices? How many iterations was the loop run? If the vectors >> are small and the loop is run many times, how fast the operation "ought" >> to be is irrelevant as it would drown in Python overhead. >> > > Originally I had attached a Python file demonstrating the problem, but > apparently this wasn't accepted by the list. In any case, the matrices > and vectors weren't too big (60x20), so I tried making them bigger and > indeed the "fast" version was now considerably faster. > 60x20 is "nothing", so a full matrix multiplication or a single matrix-vector probably takes the same time (that is, the difference between them in itself is likely smaller than the error you make during measuring). In this context, the benchmarks will be completely dominated by the number of Python calls you make (each, especially taking the slice, means allocating Python objects, calling a bunch of functions in C, etc. etc). So it's not that strange, taking a slice isn't free, some Python objects must be created etc. etc. I think the lesson mostly should be that with so little data, benchmarking becomes a very difficult art. Dag Sverre > But still, this seems like a very odd difference. I know Python is an > interpreted language and has a lot of overhead, but still, selecting a > row/column shouldn't be THAT slow, should it? To be clear, this is the > code I used for testing: > > import timeit > > setupCode = """ > import numpy as np > > P = 60 > N = 20 > > Xi = np.random.standard_normal((P,N)) > w = np.random.standard_normal((N)) > Xi2 = np.dot(Xi,Xi.T) > E = np.dot(Xi,w) > """ > > N = 1 > > dotProduct = timeit.Timer('E = np.dot(Xi,w)',setupCode) > additionRow = timeit.Timer('E += Xi2[P/2,:]',setupCode) > additionCol = timeit.Timer('E += Xi2[:,P/2]',setupCode) > print "Dot product: %f" % dotProduct.timeit(N) > print "Add a row: %f" % additionRow.timeit(N) > print "Add a column: %f" % additionCol.timeit(N) > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Slicing slower than matrix multiplication?
Dag Sverre Seljebotn wrote:
> Jasper van de Gronde wrote:
>> I've attached a test file which shows the problem. It also tries adding
>> columns instead of rows (in case the memory layout is playing tricks),
>> but this seems to make no difference. This is the output I got:
>>
>> Dot product: 5.188786
>> Add a row: 8.032767
>> Add a column: 8.070953
>>
>> Any ideas on why adding a row (or column) of a matrix is slower than
>> computing a matrix product with a similarly sized matrix... (Xi has less
>> columns than Xi2, but just as many rows.)
>
> I think we need some numbers to put this into context -- how big are the
> vectors/matrices? How many iterations was the loop run? If the vectors
> are small and the loop is run many times, how fast the operation "ought"
> to be is irrelevant as it would drown in Python overhead.

Originally I had attached a Python file demonstrating the problem, but
apparently this wasn't accepted by the list. In any case, the matrices
and vectors weren't too big (60x20), so I tried making them bigger and
indeed the "fast" version was now considerably faster.

But still, this seems like a very odd difference. I know Python is an
interpreted language and has a lot of overhead, but still, selecting a
row/column shouldn't be THAT slow, should it? To be clear, this is the
code I used for testing:

import timeit

setupCode = """
import numpy as np

P = 60
N = 20

Xi = np.random.standard_normal((P,N))
w = np.random.standard_normal((N))
Xi2 = np.dot(Xi,Xi.T)
E = np.dot(Xi,w)
"""

N = 1

dotProduct = timeit.Timer('E = np.dot(Xi,w)',setupCode)
additionRow = timeit.Timer('E += Xi2[P/2,:]',setupCode)
additionCol = timeit.Timer('E += Xi2[:,P/2]',setupCode)
print "Dot product: %f" % dotProduct.timeit(N)
print "Add a row: %f" % additionRow.timeit(N)
print "Add a column: %f" % additionCol.timeit(N)
Re: [Numpy-discussion] Slicing slower than matrix multiplication?
Jasper van de Gronde wrote: > (Resending without attachment as I don't think my previous message arrived.) > > I just started using numpy and am very, very pleased with the > functionality and cleanness so far. However, I tried what I though would > be a simple optimization and found that the opposite was true. > Specifically, I had a loop where something like this was done: > > w += Xi[mu,:] > E = np.dot(Xi,w) > > Instead of repeatedly doing the matrix product I thought I'd do the > matrix product just once, before the loop, compute the product > np.dot(Xi,Xi.T) and then do: > > w += Xi[mu,:] > E += Xi2[mu,:] > > Seems like a clear winner, instead of doing a matrix multiplication it > simply has to sum two vectors (in-place). However, it turned out to be > 1.5 times SLOWER... > > I've attached a test file which shows the problem. It also tries adding > columns instead of rows (in case the memory layout is playing tricks), > but this seems to make no difference. This is the output I got: > > Dot product: 5.188786 > Add a row: 8.032767 > Add a column: 8.070953 > > Any ideas on why adding a row (or column) of a matrix is slower than > computing a matrix product with a similarly sized matrix... (Xi has less > columns than Xi2, but just as many rows.) > I think we need some numbers to put this into context -- how big are the vectors/matrices? How many iterations was the loop run? If the vectors are small and the loop is run many times, how fast the operation "ought" to be is irrelevant as it would drown in Python overhead. Dag Sverre ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Slicing slower than matrix multiplication?
(Resending without attachment as I don't think my previous message
arrived.)

I just started using numpy and am very, very pleased with the
functionality and cleanness so far. However, I tried what I thought would
be a simple optimization and found that the opposite was true.
Specifically, I had a loop where something like this was done:

w += Xi[mu,:]
E = np.dot(Xi,w)

Instead of repeatedly doing the matrix product I thought I'd do the
matrix product just once, before the loop, compute the product
np.dot(Xi,Xi.T) and then do:

w += Xi[mu,:]
E += Xi2[mu,:]

Seems like a clear winner: instead of doing a matrix multiplication it
simply has to sum two vectors (in-place). However, it turned out to be
1.5 times SLOWER...

I've attached a test file which shows the problem. It also tries adding
columns instead of rows (in case the memory layout is playing tricks),
but this seems to make no difference. This is the output I got:

Dot product: 5.188786
Add a row: 8.032767
Add a column: 8.070953

Any ideas on why adding a row (or column) of a matrix is slower than
computing a matrix product with a similarly sized matrix... (Xi has less
columns than Xi2, but just as many rows.)
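For concreteness, here is a small sketch of the two update strategies being
compared. The shapes follow the post; the mu loop and the final consistency
check are mine, and the incremental update relies on Xi2 = dot(Xi, Xi.T)
being symmetric.

import numpy as np

P, N = 60, 20
Xi = np.random.standard_normal((P, N))
w = np.random.standard_normal(N)
Xi2 = np.dot(Xi, Xi.T)   # precomputed once, outside the loop
E = np.dot(Xi, w)

for mu in range(P):
    w += Xi[mu, :]
    # original version: redo the matrix-vector product every iteration
    E_slow = np.dot(Xi, w)
    # "optimized" version: keep E up to date by adding one row of Xi2
    E += Xi2[mu, :]

# the two versions compute the same E (up to rounding)
assert np.allclose(E, E_slow)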
[Numpy-discussion] nan_to_num and bool arrays
Hello,

Using both numpy 1.3.0 and 1.4.0rc1 I got the following exception using
nan_to_num on a bool array, is that the expected behavior ?

>>> import numpy
>>> Z = numpy.zeros((3,3),dtype=bool)
>>> numpy.nan_to_num(Z)
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
374, in nan_to_num
    maxf, minf = _getmaxmin(y.dtype.type)
  File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
307, in _getmaxmin
    f = getlimits.finfo(t)
  File "/usr/lib/python2.6/dist-packages/numpy/core/getlimits.py", line
103, in __new__
    raise ValueError, "data type %r not inexact" % (dtype)
ValueError: data type not inexact

Nicolas
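Until the function itself is fixed, one hypothetical workaround on the
caller's side is to hand only float/complex input to nan_to_num and pass
everything else through, e.g.:

import numpy as np

def safe_nan_to_num(a):
    # Only float/complex (inexact) arrays can contain nan or inf, so
    # anything else is returned untouched instead of raising.
    a = np.asarray(a)
    if np.issubdtype(a.dtype, np.inexact):
        return np.nan_to_num(a)
    return a

Z = np.zeros((3, 3), dtype=bool)
print(safe_nan_to_num(Z))   # bool array passed through untouched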