Re: [Numpy-discussion] Medians that ignore values
David Cournapeau writes:
> Unfortunately, we can't, because we would lose generality: we need to
> compute median on any axis, not only the last one. The proper solution
> would be to have a sort/max/min/etc... which knows about nan in numpy,
> which is what Chuck and I are working on ATM,

Of course - thanks for looking at this.

Peter
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Peter Saffrey wrote:
>
> I've found that if I just cut nans from the list and use regular numpy median,
> it is quicker - 10 times slower than list median, rather than 35 times slower.
> Could you just wire nanmedian to do it this way?

Unfortunately, we can't, because we would lose generality: we need to compute median on any axis, not only the last one. The proper solution would be to have a sort/max/min/etc... which knows about nan in numpy, which is what Chuck and I are working on ATM,

cheers,

David
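To illustrate what a nan-aware sort buys for a median on any axis, here is a minimal sketch. It assumes a recent NumPy in which `np.sort` places NaNs at the end of each slice (this was not the behaviour being discussed in 2008), and `nanmedian_via_sort` is a hypothetical helper name, not an existing NumPy or SciPy function.

```python
import numpy as np

def nanmedian_via_sort(a, axis=-1):
    """Median along `axis`, ignoring NaNs (hypothetical helper, illustration only)."""
    a = np.asarray(a, dtype=float)
    # Move the requested axis last so only one case needs handling.
    s = np.sort(np.moveaxis(a, axis, -1), axis=-1)    # NaNs end up last
    n = np.asarray(np.sum(~np.isnan(s), axis=-1))     # valid count per slice
    lo = np.maximum((n - 1) // 2, 0)                  # lower middle index
    hi = np.maximum(n // 2, 0)                        # upper middle index
    low = np.take_along_axis(s, lo[..., None], axis=-1)[..., 0]
    high = np.take_along_axis(s, hi[..., None], axis=-1)[..., 0]
    # Slices with no valid value yield NaN, as nanmedian does for empty input.
    return np.where(n > 0, 0.5 * (low + high), np.nan)

a = np.array([[1.0, np.nan, 3.0],
              [np.nan, np.nan, np.nan],
              [4.0, 2.0, np.nan]])
by_row = nanmedian_via_sort(a, axis=1)
by_col = nanmedian_via_sort(a, axis=0)
```

Current NumPy releases also ship `np.nanmedian(a, axis=...)`, which is the simpler choice when it is available.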
Re: [Numpy-discussion] Medians that ignore values
David Cournapeau writes:
> Still, it is indeed really slow for your case; when I fixed nanmean and
> co, I did not know much about numpy, I just wanted them to give the
> right answer :) I think this can be made faster, specially for your case
> (where the axis along which the median is computed is really small).

I've found that if I just cut nans from the list and use regular numpy median, it is quicker - 10 times slower than list median, rather than 35 times slower. Could you just wire nanmedian to do it this way? The only difference is that on an empty list, nanmedian gives nan, but median throws an IndexError.

Below is my profiling code with this change. Sample output:

$ ./arrayspeed3.py
list build time: 0.16
list median time: 0.08
array nanmedian time: 0.98

Peter

===

from numpy import *
from pylab import rand
from time import clock
from scipy.stats.stats import nanmedian

def my_median(vallist):
    num_vals = len(vallist)
    if num_vals == 0:
        return nan
    vallist.sort()
    if num_vals % 2 == 1: # odd
        index = (num_vals - 1) / 2
        return vallist[index]
    else: # even
        index = num_vals / 2
        return (vallist[index] + vallist[index - 1]) / 2

numtests = 100
testsize = 1000
pointlen = 3

t0 = clock()
natests = rand(numtests,testsize,pointlen)
# have to start with inf because list.remove(nan) doesn't remove nan
natests[natests > 0.9] = inf
tests = natests.tolist()
natests[natests==inf] = nan
for test in tests:
    for point in test:
        while inf in point:
            point.remove(inf)
t1 = clock()
print "list build time:", t1-t0

allmedians = []
t0 = clock()
for test in tests:
    medians = [ my_median(x) for x in test ]
    allmedians.append(medians)
t1 = clock()
print "list median time:", t1-t0

t0 = clock()
namedians = []
for natest in natests:
    thismed = []
    for point in natest:
        maskpoint = point[negative(isnan(point))]
        if len(maskpoint) > 0:
            med = median(maskpoint)
        else:
            med = nan
        thismed.append(med)
    namedians.append(thismed)
t1 = clock()
print "array nanmedian time:", t1-t0
Re: [Numpy-discussion] Medians that ignore values
David Cournapeau wrote:
>
> The isnan thing is surprising, because the whole point to have a isnan
> is that you can do it without branching. I checked, and numpy does use
> the macro of isnan, not the function (glibc has both).

Ok, see my patch #913 for this. The slowdown is actually specific to one tested machine (my P4). On my macbook (running Mac OS X) and another linux machine running a core 2 duo, the performance is the same before and after the patch. I have not tested on windows, though.

I also saw this mentioned:

http://projects.scipy.org/scipy/numpy/ticket/241

where Travis made the same argument as me concerning NaN. It seems that the slowdowns are not so significant, at least on the dataset I tested (isnan is actually quite fast on my core 2 duo: 10 cycles / double for large arrays on average, compared to the 60 / double on my P4 for the exact same binary).

Travis, if you are reading this, would you reconsider your position on nan handling for min/max/co if we can keep reasonable speed ?

cheers,

David
Re: [Numpy-discussion] Medians that ignore values
On Sun, Sep 21, 2008 at 12:56 AM, David Cournapeau <[EMAIL PROTECTED]> wrote:
> David Cournapeau wrote:
> > Anne Archibald wrote:
> >> If users are concerned about performance, it's worth noting that on
> >> some machines nans force a fallback to software floating-point
> >> handling, with a corresponding very large performance hit. This
> >> includes some but not all x86 (and I think x86-64) CPUs. How this
> >> compares to the performance of masked arrays is not clear.
> >
> > I spent some time on this. In particular, for max/min, I did the
> > following for the core loop (always return nan if nan is in the array):
> >
> > /* nan + x and x + nan are nan, where x can be anything:
> > normal,
> > * denormal, nan, infinite
> > */
> > tmp = *((@typ@ *)i1) + *((@typ@
> > *)i2);
> > if(isnan(tmp))
> > {
> > *((@typ@ *)op) =
> > tmp;
> > } else
> > {
> > *((@typ@ *)op)=*((@typ@ *)i1) @OP@ *((@typ@ *)i2) ? *((@typ@
> > *)i1) : *((@typ@ *)i2);
> > }
>
> Grr, sorry for the mangling:
>
> /* nan + x and x + nan are nan, where x can be anything: normal,
>  * denormal, nan, infinite */
> tmp = *((@typ@ *)i1) + *((@typ@ *)i2);
> if(isnan(tmp)) {
>     *((@typ@ *)op) = tmp;
> } else {
>     *((@typ@ *)op) = *((@typ@ *)i1) @OP@ *((@typ@ *)i2) ? *((@typ@ *)i1) :
>         *((@typ@ *)i2);
> }
>

You can use type instead of typ so the code is a bit easier to read. It's one of the changes I've made.

Chuck
Re: [Numpy-discussion] Medians that ignore values
David Cournapeau wrote:
> Anne Archibald wrote:
>> If users are concerned about performance, it's worth noting that on
>> some machines nans force a fallback to software floating-point
>> handling, with a corresponding very large performance hit. This
>> includes some but not all x86 (and I think x86-64) CPUs. How this
>> compares to the performance of masked arrays is not clear.
>
> I spent some time on this. In particular, for max/min, I did the
> following for the core loop (always return nan if nan is in the array):
>
> /* nan + x and x + nan are nan, where x can be anything:
> normal,
> * denormal, nan, infinite
> */
> tmp = *((@typ@ *)i1) + *((@typ@
> *)i2);
> if(isnan(tmp))
> {
> *((@typ@ *)op) =
> tmp;
> } else
> {
> *((@typ@ *)op)=*((@typ@ *)i1) @OP@ *((@typ@ *)i2) ? *((@typ@
> *)i1) : *((@typ@ *)i2);
> }

Grr, sorry for the mangling:

/* nan + x and x + nan are nan, where x can be anything: normal,
 * denormal, nan, infinite */
tmp = *((@typ@ *)i1) + *((@typ@ *)i2);
if(isnan(tmp)) {
    *((@typ@ *)op) = tmp;
} else {
    *((@typ@ *)op) = *((@typ@ *)i1) @OP@ *((@typ@ *)i2) ? *((@typ@ *)i1) :
        *((@typ@ *)i2);
}

cheers,

David
Re: [Numpy-discussion] Medians that ignore values
Anne Archibald wrote:
>
> If users are concerned about performance, it's worth noting that on
> some machines nans force a fallback to software floating-point
> handling, with a corresponding very large performance hit. This
> includes some but not all x86 (and I think x86-64) CPUs. How this
> compares to the performance of masked arrays is not clear.

I spent some time on this. In particular, for max/min, I did the following for the core loop (always return nan if nan is in the array):

/* nan + x and x + nan are nan, where x can be anything: normal,
 * denormal, nan, infinite */
tmp = *((@typ@ *)i1) + *((@typ@ *)i2);
if(isnan(tmp)) {
    *((@typ@ *)op) = tmp;
} else {
    *((@typ@ *)op) = *((@typ@ *)i1) @OP@ *((@typ@ *)i2) ? *((@typ@ *)i1) : *((@typ@ *)i2);
}

For large arrays (on my CPU, it is around 1 items), the function is 3x slower than the original one. I think the main cost is the isnan. 3x is quite expensive, so I tested isnan a bit on Linux, and it is surprisingly slow. If I use my own, trivial #define isnan(x) ((x) != (x)), it is twice as fast as the glibc isnan, and then max/min are as fast as before, except they are working :)

The isnan thing is surprising, because the whole point to have a isnan is that you can do it without branching. I checked, and numpy does use the macro of isnan, not the function (glibc has both).

cheers,

David
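The `x != x` test used in the macro above can be tried from Python as well. The sketch below is not the C loop from the patch: `propagating_max` is a hypothetical name, and it tests each operand separately instead of testing the sum. One caveat worth checking before relying on the sum test alone: inf + (-inf) is also NaN, so a sum-based check flags a pair of opposite infinities even though neither operand is a NaN.

```python
import numpy as np

def propagating_max(a, b):
    """Elementwise max that yields NaN when either input is NaN (illustrative sketch)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    either_nan = (a != a) | (b != b)      # x != x is true only for NaN
    return np.where(either_nan, np.nan, np.where(a > b, a, b))

x = np.array([1.0, np.nan, 3.0, np.inf])
y = np.array([2.0, 5.0, np.nan, -np.inf])
result = propagating_max(x, y)

# The sum of opposite infinities is NaN although neither operand is.
sum_of_infinities_is_nan = np.isnan(np.inf + (-np.inf))
```

In current NumPy, `np.maximum` already propagates NaNs in this way, while `np.fmax` ignores them.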
Re: [Numpy-discussion] Medians that ignore values
On Sat, Sep 20, 2008 at 11:02 AM, Jake Harris <[EMAIL PROTECTED]>wrote: > > Because you're always working with probabilities, there is almost always no > ambiguity...whenever NaN is encounter, 0 is what is desired. > ...of course, division presents a good counterexample. > Bad idea? > So probably. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
(sorry for starting a new thread...I wasn't subscribed yet)

Stéfan van der Walt wrote the following on 09/19/2008 02:10 AM:
>
> So am I. In all my use cases, NaNs indicate trouble.
>

I can provide a use case where NaNs do not indicate trouble. In fact, they need to be treated as 0. For example, as x->0 in y(x) = x log x, it is traditional (e.g. in information theory) to take y(0) = 0. So if one is multiplying arrays and 0 * -inf is encountered, the desirable behavior is that we get 0.

Because you're always working with probabilities, there is almost always no ambiguity...whenever NaN is encountered, 0 is what is desired. Perhaps numpy can have some method by which a user can specify how NaN is treated (in addition to ignore, raise, etc). Good idea? Bad idea?
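The convention y(0) = 0 can be applied without ever producing the NaN, by evaluating x log x only where x is positive. This is a minimal sketch; `xlogx` is a hypothetical helper, and it assumes its input holds probabilities (non-negative values).

```python
import numpy as np

def xlogx(p):
    """Return p * log(p) elementwise, taking the value at p == 0 to be 0."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    positive = p > 0
    # Only evaluate the logarithm where it is finite.
    out[positive] = p[positive] * np.log(p[positive])
    return out

probs = np.array([0.5, 0.5, 0.0])
entropy = -xlogx(probs).sum()          # entropy in nats

# The naive product hits 0 * -inf and yields NaN for the last element.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = probs * np.log(probs)
```

Masking the input this way keeps the NaN-as-zero rule local to the one function that needs it, so NaNs from genuine errors elsewhere still propagate.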
Re: [Numpy-discussion] Medians that ignore values
Charles R Harris wrote:
>
> I would be happy to implement nan sorts if someone can provide me with
> a portable and easy way to detect nans for single, double, and long
> double floats. And not have it fail if the architecture doesn't
> support nans. I think getting all the needed nan detection and setup
> in place is the first step for anything else.

I guess you mean when isnan is available but broken, since we do not support platforms without IEEE 754 support ? I want to take care of this for my umathmodule cleaning (all the configuration checks/replacements are in place; if we want to be paranoid, we could check whether isnan works for all types if found on the system).

cheers,

David
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 Eric Firing <[EMAIL PROTECTED]>:
> Pierre GM wrote:
>
>>> It seems to me that there are pragmatic reasons
>>> why people work with NaNs for missing values,
>>> that perhaps shd not be dismissed so quickly.
>>> But maybe I am overlooking a simple solution.
>>
>> nansomething solutions tend to be considerably faster, that might be one
>> reason. A lack of visibility of numpy.ma could be a second. In any case, I
>> can't but agree with other posters: a NaN in an array usually means something
>> went astray.
>
> Additional reasons for using nans:
>
> 1) years of experience with Matlab, in which using nan for missing
> values is the standard idiom.

Users are already retraining to use zero-based indexing; I don't think asking them to use a full-featured masked array package is an unreasonable retraining burden, particularly since this idiom breaks as soon as they want to work with arrays of integers or records.

> 2) convenient interfacing with extension code in C or C++.
>
> The latter is a factor in the present use of nan in matplotlib; using
> nan for missing values in an array passed into extension code saves
> having to pass and process a second (mask) array. It is fast and simple.

How hard is it to pass an array where the masked values have been filled with nans? It's certainly easy to go the other way (mask all nans). I think this is less painful than supporting two differently-featured sets of functions for dealing with arrays containing some invalid values.

Anne
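Both directions of the conversion mentioned above are short with current numpy.ma. The sketch below uses `np.ma.masked_invalid` and `MaskedArray.filled`; the array contents are made up for illustration.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.nan])

# NaN-flagged array -> masked array: mask every invalid (NaN or inf) entry.
masked = np.ma.masked_invalid(data)
mean_of_valid = masked.mean()

# Masked array -> NaN-flagged array, e.g. before handing it to extension code.
as_nan = masked.filled(np.nan)

# Integer data cannot hold NaN, so it has to be converted to float first.
ints = np.ma.array([1, 2, 3], mask=[False, True, False])
ints_as_nan = ints.astype(float).filled(np.nan)
```

The integer case shows the limit of the NaN idiom: the mask carries over, but only after the data has been converted to a floating-point type.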
Re: [Numpy-discussion] Medians that ignore values
On Sat, Sep 20, 2008 at 01:15, Charles R Harris <[EMAIL PROTECTED]> wrote:
> I would be happy to implement nan sorts if someone can provide me with a
> portable and easy way to detect nans for single, double, and long double
> floats. And not have it fail if the architecture doesn't support nans. I
> think getting all the needed nan detection and setup in place is the first
> step for anything else.

We explicitly only support IEEE-754 architectures, so we are always on an architecture that supports NaNs.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
Re: [Numpy-discussion] Medians that ignore values
On Fri, Sep 19, 2008 at 11:41 PM, David Cournapeau <[EMAIL PROTECTED]> wrote:
> Anne Archibald wrote:
> >
> > I, on the other hand, was making specifically that suggestion: users
> > should not use nans to indicate missing values. Users should use
> > masked arrays to indicate missing values.
>
> I agree it is the nicest solution in theory, but I think it is
> impractical (as mentioned by Eric Firing in his email).
>
> >
> > This part I pretty much agree with.
>
> I can't really see which one is better (failing or returning NaN for
> sort/min/max and their sort counterparts), or if we should let the choice
> be left to the user. I am fine with both, and they both require the same
> amount of work.
>
> > Or we can make them behave drastically differently.
> > Masked arrays clearly need to be able to handle masked values flexibly
> > and explicitly. So I think nans should be handled simply and
> > conservatively: propagate them if possible, raise if not.
>
> I agree about this behavior being the default. I just think that for a
> couple of functions, we could give either separate functions, or
> additional arguments to existing functions to ignore them: I am thinking
> about min/max/sort and their arg* counterpart, because those are really
> basic, and because we already have nanmean/nanstd/nanmedian (e.g. having
> a nansort would help for nanmean to be much faster).
>

I would be happy to implement nan sorts if someone can provide me with a portable and easy way to detect nans for single, double, and long double floats. And not have it fail if the architecture doesn't support nans. I think getting all the needed nan detection and setup in place is the first step for anything else.

Chuck
Re: [Numpy-discussion] Medians that ignore values
Anne Archibald wrote:
>
> I, on the other hand, was making specifically that suggestion: users
> should not use nans to indicate missing values. Users should use
> masked arrays to indicate missing values.

I agree it is the nicest solution in theory, but I think it is impractical (as mentioned by Eric Firing in his email).

>
> This part I pretty much agree with.

I can't really see which one is better (failing or returning NaN for sort/min/max and their sort counterparts), or if we should let the choice be left to the user. I am fine with both, and they both require the same amount of work.

> Or we can make them behave drastically differently.
> Masked arrays clearly need to be able to handle masked values flexibly
> and explicitly. So I think nans should be handled simply and
> conservatively: propagate them if possible, raise if not.

I agree about this behavior being the default. I just think that for a couple of functions, we could give either separate functions, or additional arguments to existing functions to ignore them: I am thinking about min/max/sort and their arg* counterpart, because those are really basic, and because we already have nanmean/nanstd/nanmedian (e.g. having a nansort would help for nanmean to be much faster).

>
> If users are concerned about performance, it's worth noting that on
> some machines nans force a fallback to software floating-point
> handling, with a corresponding very large performance hit.

I was more concerned with the cost of checking for NaN explicitly (everything involving comparison) when your array does not contain any NaN. But I don't see any obvious way to avoid that cost,

David
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 David Cournapeau <[EMAIL PROTECTED]>:
> I guess my formulation was poor: I never use NaN as missing values
> because I never use missing values, which is why I wanted the opinion of
> people who use NaN in a different manner (because I don't have a good
> idea on how those people would like to see numpy behave). I was
> certainly not arguing they should not be used for the purpose of missing
> value.

I, on the other hand, was making specifically that suggestion: users should not use nans to indicate missing values. Users should use masked arrays to indicate missing values.

> The problem with NaN is that you cannot mix the missing value behavior
> and the error behavior. Dealing with them in a consistent manner is
> difficult. Because numpy is a general numerical computation tool, I
> think that NaN should be propagated and never ignored *by default*. If
> you have NaN because of divide by 0, etc... it should not be ignored at
> all. But if you want it to ignore, then numpy should make it possible:
>
>    - max, min: should return NaN if NaN is in the array, or maybe even
>      fail ?
>    - argmax, argmin ?
>    - sort: should fail ?
>    - mean, std, variance: should return Nan
>    - median: should fail (to be consistent if sort fails) ? Should
>      return NaN ?

This part I pretty much agree with.

> We could then add an argument to failing functions to tell them either
> to ignore NaN/put them at some special location (like R does, for
> example). The ones I am not sure are median and argmax/argmin. For
> median, failing when sort does is consistent; but this can break a lot
> of code. For argmin/argmax, failing is the most logical, but OTOH,
> making argmin/argmax failing and not max/min is not consistent either.
> Breaking the code is maybe not that bad because currently, neither
> max/min nor argmax/argmin nor sort returns a meaningful result.
> Does that sound reasonable to you ?

The problem with this approach is that all those decisions need to be made and all that code needs to be implemented for masked arrays. In fact I suspect that it has already been done in that case. So really what you are suggesting here is that we duplicate all this effort to implement the same functions for nans as we have for masked arrays. It's important, too, that the masked array implementation and the nan implementation behave the same way, or users will become badly confused. Who gets the task of keeping the two implementations in sync?

The current situation is that numpy has two ways to indicate bad data for floating-point arrays: nans and masked arrays. We can't get rid of either: nans appear on their own, and masked arrays are the only way to mark bad data in non-floating-point arrays. We can try to make them behave the same, which will be a lot of work to provide redundant capabilities. Or we can make them behave drastically differently. Masked arrays clearly need to be able to handle masked values flexibly and explicitly. So I think nans should be handled simply and conservatively: propagate them if possible, raise if not.

If users are concerned about performance, it's worth noting that on some machines nans force a fallback to software floating-point handling, with a corresponding very large performance hit. This includes some but not all x86 (and I think x86-64) CPUs. How this compares to the performance of masked arrays is not clear.

Anne
Re: [Numpy-discussion] Medians that ignore values
Robert Kern wrote:
> On Fri, Sep 19, 2008 at 22:25, David Cournapeau
> <[EMAIL PROTECTED]> wrote:
>
> How, exactly? ndarray.min() is the where the implementation is.
>

Ah, I keep forgetting those are implemented in the array object, sorry for that. Now I understand Stéfan's point. Do I understand correctly that we should then do:

- implement a NaN-aware min/max for every float type (real and complex) in umathmodule.c, which ignores nan (called @[EMAIL PROTECTED], etc...)
- fix the current min/max to propagate NaN instead of giving broken result
- How to do the dispatching ? Having PyArray_Min and PyArray_NanMin sounds the easiest (we don't change any C api, only add an argument to the python-callable function min, in array_min method ?)

Or am I missing something ? If this is the right way to fix it I am willing to do it (we still have to agree on the default behavior first). I am not really familiar with sort module, but maybe it is really similar to min/max case.

cheers,

David
Re: [Numpy-discussion] Medians that ignore values
Alan G Isaac wrote:
> On 9/19/2008 4:35 AM David Cournapeau apparently wrote:
>> I never use NaN as missing value
>
> What do you use?
>
> Recently I needed to fill a 2d array with values
> from computations that could "go wrong".
> I created an array of NaN and then replaced
> the elements where the computation produced
> a useful value. I then applied ``nanmax``,
> to get the maximum of the useful values.
>
> What should I have done?

I guess my formulation was poor: I never use NaN as missing values because I never use missing values, which is why I wanted the opinion of people who use NaN in a different manner (because I don't have a good idea on how those people would like to see numpy behave). I was certainly not arguing they should not be used for the purpose of missing value.

The problem with NaN is that you cannot mix the missing value behavior and the error behavior. Dealing with them in a consistent manner is difficult. Because numpy is a general numerical computation tool, I think that NaN should be propagated and never ignored *by default*. If you have NaN because of divide by 0, etc... it should not be ignored at all. But if you want it to ignore, then numpy should make it possible:

- max, min: should return NaN if NaN is in the array, or maybe even fail ?
- argmax, argmin ?
- sort: should fail ?
- mean, std, variance: should return Nan
- median: should fail (to be consistent if sort fails) ? Should return NaN ?

We could then add an argument to failing functions to tell them either to ignore NaN/put them at some special location (like R does, for example). The ones I am not sure are median and argmax/argmin. For median, failing when sort does is consistent; but this can break a lot of code. For argmin/argmax, failing is the most logical, but OTOH, making argmin/argmax failing and not max/min is not consistent either. Breaking the code is maybe not that bad because currently, neither max/min nor argmax/argmin nor sort returns a meaningful result.

Does that sound reasonable to you ?

cheers,

David
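For comparison with the list of options above, the following sketch shows how a recent NumPy release treats the same operations; it assumes a current version and says nothing about the 2008 behaviour under discussion. The default reductions propagate NaN, sort places NaNs last instead of failing, and the nan* functions ignore NaNs on request.

```python
import numpy as np

a = np.array([3.0, np.nan, 1.0])

# Default reductions propagate the NaN.
default_max = np.max(a)
default_mean = np.mean(a)
default_median = np.median(a)

# Sorting does not fail; NaNs are placed at the end.
sorted_a = np.sort(a)

# The nan* variants ignore NaNs when asked to.
ignored_max = np.nanmax(a)
ignored_mean = np.nanmean(a)
ignored_median = np.nanmedian(a)
ignored_argmax = np.nanargmax(a)
```

So "propagate by default, ignore on request" is the split that exists today, expressed as separate functions instead of an extra argument.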
Re: [Numpy-discussion] Medians that ignore values
On Fri, Sep 19, 2008 at 22:25, David Cournapeau <[EMAIL PROTECTED]> wrote:
> Stéfan van der Walt wrote:
>>
>> Why shouldn't we have "nanmin"-like behaviour for the C min itself?
>>
>
> Ah, I was not arguing we should not do it in C, but rather we did not
> have to do in C. The current behavior for nan with functions relying on
> ordering is broken; if someone prefer fixing it in C, great. But I was
> guessing more people could fix it using python, that's all.

How, exactly? ndarray.min() is where the implementation is.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
Re: [Numpy-discussion] Medians that ignore values
Stéfan van der Walt wrote:
>
> Why shouldn't we have "nanmin"-like behaviour for the C min itself?
>

Ah, I was not arguing we should not do it in C, but rather we did not have to do it in C. The current behavior for nan with functions relying on ordering is broken; if someone prefers fixing it in C, great. But I was guessing more people could fix it using python, that's all.

I opened a bug for min/max and nan, this should be fixed for 1.3.0, maybe 1.2.1 too.

cheers,

David
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 17:25:53 Alan G Isaac wrote:
> On 9/19/2008 4:54 PM Pierre GM apparently wrote:
> > Another way is
> > ma.array(np.empty(yourshape,yourdtype), mask=True)
> > which should work with earlier versions.
>
> Seems like ``mask`` would be a natural
> keyword for ``ma.empty``?

Not a bad idea. I'll plug that in.
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 4:54 PM Pierre GM apparently wrote:
> Another way is
> ma.array(np.empty(yourshape,yourdtype), mask=True)
> which should work with earlier versions.

Seems like ``mask`` would be a natural keyword for ``ma.empty``?

Thanks,
Alan Isaac
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 16:35:23 Alan G Isaac wrote:
> On 9/19/2008 4:54 AM Pierre GM apparently wrote:
> > I know. I was more dreading the time when MaskedArrays would have to be
> > ported to C. In a way, that would probably simplify a few issues. OTOH, I
> > don't really see it happening any time soon.
>
> Is this possibly a GSoC sized project?
> Alan Isaac

If we can find someone who knows C and masked arrays well, that could be.
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 16:28:34 Alan G Isaac wrote:
> On 9/19/2008 11:46 AM Pierre GM apparently wrote:
> a.mask=True
> This is great, but is apparently
> new behavior as of NumPy 1.2?

I'm not sure, sorry. Another way is
ma.array(np.empty(yourshape,yourdtype), mask=True)
which should work with earlier versions.
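The "start fully masked, unmask as you go" pattern from this sub-thread can be sketched as follows. The shape, the stored values, and the name `results` are made up for illustration; the construction itself is the `ma.array(np.empty(...), mask=True)` call quoted above.

```python
import numpy as np
import numpy.ma as ma

# Every element starts out masked, so the uninitialised contents never leak.
results = ma.array(np.empty((2, 3), dtype=float), mask=True)

# Assigning a value unmasks that element (with the default soft mask).
results[0, 1] = 4.0
results[1, 2] = 7.0

valid_count = results.count()   # number of unmasked elements
best = results.max()            # maximum over the unmasked elements only
```

This covers the use case of filling a 2d array from computations that may fail: elements that were never assigned simply stay masked and are skipped by `max`.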
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 4:54 AM Pierre GM apparently wrote:
> I know. I was more dreading the time when MaskedArrays would have to be
> ported to C. In a way, that would probably simplify a few issues. OTOH, I
> don't really see it happening any time soon.

Is this possibly a GSoC sized project?

Alan Isaac
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 11:46 AM Pierre GM apparently wrote:
> a.mask=True

This is great, but is apparently new behavior as of NumPy 1.2?

Alan Isaac
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 1:58 PM Robert Kern apparently wrote:
> there are no objects inside non-object arrays. There is
> nothing with identity inside the arrays to compare against.

Got it. Thanks.

Alan
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 14:01:13 Eric Firing wrote:
> Pierre GM wrote:
> 2) convenient interfacing with extension code in C or C++.
>
> The latter is a factor in the present use of nan in matplotlib; using
> nan for missing values in an array passed into extension code saves
> having to pass and process a second (mask) array. It is fast and simple.

As long as you deal with floats.
Re: [Numpy-discussion] Medians that ignore values
Pierre GM wrote:
>> It seems to me that there are pragmatic reasons
>> why people work with NaNs for missing values,
>> that perhaps shd not be dismissed so quickly.
>> But maybe I am overlooking a simple solution.
>
> nansomething solutions tend to be considerably faster, that might be one
> reason. A lack of visibility of numpy.ma could be a second. In any case, I
> can't but agree with other posters: a NaN in an array usually means something
> went astray.

Additional reasons for using nans:

1) years of experience with Matlab, in which using nan for missing values is the standard idiom.

2) convenient interfacing with extension code in C or C++.

The latter is a factor in the present use of nan in matplotlib; using nan for missing values in an array passed into extension code saves having to pass and process a second (mask) array. It is fast and simple.

Eric
Re: [Numpy-discussion] Medians that ignore values
On Fri, Sep 19, 2008 at 11:34, Alan G Isaac <[EMAIL PROTECTED]> wrote:
> On 9/19/2008 12:02 PM Peter Saffrey apparently wrote:
>> >>> a = array([1,2,nan])
>> >>> nan in a
>> False
>
> Huh. I'm inclined to call this a bug,
> since normal Python behavior is that
> ``in`` should check for identity::
>
>    >>> xl = [1.,np.nan]
>    >>> np.nan in xl
>    True

Except that there are no objects inside non-object arrays. There is nothing with identity inside the arrays to compare against.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
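The difference between the two containment checks can be reproduced directly. This short sketch uses made-up variable names; the point is that a list finds the very same NaN object by identity, whereas an array compares by value, and NaN never equals itself.

```python
import numpy as np

nan = np.nan
xl = [1.0, nan]
a = np.array([1.0, 2.0, nan])

in_list = nan in xl                      # list containment checks identity first
in_array = bool(nan in a)                # array containment compares by value
other_nan_in_list = float("nan") in xl   # a different NaN object is not found
has_nan = bool(np.isnan(a).any())        # the dependable test for arrays
```

Because the list result depends on which NaN object is used, `np.isnan(...).any()` is the safer way to ask whether any NaN is present.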
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 12:02 PM Peter Saffrey apparently wrote:
> >>> a = array([1,2,nan])
> >>> nan in a
> False

Huh. I'm inclined to call this a bug, since normal Python behavior is that ``in`` should check for identity::

    >>> xl = [1.,np.nan]
    >>> np.nan in xl
    True

Alan
Re: [Numpy-discussion] Medians that ignore values
On Fri, Sep 19, 2008 at 1:11 AM, David Cournapeau <[EMAIL PROTECTED]> wrote:
> Anne Archibald wrote:
> >
> > Well, for example, you might ask that all the non-nan elements be in
> > order, even if you don't specify where the nan goes.
>
> Ah, there are two problems, then:
>  - sort
>  - how median use sort.
>
> For sort, I don't know how sort speed would be influenced by treating
> nan. In a way, calling sort with nan inside is a user error (if you take
> the POV nan are not comparable), but nan are used for all kind of
> purpose,

used <- misused. Using nan to flag anything but a numerical error is going to cause problems. It wouldn't be too hard to implement nansorts; they just need a real comparison function so that all the nans end up at one end or the other. I don't know that that would make medians any easier, though. Are the nans part of the data set? A nansearchsorted would probably be needed also. If this functionality is added, the best way might be something like kind='nanquicksort'.

Chuck
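The "all the nans end up at one end" idea can be sketched in a few lines. This is only an illustration under stated assumptions: nan_last_sort and median_ignoring_nan are hypothetical helper names, not NumPy functions, and the kind='nanquicksort' option mentioned above is a proposal from this thread, not an existing argument.

```python
import numpy as np

def nan_last_sort(x):
    """Hypothetical helper: sort a 1-D array, keeping NaNs at the end."""
    x = np.asarray(x, dtype=float)
    nan_mask = np.isnan(x)
    # Sort only the comparable values, then append the NaNs unchanged.
    return np.concatenate([np.sort(x[~nan_mask]), x[nan_mask]])

def median_ignoring_nan(x):
    """Hypothetical helper: median of the non-NaN values (NaN if none)."""
    s = nan_last_sort(x)
    n = int(np.count_nonzero(~np.isnan(s)))  # number of valid values
    if n == 0:
        return np.nan
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return 0.5 * (s[mid - 1] + s[mid])
```

Once the NaNs are known to sit at the end, the median only needs the count of valid values, which is why a NaN-aware sort would make the median easier to write.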
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 11:46 AM Pierre GM apparently wrote: > No, but you may do the opposite: just start with an array completely masked, > and unmasked it as you need: Very useful example. I did not understand this possibility. Alan ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 11:46 AM Pierre GM apparently wrote: > You can't compare NaNs to anything. How do you know this np.miss is a masked > value, when np.sqrt(-1.) is NaN ? I thought you could use ``is``. E.g., >>> np.nan == np.nan False >>> np.nan is np.nan True Alan ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
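A small sketch of why ``is`` does not help here: identity only holds for the single np.nan object, while a NaN produced by a computation or read back out of an array is a different object with the same "not a number" value. The variable names are illustrative only.

```python
import numpy as np

with np.errstate(invalid="ignore"):      # silence the sqrt(-1) warning
    a = np.sqrt(np.array([-1.0, 4.0]))   # first element is a computed NaN

# Identity test: indexing builds a fresh scalar, which is not np.nan itself.
same_object = a[0] is np.nan

# Value test: np.isnan recognises any NaN, wherever it came from.
detected = bool(np.isnan(a[0]))
```

So ``is`` cannot tell a "missing" NaN from an "invalid result" NaN once the values live inside an array.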
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 12:02:08 Peter Saffrey wrote: > Alan G Isaac american.edu> writes: > > Recently I needed to fill a 2d array with values > > from computations that could "go wrong". > Should I take the earlier advice and switch to masked arrays? > > Peter Yes. As you've noticed, you can't compare nans (after all, nans are not numbers...), which limits their use. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Alan G Isaac american.edu> writes: > Recently I needed to fill a 2d array with values > from computations that could "go wrong". > I created an array of NaN and then replaced > the elements where the computation produced > a useful value. I then applied ``nanmax``, > to get the maximum of the useful values. > I'm glad you posted this, because this is exactly the method I'm using. How do you detect whether there are still any missing spots in your array? nan has some rather unfortunate properties: >>> from numpy import * >>> a = array([1,2,nan]) >>> nan in a False >>> nan == nan False Should I take the earlier advice and switch to masked arrays? Peter ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
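For the question of detecting remaining missing spots, a minimal sketch using np.isnan, which tests each element instead of relying on ``in`` or ``==``:

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan])

nan_mask = np.isnan(a)                    # elementwise test, True where NaN
has_missing = bool(nan_mask.any())        # is anything still unfilled?
n_missing = int(nan_mask.sum())           # how many spots are unfilled
missing_idx = np.flatnonzero(nan_mask)    # where they are (flat indices)
```

This only says where the NaNs are. It cannot say whether a given NaN means "never filled" or "computation went wrong", which is the concern raised elsewhere in the thread.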
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 11:36:17 Alan G Isaac wrote:
> On 9/19/2008 11:09 AM Stefan Van der Walt apparently wrote:
> > Masked arrays. Using NaN's for missing values is dangerous. You may
> > do some operation, which generates invalid results, and then you have
> > a mixed bag of missing and invalid values.
>
> That rather evades my full question, I think?
>
> In the case I mentioned,
> I am filling an array inside a loop,
> and the possible fill values are not constrained.
> So I cannot mask based on value,
> and I cannot mask based on position
> (at least until after the computations are complete).

No, but you may do the opposite: just start with an array completely masked, and unmask it as you need. Say you have a 4x5 array, and want to unmask (0,0), (1,2), (3,4):

>>> a = ma.empty((4,5), dtype=float)
>>> a.mask = True
>>> a[0,0] = 0
>>> a[1,2] = 1
>>> a[3,4] = 3
>>> a
masked_array(data =
 [[0.0 -- -- -- --]
 [-- -- 1.0 -- --]
 [-- -- -- -- --]
 [-- -- -- -- 3.0]],
      mask =
 [[False  True  True  True  True]
 [ True  True False  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True False]],
      fill_value=1e+20)
>>> a.max(axis=0)
masked_array(data = [0.0 -- 1.0 -- 3.0],
      mask = [False  True False  True False],
      fill_value=1e+20)

> It seems to me that there are pragmatic reasons
> why people work with NaNs for missing values,
> that perhaps shd not be dismissed so quickly.
> But maybe I am overlooking a simple solution.

nansomething solutions tend to be considerably faster, that might be one reason. A lack of visibility of numpy.ma could be a second. In any case, I can't but agree with other posters: a NaN in an array usually means something went astray.

> PS I confess I do not understand NaNs.
> E.g., why could there not be a value np.miss
> that would be a NaN that represents a missing value?

You can't compare NaNs to anything. How do you know this np.miss is a masked value, when np.sqrt(-1.) is NaN ?
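The same "start fully masked, unmask as you fill" pattern can be written as a short script. This sketch assumes a NumPy release that provides np.ma.masked_all, which may be newer than the versions discussed in this thread.

```python
import numpy as np

# Start with every element masked (i.e. "missing").
a = np.ma.masked_all((4, 5), dtype=float)

# Assigning a value unmasks that element.
a[0, 0] = 0.0
a[1, 2] = 1.0
a[3, 4] = 3.0

n_filled = a.count()        # number of unmasked (filled) elements
col_max = a.max(axis=0)     # per-column max; all-masked columns stay masked
```

Because the mask records position rather than value, any float, including a genuine NaN from a failed computation, can be stored without being confused with "not yet filled".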
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 11:09 AM Stefan Van der Walt apparently wrote: > Masked arrays. Using NaN's for missing values is dangerous. You may > do some operation, which generates invalid results, and then you have > a mixed bag of missing and invalid values. That rather evades my full question, I think? In the case I mentioned, I am filling an array inside a loop, and the possible fill values are not constrained. So I cannot mask based on value, and I cannot mask based on position (at least until after the computations are complete). It seems to me that there are pragmatic reasons why people work with NaNs for missing values, that perhaps shd not be dismissed so quickly. But maybe I am overlooking a simple solution. Alan PS I confess I do not understand NaNs. E.g., why could there not be a value np.miss that would be a NaN that represents a missing value? Are all NaNs already assigned standard meanings? ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
On 19 Sep 2008, at 16:07 , Alan G Isaac wrote: > On 9/19/2008 4:35 AM David Cournapeau apparently wrote: >> I never use NaN as missing value > > What do you use? Masked arrays. Using NaN's for missing values is dangerous. You may do some operation, which generates invalid results, and then you have a mixed bag of missing and invalid values. Cheers Stéfan ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
On 9/19/2008 4:35 AM David Cournapeau apparently wrote: > I never use NaN as missing value What do you use? Recently I needed to fill a 2d array with values from computations that could "go wrong". I created an array of NaN and then replaced the elements where the computation produced a useful value. I then applied ``nanmax``, to get the maximum of the useful values. What should I have done? Thanks, Alan Isaac ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 David Cournapeau <[EMAIL PROTECTED]>: > But cannot this be fixed at the python level of the max function ? I Why shouldn't we have "nanmin"-like behaviour for the C min itself? I'd rather have a specialised function to deal with the rare kinds of datasets where NaNs are guaranteed never to occur. > But on my numpy, it looks like nan breaks min/max, they are not ignored: Yes, that's the problem. Cheers Stéfan ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Peter Saffrey wrote: > > I've posted my test code below, which gives me the results: > > $ ./arrayspeed3.py > list build time: 0.01 > list median time: 0.01 > array nanmedian time: 0.36 > > I must have done something wrong to hobble nanmedian in this way... I'm quite > new to numpy, so feel free to point out any obviously egregious errors. Ok: it is "pathological", and can be done better :) First: > for natest in natests: > thismed = nanmedian(natest, axis=1) > namedians.append(thismed) ^^^ Here, you are doing nanmedian on a direction with 3 elements: this will be slow in numpy, because numpy involves some relatively heavy machinery to run on arrays. The machinery pays off for 'big' arrays, but for really small arrays like here, list can (and often are) be faster. Still, it is indeed really slow for your case; when I fixed nanmean and co, I did not know much about numpy, I just wanted them to give the right answer :) I think this can be made faster, specially for your case (where the axis along which the median is computed is really small). I opened a bug: http://scipy.org/scipy/scipy/ticket/740 cheers, David ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
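One way to avoid the per-call overhead on tiny rows is to handle the whole 3-D array in one vectorized pass. The sketch below is an assumption-laden illustration, not the scipy implementation: nanmedian_last_axis is a hypothetical name, and it relies on np.sort placing NaNs at the end of each row and on np.take_along_axis, both of which belong to NumPy releases later than the 1.0/1.1 versions discussed here.

```python
import numpy as np

def nanmedian_last_axis(a):
    """Hypothetical helper: NaN-ignoring median along the last axis."""
    a = np.asarray(a, dtype=float)
    s = np.sort(a, axis=-1)                  # NaNs end up last in each row
    n = np.sum(~np.isnan(a), axis=-1)        # count of valid values per row
    lo = np.maximum((n - 1) // 2, 0)         # index of lower middle value
    hi = np.maximum(n // 2, 0)               # index of upper middle value
    lo_val = np.take_along_axis(s, lo[..., None], axis=-1)[..., 0]
    hi_val = np.take_along_axis(s, hi[..., None], axis=-1)[..., 0]
    med = 0.5 * (lo_val + hi_val)
    return np.where(n == 0, np.nan, med)     # all-NaN rows give NaN

rng = np.random.default_rng(0)
natests = rng.random((100, 100, 3))
natests[natests > 0.9] = np.nan
medians = nanmedian_last_axis(natests)       # shape (100, 100)
```

The loop over 100 x 100 three-element rows is replaced by a handful of array operations, which is where numpy's machinery pays off.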
Re: [Numpy-discussion] Medians that ignore values
Peter Saffrey wrote: > Pierre GM gmail.com> writes: > >> I think there were some changes on the C side of numpy between 1.0 and 1.1, >> you may have to recompile scipy and matplotlib from sources. What versions >> are you using for those 2 packages ? >> > > $ dpkg -l | grep scipy > ii python-scipy 0.6.0-8ubuntu1 > > scientific tools for Python > > $ dpkg -l | grep matplotlib > ii python-matplotlib 0.91.2-0ubuntu1 > > Python based plotting system in a style simi > ii python-matplotlib-data 0.91.2-0ubuntu1 > > Python based plotting system (data package) > ii python-matplotlib-doc 0.91.2-0ubuntu1 > > Python based plotting system (documentation If you build numpy from sources, please don't install it into /usr ! It will more than likely break everything which depends on numpy, as well as your debian installation (because you will overwrite packages handled by dpkg). You should really install in a local directory, outside /usr. You will have to install scipy and matplotlib in any case, too. cheers, David ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Pierre GM gmail.com> writes: > I think there were some changes on the C side of numpy between 1.0 and 1.1, > you may have to recompile scipy and matplotlib from sources. What versions > are you using for those 2 packages ? > $ dpkg -l | grep scipy ii python-scipy 0.6.0-8ubuntu1 scientific tools for Python $ dpkg -l | grep matplotlib ii python-matplotlib 0.91.2-0ubuntu1 Python based plotting system in a style simi ii python-matplotlib-data 0.91.2-0ubuntu1 Python based plotting system (data package) ii python-matplotlib-doc 0.91.2-0ubuntu1 Python based plotting system (documentation Peter ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
David Cournapeau ar.media.kyoto-u.ac.jp> writes: > It may be that nanmedian is slow. But I would sincerly be surprised if > it were slower than python list, except for some pathological cases, or > maybe a bug in nanmedian. What do your data look like ? (size, number of > nan, etc...) > I've posted my test code below, which gives me the results: $ ./arrayspeed3.py list build time: 0.01 list median time: 0.01 array nanmedian time: 0.36 I must have done something wrong to hobble nanmedian in this way... I'm quite new to numpy, so feel free to point out any obviously egregious errors. Peter === from numpy import array, nan, inf from pylab import rand from time import clock from scipy.stats.stats import nanmedian import pdb _pdb = pdb.Pdb() breakpoint = _pdb.set_trace def my_median(vallist): num_vals = len(vallist) vallist.sort() if num_vals % 2 == 1: # odd index = (num_vals - 1) / 2 return vallist[index] else: # even index = num_vals / 2 return (vallist[index] + vallist[index - 1]) / 2 numtests = 100 testsize = 100 pointlen = 3 t0 = clock() natests = rand(numtests,testsize,pointlen) # have to start with inf because list.remove(nan) doesn't remove nan natests[natests > 0.9] = inf tests = natests.tolist() natests[natests==inf] = nan for test in tests: for point in test: if inf in point: point.remove(inf) t1 = clock() print "list build time:", t1-t0 t0 = clock() allmedians = [] for test in tests: medians = [ my_median(x) for x in test ] allmedians.append(medians) t1 = clock() print "list median time:", t1-t0 t0 = clock() namedians = [] for natest in natests: thismed = nanmedian(natest, axis=1) namedians.append(thismed) t1 = clock() print "array nanmedian time:", t1-t0 ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Peter Saffrey wrote:
>
> I rejoiced when I saw this answer, because it looks like a function I can just
> drop in and it works. Unfortunately, nanmedian seems to be quite a bit slower
> than just using lists (ignoring nan values from my experiments) and a
> home-brew
> implementation of median. I was mostly using numpy for speed...

It may be that nanmedian is slow. But I would sincerely be surprised if it were slower than python list, except for some pathological cases, or maybe a bug in nanmedian. What do your data look like ? (size, number of nan, etc...) I quickly benchmarked on a relatively small dataset (a few thousand samples with a few random nan), and nanmedian is "only" a few times slower than median.

>
> I would like to try the masked array approach, but the Ubuntu packages for
> scipy
> and matplotlib depend on numpy. Does anybody know whether I can naively do
> "sudo
> python setup.py install" on a more modern numpy without disturbing scipy and
> matplotlib, or do I need to uninstall all three packages and install them
> manually from source?

My advice would be to never ever install a package from source into /usr. This will cause trouble. The way I do it is to install everything from sources into $HOME/local (of course, any directory you have regular write access to will do).

>
> On my 64 bit machine, the Ubuntu numpy package is even more out of date:
>
> $ dpkg -l | grep numpy
> ii python-numpy 1:1.0.4-6ubuntu3
>
> Does anybody know why this is?

Yes, ubuntu updates every 6 months, the last time last April. Numpy 1.1.0 (the first version after 1.0.4) was released in May. Also, Ubuntu updates from debian, generally 4-5 months before the ubuntu release date. So even if debian were to release a package the day we release a new package, Ubuntu will be one year late. I personally think that the solution would be to provide our own up-to-date .deb, but this is a lot of work.
I think Ondrej did some work related to that; recent tools like opensuse build service and launchpad ppa makes it somewhat a bit easier, too (for the build part, at least; you still need to know how to build rpm/deb). cheers, David ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 05:51:55 Peter Saffrey wrote: > I would like to try the masked array approach, but the Ubuntu packages for > scipy and matplotlib depend on numpy. Does anybody know whether I can > naively do "sudo python setup.py install" on a more modern numpy without > disturbing scipy and matplotlib, or do I need to uninstall all three > packages and install them manually from source? I think there were some changes on the C side of numpy between 1.0 and 1.1, you may have to recompile scipy and matplotlib from sources. What versions are you using for those 2 packages ? ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
David Cournapeau ar.media.kyoto-u.ac.jp> writes: > You can use nanmean (from scipy.stats): > I rejoiced when I saw this answer, because it looks like a function I can just drop in and it works. Unfortunately, nanmedian seems to be quite a bit slower than just using lists (ignoring nan values from my experiments) and a home-brew implementation of median. I was mostly using numpy for speed... I would like to try the masked array approach, but the Ubuntu packages for scipy and matplotlib depend on numpy. Does anybody know whether I can naively do "sudo python setup.py install" on a more modern numpy without disturbing scipy and matplotlib, or do I need to uninstall all three packages and install them manually from source? On my 64 bit machine, the Ubuntu numpy package is even more out of date: $ dpkg -l | grep numpy ii python-numpy 1:1.0.4-6ubuntu3 Does anybody know why this is? I might be willing to help bring the repository up to date, if anybody can give me pointers on how to do this. Peter ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Stéfan van der Walt wrote:
>
> So am I. In all my use cases, NaNs indicate trouble.

Yes, so I would like to have the opinion of people with other usage than ours.

>
> Because we have x.max() silently ignoring NaNs, which causes a lot of
> head-scratching, swearing and failed experiments.

But cannot this be fixed at the python level of the max function ? I think it is expected to have the low level C functions to ignore/be bogus if you have Nan. After all, if you use sort of the libc with nan, or sort in C++ for a vector of double, it will not work either.

But on my numpy, it looks like nan breaks min/max, they are not ignored:

np.min(np.array([0, np.nan, 1])) -> 1.0 # bogus
np.min(np.array([0, np.nan, 2])) -> 2.0 # ok
np.min(np.array([0, np.nan, -1])) -> -1.0 # ok
np.max(np.array([0, np.nan, -1])) -> -1.0 # bogus

Which only makes sense when you guess how they are implemented in C...

cheers,

David
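For comparison, a sketch of how the two behaviours discussed here look when they are kept apart. It assumes a later NumPy release than the one in this thread, where np.min/np.max propagate NaN and np.nanmin/np.nanmax ignore it; the outputs quoted above come from the older version and are not reproduced here.

```python
import numpy as np

x = np.array([0.0, np.nan, -1.0])

# Propagating reductions: a NaN anywhere yields NaN, flagging the problem.
plain_min = np.min(x)
plain_max = np.max(x)

# NaN-ignoring reductions: NaNs are skipped explicitly, by request.
ignoring_min = np.nanmin(x)
ignoring_max = np.nanmax(x)
```

Keeping the two families separate means the result no longer depends on where the NaN sits in the array.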
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 David Cournapeau <[EMAIL PROTECTED]>: > Stéfan van der Walt wrote: >> >> I agree completely. > > Me too, but I am extremely biased toward nan is always bogus by my own > usage of numpy/scipy (I never use NaN as missing value, and nan is > always caused by divide by 0 and co). So am I. In all my use cases, NaNs indicate trouble. > Why ? Because we have x.max() silently ignoring NaNs, which causes a lot of head-scratching, swearing and failed experiments. Cheers Stéfan ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 04:31:38 David Cournapeau wrote: > Pierre GM wrote: > > That said, numpy.nanmin, numpy.nansum... don't come with the heavy > > machinery of numpy.ma, and are therefore faster. > > I'm really going to have to learn C. > > FWIW, nanmean/nanmean/etc... are written in python, I know. I was more dreading the time when MaskedArrays would have to be ported to C. In a way, that would probably simplify a few issues. OTOH, I don't really see it happening any time soon. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Stéfan van der Walt wrote: > > I agree completely. Me too, but I am extremely biased toward nan is always bogus by my own usage of numpy/scipy (I never use NaN as missing value, and nan is always caused by divide by 0 and co). I like that sort raise an exception by default with NaN: it breaks the API, OTOH, I can't see a good use of sort with NaN since sort does not sort values in that case: we would break the API of a broken function. > > Unfortunately, this needs to happen at the C level. Why ? cheers, David ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Pierre GM wrote:
> That said, numpy.nanmin, numpy.nansum... don't come with the heavy machinery
> of numpy.ma, and are therefore faster.
> I'm really going to have to learn C.
>

FWIW, nanmean/nanmedian/etc... are written in python,

cheers,

David
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 04:10:24 Anne Archibald wrote: > (is there a convenience > function that makes a masked array with a mask everywhere the data is > nan?). numpy.ma.fix_invalid, that masks your Nans and Infs and sets the underlying data to some filling value. That way, you don't carry NaNs/Infs along. > I am assuming that appropriate masked sort/amax/maximum/mean/median > exist already. They're definitely needed, so how much effort is it > worth putting in to duplicate that functionality with nans instead of > masked elements? My opinion indeed. The MaskedArray.sort method has an extra flag that lets you decide whether you want masked data at the beginning or the end of your array. That said, numpy.nanmin, numpy.nansum... don't come with the heavy machinery of numpy.ma, and are therefore faster. I'm really going to have to learn C. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
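A minimal sketch of the conversion described above, using numpy.ma.fix_invalid to turn NaNs and Infs into masked entries. The fill value 0.0 is an arbitrary choice for illustration.

```python
import numpy as np

raw = np.array([1.0, np.nan, 3.0, np.inf])

# Mask NaN/Inf and overwrite the underlying data with the fill value,
# so no invalid numbers are carried along in the data buffer.
fixed = np.ma.fix_invalid(raw, fill_value=0.0)

valid_mean = fixed.mean()   # computed over the unmasked entries only
```

After this step the usual masked reductions (mean, max, median, sort) skip the masked entries without any nan-specific function.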
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 Anne Archibald <[EMAIL PROTECTED]>: > I think the numpy attitude to nans should be that they are unexpected > bogus values that signify that something went wrong with the > calculation somewhere. They can be left in place for most operations, > but any operation that depends on the value should (ideally) return > nan, or failing that, raise an exception. I agree completely. > I am assuming that appropriate masked sort/amax/maximum/mean/median > exist already. They're definitely needed, so how much effort is it > worth putting in to duplicate that functionality with nans instead of > masked elements? Unfortunately, this needs to happen at the C level. Is anyone reading this willing to spend some time taking care of the issue? It's an important one. Stéfan ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 Pierre GM <[EMAIL PROTECTED]>: > On Friday 19 September 2008 03:11:05 David Cournapeau wrote: > >> Hm, I am always puzzled when I think about nan handling :) It always >> seem there is not good answer. > > Which is why we have masked arrays, of course ;) I think the numpy attitude to nans should be that they are unexpected bogus values that signify that something went wrong with the calculation somewhere. They can be left in place for most operations, but any operation that depends on the value should (ideally) return nan, or failing that, raise an exception. (If users want exceptions all the time, that's what seterr is for.) If people want to flag bad data, let's tell them to use masked arrays. So by this rule amax/maximum/mean/median should all return nan when there's a nan in their input; I don't think it's reasonable for sort to return an array full of nans, so I think its default behaviour should be to raise an exception if there's a nan. It's valuable (for example in median) to be able to sort them all to the end, but I don't think this should be the default. If people want nanmin, I would be tempted to tell them to use masked arrays (is there a convenience function that makes a masked array with a mask everywhere the data is nan?). I am assuming that appropriate masked sort/amax/maximum/mean/median exist already. They're definitely needed, so how much effort is it worth putting in to duplicate that functionality with nans instead of masked elements? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
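The seterr route mentioned above can be sketched with the np.errstate context manager, which applies the same setting to one block of code. Note the limitation: this raises when an invalid operation creates a NaN, not when an existing NaN is merely passed through. checked_sqrt is a hypothetical wrapper used only for illustration.

```python
import numpy as np

def checked_sqrt(values):
    """Hypothetical wrapper: raise instead of silently producing NaN."""
    with np.errstate(invalid="raise"):
        return np.sqrt(np.asarray(values, dtype=float))

try:
    checked_sqrt([4.0, -1.0])
    raised = False
except FloatingPointError:
    raised = True
```

This gives the "exceptions all the time" behaviour to users who want it, while the default stays quiet.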
Re: [Numpy-discussion] Medians that ignore values
On Friday 19 September 2008 03:11:05 David Cournapeau wrote: > Hm, I am always puzzled when I think about nan handling :) It always > seem there is not good answer. Which is why we have masked arrays, of course ;) ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Anne Archibald wrote:
>
> Well, for example, you might ask that all the non-nan elements be in
> order, even if you don't specify where the nan goes.

Ah, there are two problems, then:
 - sort
 - how median uses sort.

For sort, I don't know how sort speed would be influenced by treating nan. In a way, calling sort with nan inside is a user error (if you take the POV nan are not comparable), but nan are used for all kinds of purposes, hence maybe having a nansort would be nice. OTOH (I took a look at this when I fixed nanmean and co a while ago in scipy), matlab and R treat sort differently than mean and co. I am puzzled by this:
 - R sorts arrays with nan as you want by default (nan can be ignored, put in front or at the end of the array).
 - R max does not ignore nan by default.
 - R median does not ignore nan by default.

I don't know how to set a consistency here. I don't think we are consistent by having max/amax/etc... ignoring nan but sort not ignoring it. OTOH, R is not consistent either.

>
> You can always just set numpy to raise an exception whenever it comes
> across a nan. In fact, apart from the difficulty of correctly frobbing
> numpy's floating-point handling, how reasonable is it for (say) median
> to just run as it is now, but if an exception is thrown, fall back to
> a nan-aware version?

It would be different from the current nan vs usual function behavior for median/mean/etc...: why should sort handle nan by default, but not the other functions ? For mean/std/variance/median, if having nan is an error, you see it in the result (once we fix our median), but not with sort.

Hm, I am always puzzled when I think about nan handling :) It always seems there is no good answer.

David
Re: [Numpy-discussion] Medians that ignore values
2008/9/19 David Cournapeau <[EMAIL PROTECTED]>: > Anne Archibald wrote: >> >> That was in amax/amin. Pretty much every other function that does >> comparisons needs to be fixed to work with nans. In some cases it's >> not even clear how: where should a sort put the nans in an array? > > The problem is more on how the functions use sort than sort itself in > the case of median. There can't be a 'good' way to put nan in soft, for > example, since nans cannot be ordered. Well, for example, you might ask that all the non-nan elements be in order, even if you don't specify where the nan goes. > I don't know about the best strategy: either we fix every function using > comparison, handling nan as a special case as you mentioned, or there > may be a more clever thing to do to avoid special casing everywhere. I > don't have a clear idea of how many functions rely on ordering in numpy. You can always just set numpy to raise an exception whenever it comes across a nan. In fact, apart from the difficulty of correctly frobbing numpy's floating-point handling, how reasonable is it for (say) median to just run as it is now, but if an exception is thrown, fall back to a nan-aware version? Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Anne Archibald wrote:
>
> That was in amax/amin. Pretty much every other function that does
> comparisons needs to be fixed to work with nans. In some cases it's
> not even clear how: where should a sort put the nans in an array?

The problem is more on how the functions use sort than sort itself in the case of median. There can't be a 'good' way to put nan in sort, for example, since nans cannot be ordered.

I don't know about the best strategy: either we fix every function using comparison, handling nan as a special case as you mentioned, or there may be a more clever thing to do to avoid special casing everywhere. I don't have a clear idea of how many functions rely on ordering in numpy.

cheers,

David
Re: [Numpy-discussion] Medians that ignore values
2008/9/18 David Cournapeau <[EMAIL PROTECTED]>: > Anne Archibald wrote: >> >> I don't think I agree: >> >> In [4]: np.median([1,3,nan]) >> Out[4]: 3.0 >> >> In [5]: np.median([1,nan,3]) >> Out[5]: nan >> >> In [6]: np.median([nan,1,3]) >> Out[6]: 1.0 >> > > I was referring to the fact that if you have nan in your array, you > should use nanmean if you want to ignore them correctly. Now, the > different behavior depending on the order of items in the arrays is > indeed buggy, I thought this was fixed. That was in amax/amin. Pretty much every other function that does comparisons needs to be fixed to work with nans. In some cases it's not even clear how: where should a sort put the nans in an array? I suppose some enterprising soul should write up a fileful of tests making sure that all numpy's functions do something sane with arrays containing nans... Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Anne Archibald wrote: > > I don't think I agree: > > In [4]: np.median([1,3,nan]) > Out[4]: 3.0 > > In [5]: np.median([1,nan,3]) > Out[5]: nan > > In [6]: np.median([nan,1,3]) > Out[6]: 1.0 > I was referring to the fact that if you have nan in your array, you should use nanmean if you want to ignore them correctly. Now, the different behavior depending on the order of items in the arrays is indeed buggy, I thought this was fixed. cheers, David ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
2008/9/18 David Cournapeau <[EMAIL PROTECTED]>: > Peter Saffrey wrote: >> >> Is this the correct behavior for median with nan? > > That's the expected behavior, at least :) (this is also the expected > behavior of most math packages I know, including matlab and R, so this > should not be too surprising if you have used those). I don't think I agree: In [4]: np.median([1,3,nan]) Out[4]: 3.0 In [5]: np.median([1,nan,3]) Out[5]: nan In [6]: np.median([nan,1,3]) Out[6]: 1.0 I think the expected behaviour would be for all of these to return nan. Anne ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Medians that ignore values
Peter Saffrey wrote:
>
> Is this the correct behavior for median with nan?

That's the expected behavior, at least :) (this is also the expected behavior of most math packages I know, including matlab and R, so this should not be too surprising if you have used those).

> Is there a fix for
> this or am I going to have to settle with using lists?

You can use nanmedian (from scipy.stats):

>>> stats.nanmedian(np.array([1, np.nan, 3, 9]))
3

cheers,

David
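The same call can be sketched with NumPy alone. This assumes a release that ships np.nanmedian (added well after this thread) and in which np.median propagates NaN; neither holds for the 1.0/1.1 versions discussed here.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, 9.0])

propagated = np.median(data)      # NaN in the input gives NaN out
ignored = np.nanmedian(data)      # median of the non-NaN values 1, 3, 9
```

The explicit nan-prefixed name makes it visible in the calling code that missing values are being skipped on purpose.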
Re: [Numpy-discussion] Medians that ignore values
On Thu, Sep 18, 2008 at 12:23 PM, Pierre GM <[EMAIL PROTECTED]> wrote:
> On Thursday 18 September 2008 13:31:18 Peter Saffrey wrote:
> > The version in the Ubuntu package repository. It says 1:1.0.4-6ubuntu3.
>
> So it's 1.0 ? It's fairly old, that would explain.
>
> > > if you don't give an axis
> > > parameter, you should get the median of the flattened array, therefore a
> > > scalar, not an array.
> >
> > Not for my version.
>
> Indeed. Looks like the default axis changed from 0 in 1.0 to None in the
> incoming 1.2. But that's a detail at this point.
>
> > > Anyway: you should use ma.median for masked arrays. Else, you're just
> > > keeping the NaNs where they were.
> >
> > That will be the problem. My version does not have median or mean methods
> > for masked arrays, only the average() method.
>
> The method mean has always been around for masked arrays, so has the
> corresponding function. But I'm surprised, median has been in numpy.ma.extras
> for a while. Maybe not 1.0...
>
> > According to this page:
> >
> > http://www.scipy.org/Download
> >
> > 1.1.0 is the latest release.
>
> You need to update your internet ;) 1.1.1 was released 6 weeks ago.

The page had a typo, I've fixed it.

Chuck
Re: [Numpy-discussion] Medians that ignore values
On Thursday 18 September 2008 13:31:18 Peter Saffrey wrote:
> The version in the Ubuntu package repository. It says 1:1.0.4-6ubuntu3.

So it's 1.0 ? It's fairly old, that would explain it.

> > if you don't give an axis
> > parameter, you should get the median of the flattened array, therefore a
> > scalar, not an array.
>
> Not for my version.

Indeed. Looks like the default axis changed from 0 in 1.0 to None in the incoming 1.2. But that's a detail at this point.

> > Anyway: you should use ma.median for masked arrays. Else, you're just
> > keeping the NaNs where they were.
>
> That will be the problem. My version does not have median or mean methods
> for masked arrays, only the average() method.

The method mean has always been around for masked arrays, so has the corresponding function. But I'm surprised, median has been in numpy.ma.extras for a while. Maybe not 1.0...

> According to this page:
>
> http://www.scipy.org/Download
>
> 1.1.0 is the latest release.

You need to update your internet ;) 1.1.1 was released 6 weeks ago.

> Do I need to use an SVN build to get the
> ma.median functionality?

No, you can install 1.1.1, that should work. Note that I just fixed a bug in median in SVN (it would fail when trying to get the median of a 2D array with axis=1), so you may want to check this one instead if you feel like it.

You can still use 1.1.1: as a quick workaround for the aforementioned bug, use ma.median(a.T, axis=0) instead of ma.median(a, axis=1) when working w/ 2D arrays.
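The workaround above relies on the two calls being equivalent for a 2D array. A minimal sketch of that equivalence, assuming a recent NumPy in which the axis=1 bug is long fixed; np.ma.masked_invalid is used here only as a convenient way to build the example mask.

```python
import numpy as np

a = np.array([[0.1, np.nan, 0.3],
              [0.5, 0.2, np.nan],
              [0.4, 0.6, 0.9]])

# Mask the NaNs so that the masked median skips them.
m = np.ma.masked_invalid(a)

# Median of each row, taken directly along axis 1 ...
direct = np.ma.median(m, axis=1)

# ... and the workaround: transpose, then take the median along axis 0.
workaround = np.ma.median(m.T, axis=0)

# np.ma.filled works whether the result is a masked array or a plain ndarray.
same = np.allclose(np.ma.filled(direct, np.nan), np.ma.filled(workaround, np.nan))
```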
Re: [Numpy-discussion] Medians that ignore values
On Thu, Sep 18, 2008 at 11:31 AM, Peter Saffrey <[EMAIL PROTECTED]> wrote:
> Pierre GM gmail.com> writes:
> > Mmh, typo?
> >
>
> Yes, apologies. I was aiming for thorough, but ended up just careless. It's
> been a long day.
>
> > Ohoh. What version of numpy are you using ?
>
> The version in the Ubuntu package repository. It says 1:1.0.4-6ubuntu3.
>
> > if you don't give an axis
> > parameter, you should get the median of the flattened array, therefore a
> > scalar, not an array.
>
> Not for my version.
>
> >>> a = rand(10,3)
> >>> a
> array([[ 0.1269796 , 0.43003978, 0.4700416 ],
>        [ 0.28867077, 0.85265318, 0.35908364],
>        [ 0.72967127, 0.41856028, 0.54724918],
>        [ 0.28821876, 0.69684144, 0.54647616],
>        [ 0.09592476, 0.83704808, 0.52425368],
>        [ 0.743552 , 0.44433314, 0.7362179 ],
>        [ 0.4283931 , 0.13305385, 0.68422292],
>        [ 0.68860674, 0.15057373, 0.99206493],
>        [ 0.31846329, 0.77237046, 0.986883 ],
>        [ 0.4578616 , 0.4580833 , 0.97754176]])
> >>> median(a.T)
> array([ 0.43003978, 0.35908364, 0.54724918, 0.54647616, 0.52425368,
>         0.7362179 , 0.4283931 , 0.68860674, 0.77237046, 0.4580833 ])
>
> > Anyway: you should use ma.median for masked arrays. Else, you're just
> > keeping the NaNs where they were.
> >
>
> That will be the problem. My version does not have median or mean methods
> for masked arrays, only the average() method.
>
> According to this page:
>
> http://www.scipy.org/Download
>
> 1.1.0 is the latest release. Do I need to use an SVN build to get the
> ma.median functionality?
>

1.1.1 is the latest release and 1.2 is coming out shortly.

Chuck
Re: [Numpy-discussion] Medians that ignore values
Pierre GM gmail.com> writes:

> Mmh, typo?
>

Yes, apologies. I was aiming for thorough, but ended up just careless. It's been a long day.

> Ohoh. What version of numpy are you using ?

The version in the Ubuntu package repository. It says 1:1.0.4-6ubuntu3.

> if you don't give an axis
> parameter, you should get the median of the flattened array, therefore a
> scalar, not an array.

Not for my version.

    >>> a = rand(10,3)
    >>> a
    array([[ 0.1269796 , 0.43003978, 0.4700416 ],
           [ 0.28867077, 0.85265318, 0.35908364],
           [ 0.72967127, 0.41856028, 0.54724918],
           [ 0.28821876, 0.69684144, 0.54647616],
           [ 0.09592476, 0.83704808, 0.52425368],
           [ 0.743552 , 0.44433314, 0.7362179 ],
           [ 0.4283931 , 0.13305385, 0.68422292],
           [ 0.68860674, 0.15057373, 0.99206493],
           [ 0.31846329, 0.77237046, 0.986883 ],
           [ 0.4578616 , 0.4580833 , 0.97754176]])
    >>> median(a.T)
    array([ 0.43003978, 0.35908364, 0.54724918, 0.54647616, 0.52425368,
            0.7362179 , 0.4283931 , 0.68860674, 0.77237046, 0.4580833 ])

> Anyway: you should use ma.median for masked arrays. Else, you're just keeping
> the NaNs where they were.
>

That will be the problem. My version does not have median or mean methods for masked arrays, only the average() method.

According to this page:

http://www.scipy.org/Download

1.1.0 is the latest release. Do I need to use an SVN build to get the ma.median functionality?

Peter
Re: [Numpy-discussion] Medians that ignore values
On Thursday 18 September 2008 10:59:12 Peter Saffrey wrote:
> I had looked at masked arrays, but couldn't quite get them to work.

That's unfortunate.

> >>> from numeric import *

Mmh, typo?

> >>> from pylab import rand
> >>> a = rand(10,3)
> >>> a[a > 0.8] = nan
> >>> m = ma.masked_array(a, isnan(a))
> >>> m

Another way would be m = ma.masked_where(a>0.8,a)

> Remember I want medians of each triple, so I need to median the
> transposed matrix:
> >>> median(m.T)

Ohoh. What version of numpy are you using ? If you don't give an axis parameter, you should get the median of the flattened array, therefore a scalar, not an array.

Anyway: you should use ma.median for masked arrays. Else, you're just keeping the NaNs where they were.
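The two ways of building the mask mentioned above, followed by ma.median along an explicit axis, can be sketched as follows. This assumes a recent NumPy; the sample values are made up so that the result can be checked by hand, instead of coming from rand().

```python
import numpy as np

a = np.array([[0.2, 0.9, 0.5],
              [0.95, 0.85, 0.1],
              [0.3, 0.4, 0.7]])

# Variant 1: mask directly on the condition, as suggested above.
m1 = np.ma.masked_where(a > 0.8, a)

# Variant 2: the original recipe, writing NaNs first and masking them afterwards.
b = a.copy()
b[b > 0.8] = np.nan
m2 = np.ma.masked_array(b, np.isnan(b))

# Use the masked-array median with an explicit axis: one value per row.
med1 = np.ma.median(m1, axis=1)
med2 = np.ma.median(m2, axis=1)
```

Both variants mask the same entries, so the per-row medians agree.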
Re: [Numpy-discussion] Medians that ignore values
Hi,

> >>> median(m.T)
> array([ 1.e+20, 2.12298948e-01, 3.57822574e-01,

I believe 1.e+20 is a reserved value and signifies the missing value or NaN in your case. That's the way it was in a Fortran77 package I worked with ten years ago...

--- On Thu, 9/18/08, Peter Saffrey <[EMAIL PROTECTED]> wrote:

> From: Peter Saffrey <[EMAIL PROTECTED]>
> Subject: Re: [Numpy-discussion] Medians that ignore values
> To: numpy-discussion@scipy.org
> Date: Thursday, September 18, 2008, 10:59 AM
>
> physics.ucf.edu> writes:
>
> > Currently the only way you can handle NaNs is by using masked arrays.
> > Create a mask by doing isfinite(a), then call the masked array
> > median(). There's an example here:
> >
> > http://sd-2116.dedibox.fr/pydocweb/doc/numpy.ma/
> >
>
> I had looked at masked arrays, but couldn't quite get them to work.
> Generating them is fine (I've randomly introduced a few nan values into
> this array):
>
> >>> from numeric import *
> >>> from pylab import rand
> >>> a = rand(10,3)
> >>> a[a > 0.8] = nan
> >>> m = ma.masked_array(a, isnan(a))
> >>> m
> array(data =
>  [[ 5.97400164e-01 1.e+20 1.e+20]
>  [ 3.34623242e-01 6.53582662e-02 2.12298948e-01]
>  [ 2.11879853e-01 1.e+20 3.57822574e-01]
>  [ 6.06911592e-01 1.96229341e-01 5.49953059e-02]
>  [ 1.e+20 2.75493584e-01 4.70929957e-01]
>  [ 2.92845118e-01 2.11261529e-02 3.49211381e-02]
>  [ 7.11963636e-01 2.17277855e-01 5.45487384e-02]
>  [ 5.20995579e-01 7.57676845e-01 1.e+20]
>  [ 1.84189196e-01 7.58291436e-02 6.26567116e-01]
>  [ 2.42083978e-01 1.e+20 2.30202562e-02]],
>       mask =
>  [[False True True]
>  [False False False]
>  [False True False]
>  [False False False]
>  [ True False False]
>  [False False False]
>  [False False False]
>  [False False True]
>  [False False False]
>  [False True False]],
>       fill_value=1e+20)
>
> Remember I want medians of each triple, so I need to median the
> transposed matrix:
>
> >>> median(m.T)
> array([ 1.e+20, 2.12298948e-01, 3.57822574e-01,
>         1.96229341e-01, 4.70929957e-01, 3.49211381e-02,
>         2.17277855e-01, 7.57676845e-01, 1.84189196e-01,
>         2.42083978e-01])
>
> The first value is NaN, indicating that the median routine has failed to
> ignore the masked values. What have I missed?
>
> Thanks,
>
> Peter
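On the 1.e+20 entries: in numpy.ma this number is the default fill_value for floating-point masked arrays (it is the fill_value=1e+20 shown at the end of the printout), not a NaN. A minimal sketch, assuming a recent NumPy release:

```python
import numpy as np

# A small masked array with one masked slot (the NaN).
m = np.ma.masked_array([0.5, np.nan, 0.25], mask=[False, True, False])

# Default fill value for floating-point masked arrays.
default_fill = m.fill_value

# filled() returns a plain ndarray with masked slots replaced by fill_value.
filled = m.filled()

# compressed() returns only the unmasked values.
visible = m.compressed()
```

A routine that is not aware of masks and works on the filled data sees 1e+20 where the masked entries were, which is consistent with the median output quoted above.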
Re: [Numpy-discussion] Medians that ignore values
physics.ucf.edu> writes:

> Currently the only way you can handle NaNs is by using masked arrays.
> Create a mask by doing isfinite(a), then call the masked array
> median(). There's an example here:
>
> http://sd-2116.dedibox.fr/pydocweb/doc/numpy.ma/
>

I had looked at masked arrays, but couldn't quite get them to work. Generating them is fine (I've randomly introduced a few nan values into this array):

    >>> from numeric import *
    >>> from pylab import rand
    >>> a = rand(10,3)
    >>> a[a > 0.8] = nan
    >>> m = ma.masked_array(a, isnan(a))
    >>> m
    array(data =
     [[ 5.97400164e-01 1.e+20 1.e+20]
     [ 3.34623242e-01 6.53582662e-02 2.12298948e-01]
     [ 2.11879853e-01 1.e+20 3.57822574e-01]
     [ 6.06911592e-01 1.96229341e-01 5.49953059e-02]
     [ 1.e+20 2.75493584e-01 4.70929957e-01]
     [ 2.92845118e-01 2.11261529e-02 3.49211381e-02]
     [ 7.11963636e-01 2.17277855e-01 5.45487384e-02]
     [ 5.20995579e-01 7.57676845e-01 1.e+20]
     [ 1.84189196e-01 7.58291436e-02 6.26567116e-01]
     [ 2.42083978e-01 1.e+20 2.30202562e-02]],
          mask =
     [[False True True]
     [False False False]
     [False True False]
     [False False False]
     [ True False False]
     [False False False]
     [False False False]
     [False False True]
     [False False False]
     [False True False]],
          fill_value=1e+20)

Remember I want medians of each triple, so I need to median the transposed matrix:

    >>> median(m.T)
    array([ 1.e+20, 2.12298948e-01, 3.57822574e-01, 1.96229341e-01,
            4.70929957e-01, 3.49211381e-02, 2.17277855e-01, 7.57676845e-01,
            1.84189196e-01, 2.42083978e-01])

The first value is NaN, indicating that the median routine has failed to ignore the masked values. What have I missed?

Thanks,

Peter
Re: [Numpy-discussion] Medians that ignore values
> You might want to try isfinite() to first remove nan, +/- infinity
> before doing that.
> numpy.median(a[numpy.isfinite(a)])

We just had this discussion a month or two ago, I think even on this list, and continued it at the SciPy conference.

The problem with numpy.median(a[numpy.isfinite(a)]) is that it breaks when you have a multi-dimensional array, such as an array of 5000x3 as in this case, and take median down an axis. The example above flattens the array and eliminates the possibility of taking the median down an axis in a single call, as the poster desires.

Currently the only way you can handle NaNs is by using masked arrays. Create a mask by doing isfinite(a), then call the masked array median(). There's an example here:

http://sd-2116.dedibox.fr/pydocweb/doc/numpy.ma/

Note that our competitor language IDL does have a /nan flag to its single median routine, making this common task much easier in that language than ours.

--jh--
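The flattening problem described above can be seen in a few lines. This is a minimal sketch assuming a recent NumPy. Note that the mask passed to masked_array marks the entries to hide, so it is the inverse of isfinite(a).

```python
import numpy as np

a = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])

# Boolean indexing returns a 1-D array, so the row structure is lost.
flat = a[np.isfinite(a)]
overall = np.median(flat)            # a single number for the whole array

# Masked-array route: hide the non-finite entries, keep the 2-D shape.
m = np.ma.masked_array(a, mask=~np.isfinite(a))
per_row = np.ma.median(m, axis=1)    # one median per row
```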
Re: [Numpy-discussion] Medians that ignore values
Nadav Horesh wrote:
> I think you need to use masked arrays.
>
> Nadav
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] on behalf of Peter Saffrey
> Sent: Thu 18-September-08 14:27
> To: numpy-discussion@scipy.org
> Subject: [Numpy-discussion] Medians that ignore values
>
> I have data from biological experiments that is represented as a list of
> about 5000 triples. I would like to convert this to a list of the median
> of each triple. I did some profiling and found that numpy was about
> 12 times faster for this application than using regular Python lists and
> a list median implementation. I'll be performing quite a few
> mathematical operations on these values, so using numpy arrays seems
> sensible.
>
> The only problem is that my data has gaps in it - where an experiment
> failed, a "triple" will not have three values. Some will have 2, 1 or
> even no values. To keep the arrays regular so that they can be used by
> numpy, is there some dummy value I can use to fill these gaps that will
> be ignored by the median routine?
>
> I tried NaN for this, but as far as median is concerned, it counts as
> infinity:
>
> >>> from numpy import *
> >>> median(array([1,3,nan]))
> 3.0
> >>> median(array([1,nan,nan]))
> nan
>
> Is this the correct behavior for median with nan? Is there a fix for
> this or am I going to have to settle with using lists?
>
> Thanks,
>
> Peter
>

Hi,

The counting of infinity is correct due to the implementation of the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754).

You might want to try isfinite() to first remove nan, +/- infinity before doing that.

    numpy.median(a[numpy.isfinite(a)])

Bruce
Re: [Numpy-discussion] Medians that ignore values
I think you need to use masked arrays.

Nadav


-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Peter Saffrey
Sent: Thu 18-September-08 14:27
To: numpy-discussion@scipy.org
Subject: [Numpy-discussion] Medians that ignore values

I have data from biological experiments that is represented as a list of about 5000 triples. I would like to convert this to a list of the median of each triple. I did some profiling and found that numpy was about 12 times faster for this application than using regular Python lists and a list median implementation. I'll be performing quite a few mathematical operations on these values, so using numpy arrays seems sensible.

The only problem is that my data has gaps in it - where an experiment failed, a "triple" will not have three values. Some will have 2, 1 or even no values. To keep the arrays regular so that they can be used by numpy, is there some dummy value I can use to fill these gaps that will be ignored by the median routine?

I tried NaN for this, but as far as median is concerned, it counts as infinity:

    >>> from numpy import *
    >>> median(array([1,3,nan]))
    3.0
    >>> median(array([1,nan,nan]))
    nan

Is this the correct behavior for median with nan? Is there a fix for this or am I going to have to settle with using lists?

Thanks,

Peter
[Numpy-discussion] Medians that ignore values
I have data from biological experiments that is represented as a list of about 5000 triples. I would like to convert this to a list of the median of each triple. I did some profiling and found that numpy was about 12 times faster for this application than using regular Python lists and a list median implementation. I'll be performing quite a few mathematical operations on these values, so using numpy arrays seems sensible.

The only problem is that my data has gaps in it - where an experiment failed, a "triple" will not have three values. Some will have 2, 1 or even no values. To keep the arrays regular so that they can be used by numpy, is there some dummy value I can use to fill these gaps that will be ignored by the median routine?

I tried NaN for this, but as far as median is concerned, it counts as infinity:

    >>> from numpy import *
    >>> median(array([1,3,nan]))
    3.0
    >>> median(array([1,nan,nan]))
    nan

Is this the correct behavior for median with nan? Is there a fix for this or am I going to have to settle with using lists?

Thanks,

Peter
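The step of turning incomplete triples into a regular array can be sketched as below. This is only an illustration under stated assumptions: the helper name pad_triples and the sample data are hypothetical, NaN is used as the gap marker as in the post, and np.full requires a NumPy release newer than the ones discussed in this thread.

```python
import numpy as np

def pad_triples(rows, width=3):
    """Build a (len(rows), width) float array, filling missing values with NaN.

    Hypothetical helper for illustration; each row may hold 0 to `width` values.
    """
    out = np.full((len(rows), width), np.nan)
    for i, row in enumerate(rows):
        # Copy the values that exist; the rest of the row stays NaN.
        out[i, :len(row)] = row
    return out

# Made-up data: a full triple, two partial ones and an empty one.
rows = [[1.0, 3.0, 2.0], [4.0, 6.0], [5.0], []]
padded = pad_triples(rows)

# Number of real measurements in each triple.
counts = np.isfinite(padded).sum(axis=1)
```

Keeping the counts alongside the padded array makes it easy to spot triples with no values at all, for which no median exists.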