On Fri, Mar 6, 2009 at 9:25 AM, Alois Schlögl <[email protected]> wrote:
> Jaroslav Hajek wrote:
>> On Fri, Mar 6, 2009 at 8:09 AM, Alois Schlögl <[email protected]>
>> wrote:
>>> Jaroslav Hajek wrote:
>>>> On Thu, Mar 5, 2009 at 4:04 PM, Alois Schlögl <[email protected]>
>>>> wrote:
>>>>> Jaroslav Hajek wrote:
>>>>>> On Thu, Mar 5, 2009 at 12:02 PM, Alois Schlögl
>>>>>> <[email protected]> wrote:
>>>>>>> Jaroslav Hajek wrote:
>>>>>>>>> sumskipnan counts also the number of non-NaNs.
>>>>>>>>> [s,c]=sumskipnan(...)
>>>>>>>>>
>>>>>>>>> computing both s and c in a single step is beneficial for estimating
>>>>>>>>> mean, variance and other statistics.
>>>>>>>>>
>>>>>>>> well, you can do
>>>>>>>>
>>>>>>>> nans = isnan (x);
>>>>>>>> x(nans) = 0;
>>>>>>>> s = sum (x, dim);
>>>>>>>> c = size (x, dim) - sum (nans);
>>>>>>>>
>>>>>>>> Not exactly as fast as doing it all in a single loop, but simplistic.
>>>>>>> I guess, you meant
>>>>>>> c = size (x, dim) - sum (nans,dim);
>>>>>>>
>>>>>>> In terms of simplicity,
>>>>>>> [s,c]=sumskipnan(x,dim);
>>>>>>> will win.
>>>>>>>
>>>>>> Depends on what you count in. I wrote the first from top of my head,
>>>>>> whereas for the second I'd need to look up the syntax. But I don't
>>>>>> have any fundamental objections against the existence of sumskipnan,
>>>>>> of course.
>>>>> Fine.
>>>>>
>>>>>>>>>> Besides, I think the fact that the NaN package shadows Octave's
>>>>>>>>>> built-in functions is very dangerous and confusing, even though I
>>>>>>>>>> understand the motivation. I think this package should not be
>>>>>>>>>> installed by default.
>>>>>>>>> Where do you see a danger ? Please explain.
>>>>>>>>>
>>>>>>>> It seems that sometimes users (especially windows users) get this
>>>>>>>> package unknowingly loaded. Not that this is your fault, just that it
>>>>>>>> probably shouldn't be on by default in distributions.
>>>>>>>>
>>>>>>>> The more painful issue is that it makes the package less attractive to
>>>>>>>> use - for instance, if I want to use the nanmean function to get
>>>>>>>> nan-free mean, but I *don't* want the built-in mean to be shadowed
>>>>>>>> (because the replacement is slower).
>>>>>>> Therefore, it would be nice to have a pre-compiled sumskipnan that
>>>>>>> limits the performance hit. And there is certainly room for further
>>>>>>> improvement.
>>>>>> I don't want to limit it. I just don't want it to be there. I would
>>>>>> like to be able to use *both* nanmean and the default mean at the same
>>>>>> time.
>>>>> And there are many others, like me for example, that do not want to
>>>>> think about whether nanmean or mean is the proper function for a
>>>>> specific problem.
>>>>>
>>>>> In case there are no NaN's, both yield the same result.
>>>>> In the presence of NaN's, the default mean results in NaN, while a
>>>>> perfectly valid result could be obtained.
>>>>>
>>>>> Or can You think of any reasonable problem, when mean should propagate
>>>>> the NaN's ? I can not. Consequently, there is no need to have both
>>>>> nanmean and mean.
>>>>>
>>>> Just like Soren said, in most cases where NaN does not represent a
>>>> missing value.
>>>
>>> In statistics nobody is asking what the meaning of the NaN is. Ignoring
>>> NaN is just the right thing to do.
>>>
>>> Again, I'm just talking about statistical functions, and do not
>>> generalize this to other areas.
>>>
>>
>> That's OK. But I may want to use both "statistical" mean and
>> "non-statistical" in totally different areas of a single computation.
>
>
> Do you really have a case where you want the mean estimation to behave
> differently than the statistical mean ? That is, where NaN's should be
> propagated ?
>

You just think too statistically of the mean. I may well use "mean"
just for its mathematical definition, that is, sum divided by count,
completely unrelated to any statistics. For instance, to calculate the
centroid of a simplex. In that case, skipping NaNs is complete nonsense
because it will silently give a wrong result.

> I'm asking because in 15+ years of using Matlab and Octave, I've never
> found such a case. Maybe I can learn something new.

See above.

> Even in case NaN propagation is desired, I guess I'd prefer to have an
> explicit check for NaN's in order to emphasize that special case and
> make the code more readable. Again, I've never come across a case where I
> needed the mean to propagate NaN's.
>

Same thing - you're just used to skipping NaNs in mean, others may not be.

>
>> But the different NaN treatment is not actually that bad, I doubt
>> anyone would notice (the performance hit may be noticeable, but it is
>> also unlikely).
>
> I'm aware that the performance hit might be a disadvantage in using the
> NaN-toolbox (although the benchmark tests have not been widely applied).
> I guess it's the major obstacle for a wider application.
>

I can't judge that. Maybe most people are fine with it. In any case,
I'm certainly free to not use the package if I don't like what it does.
Besides, the functionality I was asking for (i.e. nanmean without
shadowed mean) is provided by another package, so I just have no
problem.

> On the other hand, you gain in terms of programming effort:
> (i) software is doing more often the right thing,

depends...

> (ii) it's less likely to fail due to NaN-related issues.

depends...

> (iii) it's more likely that users unaware of the NaN-issue get it right
> in the first place,

and stay unaware...
(if it's right, of course)

> (iv) no need to think about whether nanmean or mean is the right function;
> (v) of course using always nanmean() would also do, but it's nicer to
> write only mean();
>

I strongly prefer to have different syntax for functions doing
different things.

> In my experience, these advantages outweigh the small performance
> penalty. These are also the reasons why it was developed. Except for
> compatibility tests, I've never found a need to turn off the NaN-toolbox.
>

Good for you :)

cheers

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev
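
The centroid case raised in the reply above can be illustrated with a small sketch. It assumes a hypothetical triangle whose third vertex has an undefined x coordinate; the matrix V and all variable names are made up for illustration and do not come from the thread. The propagating mean is written out as sum divided by count so that the result does not depend on whether a package is shadowing mean, and the skipping variant uses the isnan/sum approach quoted earlier in the thread.

```octave
% Hypothetical data: vertices of a triangle (a 2-simplex), one per row.
% The NaN stands for a coordinate that an earlier step failed to compute.
V = [0 0; 1 0; NaN 1];

% Mean by its mathematical definition (sum divided by count).
% The NaN propagates, so the broken input is visible in the result.
centroid_propagate = sum (V, 1) / size (V, 1);

% NaN-skipping mean, following the isnan/sum snippet quoted in the thread.
nans = isnan (V);
W = V;
W(nans) = 0;                       % zero out the NaN entries
s = sum (W, 1);                    % column sums without NaNs
c = size (V, 1) - sum (nans, 1);   % count of non-NaN entries per column
centroid_skip = s ./ c;

disp (centroid_propagate)
disp (centroid_skip)
```

With this input the propagating version yields NaN in the x coordinate, while the skipping version returns x = 0.5, which is the midpoint of the two known vertices and not the centroid of any triangle. Nothing in the skipped result signals that a vertex was incomplete.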
