On Thu, Mar 5, 2009 at 4:04 PM, Alois Schlögl <[email protected]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Jaroslav Hajek wrote:
>> On Thu, Mar 5, 2009 at 12:02 PM, Alois Schlögl <[email protected]>
>> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Jaroslav Hajek wrote:
>>>>> sumskipnan counts also the number of non-NaNs.
>>>>> [s,c]=sumskipnan(...)
>>>>>
>>>>> computing both s and c in a single step is beneficial for estimating
>>>>> mean, variance and other statistics.
>>>>>
>>>> well, you can do
>>>>
>>>> nans = isnan (x);
>>>> x(nans) = 0;
>>>> s = sum (x, dim);
>>>> c = size (x, dim) - sum (nans);
>>>>
>>>> Not exactly as fast as doing it all in a single loop, but simplistic.
>>> I guess, you meant
>>> c = size (x, dim) - sum (nans,dim);
>>>
>>> In terms of simplicity,
>>> [s,c]=sumskipnan(x,dim);
>>> will win.
>>>
>>
>> Depends on what you count in. I wrote the first from top of my head,
>> whereas for the second I'd need to look up the syntax. But I don't
>> have any fundamental objections against the existence of sumskipnan,
>> of course.
>
> Fine.
>
>>
>>>>>> Besides, I think the fact that the NaN package shadows Octave's
>>>>>> built-in functions is very dangerous and confusing, even though I
>>>>>> understand the motivation. I think this package should not be
>>>>>> installed by default.
>>>>> Where do you see a danger ? Please explain.
>>>>>
>>>> It seems that sometimes users (especially windows users) get this
>>>> package unknowingly loaded. Not that this is your fault, just that it
>>>> probably shouldn't be on by default in distributions.
>>>>
>>>> The more painful issue is that it makes the package less attractive to
>>>> use - for instance, if I want to use the nanmean function to get
>>>> nan-free mean, but I *don't* want the built-in mean to be shadowed
>>>> (because the replacement is slower).
>>> Therefore, it would be nice to have a pre-compiled sumskipnan that
>>> limits the performance hit.
>>> And there is certainly room for further
>>> improvement.
>>
>> I don't want to limit it. I just don't want it to be there. I would
>> like to be able to use *both* nanmean and the default mean at the same
>> time.
>
>
> And there are many others, like me for example, that do not want to
> think about whether nanmean or mean is the proper function for a
> specific problem.
>
> In case there are no NaNs, both yield the same result.
> In the presence of NaNs, the default mean results in NaN, while a
> perfectly valid result could be obtained.
>
> Or can you think of any reasonable problem where mean should propagate
> the NaNs? I cannot. Consequently, there is no need to have both
> nanmean and mean.
>
Just like Soren said: in most cases where NaN does not represent a
missing value.

> Concerning the performance, how detailed was your testing? I actually
> get mixed results about the performance of sum and sumskipnan.
>
> octave:16> x=randn(1e4); %% !!! requires about 800 MBytes of RAM !!!
> octave:17> tic,[y]=sum(x,2);toc
> Elapsed time is 5.43515 seconds.
> octave:18> tic,[y]=sumskipnan(x,2);toc
> Elapsed time is 2.54446 seconds.
>
> In this case, sumskipnan is twice as fast as sum. ;-)
> (using Octave 3.1.51+ on Ubuntu and QuadCore AMD64, with 4 GB RAM).
>

I'm using Octave 3.1.53+ on Core 2 Duo @2.83 GHz:

octave:5> tic,[y]=sum(x,2);toc
Elapsed time is 0.139597 seconds.
octave:6> tic,[y]=sumskipnan(x,2);toc
Elapsed time is 0.981461 seconds.

so it seems that there is a penalty factor of about 7 (sum has been
optimized since 3.1.51). That's significant, even though getting a mean
is seldom a bottleneck.

>>
>>>> OTOH, I admit sometimes it may be good to be able to just substitute
>>>> the default stats by nan-free ones.
>>>>
>>>> I think it would be better to split the package in two, say, "nan" and
>>>> "nan-shadow" that would separate the two uses, because right now I
>>>> need to manually edit "path" after the package is loaded if I don't
>>>> want the default funcs to be shadowed.
>>>
>>> I do not know how this should work. We already have two competing
>>> stats-packages, the default one and the NaN-toolbox. A third option
>>> would just increase the confusion. Personally, I'd prefer merging the
>>> advantages of both approaches in a single solution.
>>>
>>>
>>> However, I do not see any *danger* in using the NaN-toolbox.
>>>
>>
>> "danger" was an exaggeration.
>> But as you've seen, users report failures due to the NaN package as
>> bugs in Octave.
>>
>>
> I've fixed the present problem by renaming sumskipnan.cc to
> sumskipnan_oct.cc. Accordingly, sumskipnan.m is used by default, which
> does not have this problem.
>

OK.
It's fine with me; and I've discovered that the statistics pkg also
has the nanmean et al. funcs (thanks for pointing me to it). I just
think that maybe there should be a warning given when Octave's
functions are shadowed.

cheers

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
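
For reference, the built-in-only workaround discussed in this thread can be sketched roughly as below. This is only an illustration, not the NaN package's sumskipnan: the helper name nansum_count is hypothetical, and the snippet assumes it is run as a script file (hence the leading "1;"). It uses the corrected count, sum (nans, dim), and shows that the NaN-free mean and the built-in mean can be used side by side without shadowing anything.

```octave
1;  % mark this file as a script so the function below can be defined inline

% Hypothetical helper (an assumption for illustration, not part of any package):
% NaN-skipping sum and count of non-NaN entries along dimension dim.
function [s, c] = nansum_count (x, dim)
  nans = isnan (x);
  x(nans) = 0;                          % NaNs contribute nothing to the sum
  s = sum (x, dim);
  c = size (x, dim) - sum (nans, dim);  % number of non-NaN entries
endfunction

x = [1 NaN 3; 4 5 NaN];
[s, c] = nansum_count (x, 2);   % s = [4; 9], c = [2; 2]
m_skip = s ./ c;                % NaN-free row means: [2; 4.5]
m_prop = mean (x, 2);           % built-in mean still propagates NaN: [NaN; NaN]
```

Note that a row consisting only of NaNs gives c = 0, so s ./ c yields NaN (0/0) for that row in this sketch.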
