On Thu, Mar 5, 2009 at 4:04 PM, Alois Schlögl wrote:
> Jaroslav Hajek wrote:
>> On Thu, Mar 5, 2009 at 12:02 PM, Alois Schlögl wrote:
>>> Jaroslav Hajek wrote:
>>>>> sumskipnan also counts the number of non-NaNs.
>>>>> [s,c]=sumskipnan(...)
>>>>>
>>>>> computing both s and c in a single step is beneficial for estimating
>>>>> mean, variance and other statistics.
>>>>>
>>>> well, you can do
>>>>
>>>> nans = isnan (x);
>>>> x(nans) = 0;
>>>> s = sum (x, dim);
>>>> c = size (x, dim) - sum (nans);
>>>>
>>>> Not exactly as fast as doing it all in a single loop, but simple.
>>> I guess, you meant
>>>    c = size (x, dim) - sum (nans,dim);
>>>
>>> In terms of simplicity,
>>>       [s,c]=sumskipnan(x,dim);
>>> will win.
>>>
>>
>> Depends on what you take into account. I wrote the first off the top
>> of my head, whereas for the second I'd need to look up the syntax. But
>> I don't have any fundamental objections to the existence of
>> sumskipnan, of course.
>
> Fine.
>
>>
>>>>>> Besides, I think the fact that the NaN package shadows Octave's
>>>>>> built-in functions is very dangerous and confusing, even though I
>>>>>> understand the motivation. I think this package should not be
>>>>>> installed by default.
>>>>> Where do you see a danger? Please explain.
>>>>>
>>>> It seems that sometimes users (especially windows users) get this
>>>> package unknowingly loaded. Not that this is your fault, just that it
>>>> probably shouldn't be on by default in distributions.
>>>>
>>>> The more painful issue is that it makes the package less attractive to
>>>> use - for instance, if I want to use the nanmean function to get a
>>>> NaN-free mean, but I *don't* want the built-in mean to be shadowed
>>>> (because the replacement is slower).
>>> Therefore, it would be nice to have a pre-compiled sumskipnan that
>>> limits the performance hit. And there is certainly room for further
>>> improvement.
>>
>> I don't want to limit it. I just don't want it to be there. I would
>> like to be able to use *both* nanmean and the default mean at the same
>> time.
>
>
> And there are many others, like me, who do not want to have to think
> about whether nanmean or mean is the proper function for a specific
> problem.
>
> If there are no NaNs, both yield the same result.
> In the presence of NaNs, the default mean returns NaN, even though a
> perfectly valid result could be obtained.
>
> Or can you think of any reasonable problem where mean should propagate
> NaNs? I cannot. Consequently, there is no need to have both
> nanmean and mean.
>

Just as Soren said: in most cases where NaN does not represent a
missing value.

>
> Concerning the performance, how detailed was your testing? I actually
> get mixed results for the performance of sum and sumskipnan.
>
> octave:16> x=randn(1e4);   %% !!! requires about 800 MBytes of RAM !!!
> octave:17> tic,[y]=sum(x,2);toc
> Elapsed time is 5.43515 seconds.
> octave:18> tic,[y]=sumskipnan(x,2);toc
> Elapsed time is 2.54446 seconds.
>
> In this case, sumskipnan is twice as fast as sum. ;-)
> (using Octave 3.1.51+ on a quad-core AMD64 with 4 GB RAM, running
> Ubuntu).
>

I'm using Octave 3.1.53+ on Core 2 Duo @2.83 GHz:

octave:5> tic,[y]=sum(x,2);toc
Elapsed time is 0.139597 seconds.
octave:6> tic,[y]=sumskipnan(x,2);toc
Elapsed time is 0.981461 seconds.

so it seems that there is a penalty factor of about 7 (sum has been
optimized since 3.1.51). That's significant, even though computing a
mean is seldom a bottleneck.

>>
>>>> OTOH, I admit sometimes it may be good to be able to just substitute
>>>> the default stats by nan-free ones.
>>>>
>>>> I think it would be better to split the package in two, say, "nan" and
>>>> "nan-shadow" that would separate the two uses, because right now I
>>>> need to manually edit "path" after the package is loaded if I don't
>>>> want the default funcs to be shadowed.
>>>
>>> I do not know how this should work. We already have two competing
>>> stats packages, the default one and the NaN-toolbox. A third option
>>> would only increase the confusion. Personally, I'd prefer merging the
>>> advantages of both approaches into a single solution.
>>>
>>>
>>> However, I do not see any *danger* in using the NaN-toolbox.
>>>
>>
>> "danger" was an exaggeration.
>> But as you've seen, users report failures due to the NaN package as
>> bugs in Octave.
>>
>>
> I've fixed the present problem by renaming sumskipnan.cc to
> sumskipnan_oct.cc. Accordingly, sumskipnan.m is used by default, and it
> does not have this problem.
>

OK. It's fine with me, and I've discovered that the statistics pkg
also has the nanmean et al. funcs (thanks for pointing me to it). I
just think that maybe a warning should be given when Octave's
functions are shadowed.

cheers

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

_______________________________________________
Octave-dev mailing list
https://lists.sourceforge.net/lists/listinfo/octave-dev
