On Fri, Mar 6, 2009 at 9:25 AM, Alois Schlögl <[email protected]> wrote:
> Jaroslav Hajek wrote:
>> On Fri, Mar 6, 2009 at 8:09 AM, Alois Schlögl <[email protected]> 
>> wrote:
>>> Jaroslav Hajek wrote:
>>>> On Thu, Mar 5, 2009 at 4:04 PM, Alois Schlögl <[email protected]> 
>>>> wrote:
>>>>> Jaroslav Hajek wrote:
>>>>>> On Thu, Mar 5, 2009 at 12:02 PM, Alois Schlögl 
>>>>>> <[email protected]> wrote:
>>>>>>> Jaroslav Hajek wrote:
>>>>>>>>> sumskipnan counts also the number of non-NaNs.
>>>>>>>>> [s,c]=sumskipnan(...)
>>>>>>>>>
>>>>>>>>> computing both s and c in a single step is beneficial for estimating
>>>>>>>>> mean, variance and other statistics.
>>>>>>>>>
>>>>>>>> well, you can do
>>>>>>>>
>>>>>>>> nans = isnan (x);
>>>>>>>> x(nans) = 0;
>>>>>>>> s = sum (x, dim);
>>>>>>>> c = size (x, dim) - sum (nans);
>>>>>>>>
>>>>>>>> Not exactly as fast as doing it all in a single loop, but simplistic.
>>>>>>> I guess, you meant
>>>>>>>    c = size (x, dim) - sum (nans,dim);
>>>>>>>
>>>>>>> In terms of simplicity,
>>>>>>>       [s,c]=sumskipnan(x,dim);
>>>>>>> will win.
>>>>>>>
>>>>>> Depends on what you count in. I wrote the first off the top of my head,
>>>>>> whereas for the second I'd need to look up the syntax. But I don't
>>>>>> have any fundamental objections against the existence of sumskipnan,
>>>>>> of course.
>>>>> Fine.
>>>>>
>>>>>>>>>> Besides, I think the fact that the NaN package shadows Octave's
>>>>>>>>>> built-in functions is very dangerous and confusing, even though I
>>>>>>>>>> understand the motivation. I think this package should not be
>>>>>>>>>> installed by default.
>>>>>>>>> Where do you see a danger ? Please explain.
>>>>>>>>>
>>>>>>>> It seems that sometimes users (especially windows users) get this
>>>>>>>> package unknowingly loaded. Not that this is your fault, just that it
>>>>>>>> probably shouldn't be on by default in distributions.
>>>>>>>>
>>>>>>>> The more painful issue is that it makes the package less attractive to
>>>>>>>> use - for instance, if I want to use the nanmean function to get
>>>>>>>> nan-free mean, but I *don't* want the built-in mean to be shadowed
>>>>>>>> (because the replacement is slower).
>>>>>>> Therefore, it would be nice to have a pre-compiled sumskipnan that
>>>>>>> limits the performance hit. And there is certainly room for further
>>>>>>> improvement.
>>>>>> I don't want to limit it. I just don't want it to be there. I would
>>>>>> like to be able to use *both* nanmean and the default mean at the same
>>>>>> time.
>>>>> And there are many others, like me for example, who do not want to
>>>>> think about whether nanmean or mean is the proper function for a
>>>>> specific problem.
>>>>>
>>>>> In case there are no NaN's, both yield the same result.
>>>>> In the presence of NaN's, the default mean results in NaN, while a
>>>>> perfectly valid result could be obtained.
>>>>>
>>>>> Or can you think of any reasonable problem where mean should propagate
>>>>> the NaN's? I cannot. Consequently, there is no need to have both
>>>>> nanmean and mean.
>>>>>
>>>> Just like Soren said, in most cases where NaN does not represent a
>>>> missing value.
>>>
>>> In statistics nobody is asking what the meaning of the NaN is. Ignoring
>>> NaN is just the right thing to do.
>>>
>>> Again, I'm just talking about statistical functions, and do not
>>> generalize this to other areas.
>>>
>>
>> That's OK. But I may want to use both "statistical" mean and
>> "non-statistical" in totally different areas of a single computation.
>
>
> Do you really have a case where you want the mean estimation to behave
> differently than the statistical mean? That is, where NaN's should be
> propagated?
>

You just think too statistically of the mean. I may well use "mean"
just for its mathematical definition, that is, sum divided by count,
completely unrelated to any statistics. For instance, to calculate the
centroid of a simplex. In that case, skipping NaNs is complete
nonsense because it will silently give a wrong result.
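
A minimal sketch of the centroid case, assuming the simplex is stored as a
matrix with one vertex per row. The variable names and data are made up for
illustration, and the NaN-skipping mean is written out by hand rather than
calling any package function:

```octave
% Triangle (a 2-simplex) in the plane, one vertex per row.
% One coordinate is NaN, e.g. the result of a failed upstream computation.
V = [0 0; NaN 3; 6 0];

% Mean by its mathematical definition: sum divided by count.
% The NaN propagates, so the broken input is visible in the result.
c_plain = sum (V, 1) / size (V, 1);

% NaN-skipping mean: zero out the NaNs and divide by the non-NaN count.
nans = isnan (V);
W = V;
W(nans) = 0;
c_skip = sum (W, 1) ./ (size (V, 1) - sum (nans, 1));

disp (c_plain)   % first coordinate is NaN
disp (c_skip)    % 3 1
```

The skipped version returns a finite point, but it averages two x-coordinates
and three y-coordinates, so it is not the centroid of any triangle and nothing
signals that the input was bad.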

> I'm asking because in 15+ years of using Matlab and Octave, I've never
> found such a case. Maybe I can learn something new.

See above.

> Even in case NaN propagation is desired, I guess I'd prefer to have an
> explicit check for NaN's in order to emphasize that special case and
> make the code more readable. Again, I've never come across a case where I
> needed the mean to propagate NaN's.
>

Same thing - you're just used to skipping NaNs in mean, others may not be.

>
>> But the different NaN treatment is not actually that bad, I doubt
>> anyone would notice (the performance hit may be noticeable, but it is
>> also unlikely).
>
> I'm aware that the performance hit might be a disadvantage in using the
> NaN-toolbox (although the benchmark tests have not been widely applied).
>  I guess it's the major obstacle to wider use.
>

I can't judge that. Maybe most people are fine with it. In any case,
I'm certainly free to not use the package if I don't like what it
does. Besides, the functionality I was asking for (i.e. nanmean
without shadowed mean) is provided by another package, so I just have
no problem.

> On the other hand, you gain in terms of programming effort:
> (i) software is doing more often the right thing,

depends...

> (ii) it's less likely to fail due to NaN-related issues.

depends...

> (iii) it's more likely that users unaware of the NaN-issue get it right
> in the first place,

and stay unaware... (if it's right, of course)

> (iv) no need to think about whether nanmean or mean is the right function;
> (v) of course, always using nanmean() would also do, but it's nicer to
> write only mean();
>

I strongly prefer to have different syntax for functions doing different things.

> In my experience, these advantages outweigh the small performance
> penalty. These are also the reasons why it was developed. Except for
> compatibility tests, I've never found a need to turn off the NaN-toolbox.
>

Good for you :)

cheers

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev
