Re: [OctDev] Function mean() fails with a complex matrix

Jaroslav Hajek Fri, 06 Mar 2009 22:32:32 -0800

On Fri, Mar 6, 2009 at 3:54 PM, Alois Schlögl <[email protected]> wrote:
>
> Jaroslav Hajek wrote:
>> On Fri, Mar 6, 2009 at 1:50 PM, Alois Schlögl <[email protected]> 
>> wrote:
>>
>> Fair example. This example requires some explicit handling of NaNs.
>> Let's look at the case that raises an error:
>>
>> c = mean(x);
>> if any(isnan(c))
>>        error();
>> end;
>>
>>
>> With the skippingNaN-mean() you do
>>
>> if any(isnan(x(:)))
>>        error();
>> end
>> c=mean(x);
>>
>>
>> In both cases you need to do something about the NaNs, e.g. some error
>> handling. Except for the performance issue, there is no disadvantage in
>> using the nanskipping-mean().
>>
>>
>>> No, I just want to leave them there.
>
>
> I know, you are using NaNs as a marker that an error occurred in some
> earlier processing step, and you do not care where it happened.
>
> However, at some point you need to do something about the result, and it
> will make a difference whether it's a NaN or not. If this did not
> matter, there would be no need to keep the NaN.
>


Maybe someone else will do something. Maybe this particular result
will not even be used. Maybe the caller already knows that this is an
invalid input, and will discard the invalid result. The call may be
done purely out of convenience, e.g. as part of cellfun or arrayfun
calls. It is often more efficient to filter out just the result.
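
To illustrate "filtering out just the result", here is a minimal sketch.
The data are made up, and the plain mean is written as sum/numel so that
the behaviour does not depend on which mean() happens to be on the path:

```octave
% Hypothetical data: three point sets, the second contains an invalid entry.
sets = {[1 2 3], [4 NaN 6], [7 8 9]};

% Plain, NaN-propagating mean of each set; no checks inside the call.
m = cellfun(@(v) sum(v) / numel(v), sets);

% Filter just the result: invalid inputs show up as NaN and are dropped here.
valid = m(~isnan(m));
```

The caller decides what to do with the NaN entry of m; the computation
itself never needs to know which inputs were invalid.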


>>
>> And one could also imagine to address the performance issue by a change
>> in the interface (e.g. by raising a global flag)
>>
>> [c,N]=mean(x);
>> if flag_nans_occured(),
>>        error();
>> end
>>
>> Actually, flag_nans_occured() is now supported.
>>
>>
>>> So you don't want to remember nanmean/mean but it's OK to remember
>>> checking for flag_nans_occured ()?
>
>
> In most cases, the check is not needed because nanskip-mean() is already
> doing the right thing. Only, in the rare cases where you need some
> explicit handling of NaN's, you can use this.
>

But that's hellishly complicated for my purpose, even if I did
remember that such a function exists. While it may be OK for a novice,
advanced users don't like functions trying to be smarter than they
should be.

>
>
>>
>> ---
>>
>> You might consider it an advantage, that you can do the error checking
>> much later, e.g.
>>
>> c = mean(x);
>> d = do_some_more(c);
>> if any(isnan(d))
>>        error();
>> end;
>>
>> However, this makes reading the code and finding the error more
>> difficult. Because, one can not easily see which step is causing the NaN.
>>
>>> If I'm just writing a function that calculates, say, the centroid and
>>> distance vectors to vertices, I just want to return NaNs, not gripe
>>> inside the function. This is how most of Octave's functions work. That
>>> lets the user choose the most suitable error handling. Maybe this
>>> particular invalid result won't be actually used in the computation -
>>> that's the whole point of NaNs, they're just more flexible than
>>> runtime errors.
>
>
> Using error() above was just an example. It should point out that NaNs
> will make a difference. Typically, one is using the NaN, perhaps you
> decide on the fact whether there is a nan, whether you are throwing away
> the result or not. But you are using the NaN at some later time.
>
> What you call "flexibility" comes at the cost that you do not know what
> was causing the NaN. It's a very crude approach to error handling.

No, it's effective error handling. That's why NaNs were invented.

>A numerical algorithm can always return NaN, and it would not be incorrect.
>

Yes. That's why I want to return them.

>
> The point is that with a crude NaN-propagation, you might reject cases
> where a more sensible approach might work fine.
>

Might. In a different universe, maybe. But I won't. In this case, I
just want the computer to do what I mean, not what it thinks is best.

The discussion is getting pointless. You wanted an example of an
application where returning NaNs, and hence using the normal mean, is
the most sensible choice, and I gave it to you. There's no point in
demonstrating that I can work around it even with the nanskipping
mean; sure I can, but it's still a workaround.
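
For illustration, the centroid example quoted above might look roughly
like the following sketch. The function name centroid_dist and the
sample vertices are hypothetical; the point is only that the function
propagates NaNs and leaves error handling to the caller:

```octave
1;  % mark this file as a script so the function can be defined inline

% Hypothetical helper: centroid of the vertices (rows of V) and the
% vectors from the centroid to each vertex. No NaN checks on purpose.
function [c, d] = centroid_dist(V)
  c = sum(V, 1) / size(V, 1);   % plain column-wise mean, NaNs propagate
  d = bsxfun(@minus, V, c);     % one distance vector per vertex
end

% One vertex has an invalid x coordinate.
V = [0 0; NaN 0; 1 3];
[c, d] = centroid_dist(V);

% The caller, not the function, decides how to react.
bad = any(isnan(c));
```

Here only the x component of c is NaN; the y component is still usable,
and the caller is free to discard or keep the result.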

The problem is that "mean" has dozens of applications outside
statistics, where rejecting NaNs is useless.
Admittedly, I can only say that for "mean"; as for the other
functions, like "std", I can't think of a non-statistical use
(which doesn't necessarily mean they don't exist, but I think they're
far more rare).

Btw., are you aware that Octave supports an NA value for indicating
missing data (a special case of NaN)?
Therefore, if we do this, we should probably skip only NAs
(missing values) and not general NaNs (invalid results), the same as R
does.
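
A short sketch of the NA/NaN distinction, using Octave's NA, isna and
isnan. The NA-skipping mean shown here is only an assumption about how
such a function could behave, not an existing implementation:

```octave
x = [1 NA 3 NaN];

nan_mask = isnan(x);   % true for both NA and NaN
na_mask  = isna(x);    % true only for NA

% Hypothetical NA-skipping mean: drop missing values, keep invalid ones.
y = x(~isna(x));           % NA removed, the genuine NaN stays
m = sum(y) / numel(y);     % NaN, because the invalid result still propagates

% With only a missing value present, the mean of the remaining data is returned.
z  = [1 NA 3];
mz = sum(z(~isna(z))) / nnz(~isna(z));
```

With this behaviour a missing observation is skipped, while a NaN
produced by an invalid computation still shows up in the result.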

So, in principle, I could agree with the statistics functions skipping
NAs by default. I would probably also want a "plain" mean then, say,
"average" or "center", that could go in "general" or elsewhere, though
purely for convenience.

You're free to try to convince the rest of the Octave community, of
course, or even make a patch straight away. I'd suggest first writing
a somewhat detailed proposal explaining the point to the maintainers
or help mailing list, to hear what others have to say. Note that even
R, a statistical language, does not skip the NAs by default. Probably
due to performance issues. But if a more optimized version of
sumskipnan (it should be named __sumskipnan__ in Octave) reduced that
penalty to under, say, 20%, I'd agree it's worth the added benefit.

regards

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
