On Fri, Mar 6, 2009 at 3:54 PM, Alois Schlögl <[email protected]> wrote:
> Jaroslav Hajek wrote:
>> On Fri, Mar 6, 2009 at 1:50 PM, Alois Schlögl <[email protected]>
>> wrote:
>>
>> Fair example. This example requires some explicit handling of NaN's.
>> Let's look at the case that raises an error:
>>
>> c = mean(x);
>> if any(isnan(c))
>>   error();
>> end;
>>
>>
>> With the skippingNaN-mean() you do
>>
>> if any(isnan(x(:)))
>>   error();
>> end
>> c=mean(x);
>>
>>
>> In both cases you need to do something about the NaN's, e.g. some error
>> handling. Except for the performance issue, there is no disadvantage in
>> using the nanskipping-mean().
>>
>>
>>> No, I want just to leave them there.
>
>
> I know, you are using NaN's as a marker that an error occurred in some
> earlier processing steps, and you do not care where this has happened.
>
> However, at some point you need to do something about the result, and it
> will make a difference whether it's a NaN or not. Because if this would
> not matter, there would be no need to keep the NaN.
>

Maybe someone else will do something. Maybe this particular result will
not even be used. Maybe the caller already knows that this is an invalid
input, and will discard the invalid result. The call may be done purely
out of convenience, e.g. as part of cellfun or arrayfun calls. It is
often more efficient to filter out just the result.

>>
>> And one could also imagine addressing the performance issue by a change
>> in the interface (e.g. by raising a global flag)
>>
>> [c,N]=mean(x);
>> if flag_nans_occured(),
>>   error();
>> end
>>
>> Actually, flag_nans_occured() is now supported.
>>
>>
>>> So you don't want to remember nanmean/mean but it's OK to remember
>>> checking for flag_nans_occured ()?
>
>
> In most cases, the check is not needed because nanskip-mean() is already
> doing the right thing. Only, in the rare cases where you need some
> explicit handling of NaN's, you can use this.
>

But that's hellishly complicated for my purpose, even if I did remember
that such a function exists. While it may be OK for a novice, advanced
users don't like functions trying to be smarter than they should be.

>
>
>>
>> ---
>>
>> You might consider it an advantage that you can do the error checking
>> much later, e.g.
>>
>> c = mean(x);
>> d = do_some_more(c);
>> if any(isnan(d))
>>   error();
>> end;
>>
>> However, this makes reading the code and finding the error more
>> difficult, because one cannot easily see which step is causing the NaN.
>>
>>> If I'm just writing a function that calculates, say, the centroid and
>>> distance vectors to vertices, I just want to return NaNs, not gripe
>>> inside the function. This is how most of Octave's functions work. That
>>> lets the user choose the most suitable error handling. Maybe this
>>> particular invalid result won't actually be used in the computation -
>>> that's the whole point of NaNs, they're just more flexible than
>>> runtime errors.
>
>
> using error() above was just an example.
> It should point out that NaN's
> will make a difference. Typically, one is using the NaN, perhaps you
> decide on the fact whether there is a NaN, whether you are throwing away
> the result or not. But you are using the NaN at some later time.
>
> What you call "flexibility" comes at the cost that you do not know what
> was causing the NaN. It's a very crude approach to error handling.

No, it's effective error handling. That's why NaNs were invented.

> A numerical algorithm can always return NaN, and it would not be incorrect.
>

Yes. That's why I want to return them.

>
> The point is that with a crude NaN-propagation, you might reject cases
> where a more sensible approach might work fine.
>

Might. In a different universe, maybe. But I won't. In this case, I
just want the computer to do what I mean, not what it thinks is best.

The discussion is getting pointless. You wanted an example of an
application where returning NaNs, and hence using the normal mean, is the
most sensible result, and I gave it to you. There's no point in
demonstrating that I can work around it even with the nanskipping mean;
sure I can, but it's still a workaround.

The problem is that "mean" has dozens of applications outside
statistics, where rejecting NaNs is useless. Admittedly, I can only say
that for "mean"; as for the other functions, like "std", I can't think
of a non-statistical use (which doesn't necessarily mean they don't
exist, but I think they're far rarer).

Btw., are you aware that Octave supports an NA value for indicating
missing data (a special case of NaN)? Therefore, if we do this, we
should probably be skipping only NAs (missing values) and not general
NaNs (invalid results), the same as R does.

So, in principle, I could agree with the statistics functions skipping
NAs by default. I would probably also want a "plain" mean then, say,
"average" or "center", that could go in "general" or elsewhere, though
purely for convenience.

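To make the NA/NaN distinction concrete, here is a minimal sketch of a
column-wise mean that skips only NAs. The name mean_skipna is
hypothetical (no such function exists in Octave); only the built-ins NA,
isna and sum are assumed.

```octave
1;  % a leading command marks this file as a script, so the function can be defined inline

% Hypothetical helper, not part of Octave: column-wise mean that skips
% only NA (missing data) and lets ordinary NaN (invalid results) propagate.
function m = mean_skipna (x)
  missing = isna (x);        % true only for NA, not for other NaNs
  x(missing) = 0;            % NA entries contribute nothing to the sum
  n = sum (~missing, 1);     % number of non-missing entries per column
  m = sum (x, 1) ./ n;       % a column holding only NAs yields 0/0 = NaN
end

x = [1 NA NaN; 3 4 5];
m = mean_skipna (x)          % expected: 2 4 NaN
```

The NA in the second column is dropped, while the NaN in the third
column still reaches the result, because isna is true only for NA
whereas isnan is true for both.
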
You're free to try to convince the rest of the Octave community, of
course, or even make a patch straight away. I'd suggest first writing a
somewhat detailed proposal explaining the point to the maintainers or help
mailing list, to hear what others have to say.

Note that even R, a statistical language, does not skip NAs by
default, probably due to performance issues. But if a more optimized
version of sumskipnan (it should be named __sumskipnan__ in Octave)
reduced that penalty to under, say, 20%, I'd agree it's worth the added
benefit.

regards

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev
