On 02/29/2012 03:57 PM, Jordi Gutiérrez Hermoso wrote:
> 2012/2/29 Alois Schloegl<alois.schlo...@ist.ac.at>:
>> if you believe, that I'm doing the NaN-tb because of a petty war you are
>> grossly mistaken. The NaN-toolbox tries to solve a real issue - and it
>> does it very well, I think. I also do not understand your issue - You do
>> not need to use the NaN-toolbox if you do not like it. So what is your
>> issue?
>
> The problem is that people continuously are installing all of
> Octave-Forge without paying any attention to what they are installing
> or why. This is particularly true of Octave installations for Windows
> and Mac OS X. Thus a lot of users are shadowing core functions without
> really understanding the issue behind it.

For how long have Windows and Mac users been installing the NaN-toolbox
accidentally? And how many problems due to the NaN-tb have been reported
to you? So far none have been reported to me, and I should know. So I
guess this is really a non-issue for the users of Octave.

The only issue is that people get the warning about shadowed functions,
and they feel insecure about it. But I also see that these warnings are
necessary, and there is no way around them, because shadowing can be a
bad thing. But shadowing by the NaN-tb happens only for statistical
functions where the skipping of NaNs is well justified (you might
disagree; I come back to this below). The shadowing functions affect
only the NaN-handling behavior, so the differences are minor and only a
concern if someone relies on NaN-propagation in a few statistical
functions.

It hardly causes any problems if some users use the NaN-toolbox
accidentally. The only problem might arise if someone uses a different
exception handling strategy based on NaN; however, these users should
know how to deal with NaN and whether their approach is compatible with
the NaN-tb or not.

For these reasons we have not seen any problems related to the shadowing
by the NaN-tb. So I think this is a non-issue and does not require any
action.

> The other problems are that you seem to be unhappy that Octave now
> warns when core functions are shadowed, and you also repeatedly insist
> that Octave core functions are wrong and are in need of being fixed by
> you.

If you feel offended by the language, let me know and make suggestions
on how to improve it. However, please take into account that with the
NaN-toolbox I try to demonstrate an alternative concept that I think is
beneficial and an improvement over the standard solution.

> Shadowing core functions is also inconvenient from the user point of
> view because to enable and disable the NaN-skipping behaviour, you
> have to load/unload a whole package, instead of a simple runtime flag
> to do this or not.
>

There is a flag, flag_implicit_skip_nan(), that can switch the
NaN-skipping behavior into a NaN-propagating behavior. I do not
advertise or endorse its use, because the user does not need it, and it
could be abused to make the code unreadable because it uses side
effects. The flag is there only for testing, but it's there if you
really need it.

>> Concerning your question: NA-skipping instead of NaN-tb is not a
>> solution, at least not for the NaN-toolbox for the following reason:
>>
>> o) When you compute in statistics some expectation value, it does not
>> matter whether there is a NA or a NAN, both should be skipped.
>
> This does not make sense to me. Why should NaN be skipped if it arose
> from an incorrect computation? It only makes sense to me to skip them
> if they are representing missing data, not if they are representing an
> incorrect computation.

With "incorrect computation", I assume you mean an operation resulting
in an undefined value (like 0/0 or inf-inf). Yielding NaN in such cases
is not an "incorrect" but a correct computation, in agreement with
IEEE 754. The meaning of NaN is that of an "undefined value"; you might
want to use this to signal an exception, but in statistics you will
ignore it and compute the statistic from the other available samples.

Let's look at an example: you have two large vectors x and y and want to
compute the average ratio x(k)/y(k). There might be cases where, for
some k, both x(k) and y(k) are zero, resulting in NaN. It is reasonable
to compute the average (i.e. the statistical mean) from the remaining
samples. The standard solution would be

(1) m = nanmean(x./y)

or

(2) z = x./y
    z(isnan(z))=[];
    m = mean(z)

With the NaN-toolbox, you just need:

(3) m = mean(x./y)

In general the story ends here: the average ratio is computed, and you
do not need anything else (that's the use case I generally observe).
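
For illustration, here is a minimal sketch of the average-ratio example on toy data. The vectors x and y are made up for this purpose, and the sketch assumes that the NaN-toolbox is NOT loaded, so that mean() is the core function and propagates NaN; only core functions are used.

```octave
% Toy data (made up for illustration); the second pair gives 0/0.
x = [1 0 3 2];
y = [2 0 6 4];

z = x ./ y;                  % element-wise ratio; z(2) is NaN

% Core mean() (assumption: NaN-toolbox not loaded): the NaN propagates.
m_propagate = mean(z);

% Variant (2): drop the NaN entries by hand and average the rest.
valid  = ~isnan(z);
m_skip = mean(z(valid));

printf("propagating mean: %g\n", m_propagate);
printf("skipping mean:    %g\n", m_skip);
printf("samples skipped:  %d\n", sum(~valid));
```

With the NaN-toolbox loaded, mean(z) alone would be expected to give the same value as m_skip here, which is the point of variant (3).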

Now let's see what we would gain with NAs and an NA-skipping mean():

(4) z = x./y
    z(isnan(z))=NA;
    m = mean(z)

I do not see any advantage in this.

Assume that - in some rare case - you want to do some exception handling
that relies on NaN-propagation. In general, this is not the case for the
shadowed statistical functions. The standard solution (works only
without the NaN-tb):

(5) m = mean(x./y)
    if isnan(m), do_exception_handling(); end;

The following solution will always work, independently of whether the
NaN-tb is installed or not:

(6) z = x./y
    if any(isnan(z)), do_exception_handling(); end;
    m = mean(z)

However, the NaN-toolbox also provides the following functionality:

(7) m = mean(x./y)
    if flag_nans_occured(), do_exception_handling(); end;

The function flag_nans_occured() tells you whether the input data
contained some NaNs. Note that this solution is as short as (4), with
the added benefit that m contains some estimate even if the input
contains NaN.

In any case, the point is that it is legitimate to skip NaNs that are
caused by a computational operation. And if we used NAs - what would we
gain? Nothing.

>> - NA do not make things simpler but more complicated. There are no clear
>> rules when NA and when NAN's should be used.
>
> They are very clear: everything is a NaN unless the user specifically
> requests a NA.

I've never felt the need to use NA, and I've worked a lot with data
containing missing values and NaN. NaNs were always good enough.

>> - NA can cause a significant performance penalty. ISNAN() is supported
>> by hardware, but ISNA() needs to analyze the payload of NaN which is
>> much more complicated.
>
> This is a legitimate concern. Checking for NA is indeed slower by
> about a factor of ten.
>

So, why should anyone want to use NAs?

>> Some final remarks on NA. Nobody is using it, and I really do not see
>> any advantage of NA. If NA's would provide a solution, why do the
>> statistical core functions of Octave not use it?
>
> They use it in R. The reason Octave has it was because there was a
> desire to have symmetrical data exchange between R and Octave. The
> reason that NA is not really used in Octave is because nobody really
> found a need to implement this behaviour until now. R has several
> functions that accept a predicate that skips NA or maybe NaN, but they
> don't skip NaN by default. If R, which is specifically tailored for
> statistics, doesn't skip NaN by default, why do you think Octave
> should?

I do not know why R needs NAs. I know that I do not see a need to
distinguish between NaNs and NAs when I handle data with missing values
in Octave. And so far, the users of Octave have done well without NAs.

Alois

> - Jordi G. H.

_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev