On 2012-08-03 19:21, Max Brister wrote:
> On Thu, Aug 2, 2012 at 4:27 PM, Alois Schloegl<alois.schlo...@ist.ac.at>  
> wrote:
>> On 2012-08-02 22:43, Jordi Gutiérrez Hermoso wrote:
>>>
>>> On 2 August 2012 16:40, Alois Schloegl<alois.schlo...@ist.ac.at>   wrote:
>>>
>>>> 3) after installing the NaN-toolbox,  sum([1 NaN 2]) will still result in
>>>> NaN. But with the NaN-toolbox you have an additional function
>>>> sumskipnan([1,NaN,2]) which gives 3.
>>>
>>>
>>> Why don't you name all of your functions this way and not shadow core
>>> functions, then? For example, why do you overwrite sumsq?
>>>
>>> - Jordi G. H.
>>
>>
>>
>> Ok, sumsq() is a borderline case because you might argue that is not
>> necessarily a statistical function.
>>
>> But for the other functions, why should one need to think about whether to
>> use var() or nanvar(), mean() or nanmean(), std() or nanstd() ? There is no
>> need for the NaN-propagating version, you always should use the nan-skipping
>> version.
>
> This is not always true. For example, let's say I want to write a
> quick, simple test to see if rand is working. I might write something
> like
>
> assert (mean (rand (10000)(:)), .5, .1); # the mean value of rand should be around .5
>
> I expect this case to fail if rand produces a NaN.


Hi Max,


thanks for your interest and your attempt to find a solution.

rand() never produces NaN, so it's not a good example. But let's
assume there is some myrand() function that can produce NaN. I'd
expect that NaN to be an encoding for missing values. In that case,
mean() should ignore the NaN's.
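
As a minimal sketch of the difference (the vector x is made-up data, and
the NaN-skipping mean is written out by hand here rather than taken from
the NaN-toolbox):

```octave
% Made-up data: NaN marks a missing observation.
x = [1 NaN 2];

% NaN-propagating mean: a single NaN makes the whole result NaN.
m_propagating = sum (x) / numel (x);

% NaN-skipping mean: average only the values that are actually present.
valid = ~isnan (x);
m_skipping = sum (x(valid)) / nnz (valid);

printf ("propagating: %g, skipping: %g\n", m_propagating, m_skipping);
```

Both variants are built on sum(), so the sketch should behave the same
whether or not the NaN-toolbox is on the path.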

If you need to test for NaN's, do it in an explicit way using
any(isnan(x(:))). That's much cleaner, and others will know that your
code is testing for NaN's. The problem with implicit NaN-propagation is
that it is very difficult to know whether the NaN-handling is a
conscious decision or just an arbitrary side-effect.
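
A minimal sketch of such an explicit check. rand() stands in here for the
hypothetical myrand(), and the error message is only illustrative:

```octave
% Stand-in for the output of a hypothetical myrand() that might return NaN.
x = rand (10000, 1);

% Explicit NaN test: the intent is visible to anyone reading the code.
has_nan = any (isnan (x(:)));
if has_nan
  error ("myrand: output contains NaN");
end

% Separate check of the statistic itself (absolute tolerance 0.1).
assert (mean (x), 0.5, 0.1);
```

Written this way, the test gives the same verdict whether mean() skips or
propagates NaN's, because the NaN check no longer depends on it.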


>
>> When one tries to solve a challenging problem, why should one need to think
>> about whether to use var(), nanvar(), or some_other_varfunction() ? There is
>> just no need for such proliferation of function names - all doing basically
>> the same.
>
> As far as the user is concerned, I agree with you. If a user installs
> the NaN package, then when they call 'var' they want the nan skipping
> version. I do not think we should be spitting out a bunch of warnings,
> as what the user wants is unambiguous.
>
> On the other hand, this creates an issue for scripts in core. Your
> functions are doing basically, but not quite, the same thing. When
> writing scripts in core I expect NaNs to be propagated. It leads to a
> maintenance nightmare if you can not be sure of exactly how a function
> behaves (see gnulib/autotools).


The functions in core and the NaN-tb are doing the same, except for the
NaN-propagation thing. Even the core functions do not mention in their
documentation that NaN's are propagated (see help mean, help var). So
the NaN-handling is really not strictly defined. Applications that rely
on NaN-propagation depend on undocumented behaviour. If you need
to test for NaN's, you should do it in an explicit way, e.g. using
any(isnan(x(:))). That avoids any ambiguity about NaN handling in your
code.


>
>> Concerning your suggestion "to partition the namespaces (classes)": to me
>> this sounds like 2nd class citizens. But perhaps it's just me, not being
>> familiar with this technique. In that case, it would be best if someone else
>> transformed the NaN-tb into a more compatible mode. I'm open to
>> suggestions.
>
> A more practical solution would be to use a package [1]. The main
> problem here is that Octave does not support packages (yet). What do
> you think about having NaN inside of a package?
>
> [1] http://www.mathworks.com/help/techdoc/matlab_oop/brfynt_-1.html
>

I do not know - the concept of "package" must be quite new, and I've
never used it. It seems to me that it is just another way to move the
issue to some other namespace/class/package.

These "solutions" have one thing in common: they are just a bad
compromise that sidesteps the real issue, namely what kind of
NaN-handling should be the default for statistical functions.

However, if you believe that there is some need for a compromise 
solution, a solution based on packages might be a good idea. In that 
case, just do it.


>
>> Alois
>
> Max Brister

