Re: [OctDev] pdist function

Jaroslav Hajek Wed, 12 Nov 2008 09:46:20 -0800

On Wed, Nov 12, 2008 at 6:32 PM, Francesco Potortì <[EMAIL PROTECTED]> wrote:
>>a nice job (I wanted this function several times).
>
> If you use it, and especially if you have the possibility of making
> further checks against Matlab's behaviour, please let me know, or just
> correct it.  I have not uploaded it yet, going home now.  Hope tomorrow.
>
>>I have two comments:
>>1. I think that instead of
>>if (...)
>>  switch ...
>>   case ...
>>     return
>>   case ...
>>     return
>>endif
>>
>>...code...
>>
>>you should use
>>if (...)
>>  switch ...
>>   case ...
>>   case ...
>>else
>> ...code...
>>endif
>>
>>I.e., no return in each case. This is more consistent with our
>>recommended coding style, where we try to minimize the number of exit
>>points from a function.
>
> I tried several alternatives, and in the end this one seemed the least
> ugly.  In fact, we have three cases for funcname: either one of the
> eleven known strings, or a function name string, or a function handle.
> The latter two are treated the same way using feval.  This way, the
> program flow is straightforward, I have no code duplication, and all the
> return statements are at the end of a case label, i.e., they are not
> scattered through the code, which is what makes maintenance difficult.
> Unfortunately, I cannot fit the function handle case into the
> otherwise label of the switch, because switch barfs when comparing a
> function handle to a string.
>
> Mmmh.  I could avoid the return statements and check whether the
> function's return value has been assigned at the end of the switch...
>


I was thinking about this:

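 metrics = {"euclidean", "seuclidean", "mahalanobis", "cityblock", ...
            "minkowski", "cosine", "correlation", "spearman", ...
            "hamming", "jaccard", "chebychev"};
 ## Only the eleven known names go to the switch; any other string is
 ## treated as the name of an external distance function.
 if (ischar (distfun) && any (strcmp (distfun, metrics)))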
   order = nchoosek(1:rows(x),2);
   Xi = order(:,1);
   Yi = order(:,2);
   X = x';
   switch (distfun)
     case "euclidean"
       diff = X(:,Xi) - X(:,Yi);
       d = sqrt (sumsq (diff));
     case "seuclidean"
       diff = X(:,Xi) - X(:,Yi);
       weights = inv (diag (var (X')));
       d = sqrt (sum ((weights * diff) .* diff));
     case "mahalanobis"
       diff = X(:,Xi) - X(:,Yi);
       weights = inv (cov (X'));
       d = sqrt (sum ((weights * diff) .* diff));
     case "cityblock"
       diff = X(:,Xi) - X(:,Yi);
       d = sum (abs (diff));
     case "minkowski"
       diff = X(:,Xi) - X(:,Yi);
       if (nargin > 2)
         p = varargin{1};
         d = (sum ((abs (diff)).^p)).^(1/p);
       else
         d = sqrt (sumsq (diff)); # default p=2
       endif
     case "cosine"
       prod = X(:,Xi) .* X(:,Yi);
       weights = sumsq (X(:,Xi)) .* sumsq (X(:,Yi));
       d = 1 - sum (prod) ./ sqrt (weights);
     case "correlation"
       corr = cor (X);
       d = 1 - corr (sub2ind (size (corr), Xi, Yi))';
     case "spearman"
       corr = spearman (X);
       d = 1 - corr (sub2ind (size (corr), Xi, Yi))';
     case "hamming"
       diff = logical (X(:,Xi) - X(:,Yi));
       d = sum (diff) / rows (X);
     case "jaccard"
       diff = logical (X(:,Xi) - X(:,Yi));
       weights = X(:,Xi) | X(:,Yi);
       d = sum (diff & weights) ./ sum (weights);
     case "chebychev"
       diff = X(:,Xi) - X(:,Yi);
       d = max (abs (diff));
   endswitch
   ## Every case stores its result in d; copy it to the output.
   y = d;
 else

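   ## Distfun is a function handle or the name of an external function.
   ## As in Matlab, distfun (XI, XJ) gets one row XI and a matrix XJ of
   ## rows, and returns one distance per row of XJ.
   l = rows (x);
   y = zeros (1, nchoosek (l, 2));
   idx = 0;
   for ii = 1:l-1
     ## distances from row ii to all later rows, in one call
     d = feval (distfun, x(ii,:), x(ii+1:l,:), varargin{:});
     y(idx+1:idx+l-ii) = d;
     idx += l - ii;
   endfor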
 endif
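
One way to sanity-check the vectorized branch is to compare it with a plain
pair-by-pair loop. The sketch below uses made-up toy data and only the
Euclidean case; it relies on `nchoosek' listing the pairs (i,j), i < j, in
the same lexicographic order that the nested loops visit them.

```octave
## Sketch: vectorized Euclidean distances versus an explicit double loop.
x = [0 0; 3 4; 6 8; 1 1];           # 4 observations in 2-D (toy data)
order = nchoosek (1:rows (x), 2);   # pairs (i,j), i < j, one per row
Xi = order(:,1);
Yi = order(:,2);
X = x';                             # one observation per column
y_vec = sqrt (sumsq (X(:,Xi) - X(:,Yi)));   # 1 x nchoosek(4,2) row vector

## Same distances in the same order, computed pair by pair
l = rows (x);
y_loop = zeros (1, nchoosek (l, 2));
idx = 0;
for ii = 1:l-1
  for jj = ii+1:l
    idx += 1;
    y_loop(idx) = sqrt (sumsq (x(ii,:) - x(jj,:)));
  endfor
endfor

disp (y_vec)
```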

I don't see any problem here. Am I missing something? Note that in
Octave, switch cases are exclusive (they don't fall through).
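
For readers used to C, a minimal illustration of this point; the metric
name and the `visited' cell array are purely illustrative. Only the first
matching case runs, and no `break' is needed.

```octave
## Sketch: Octave's switch has no C-style fall-through.
distfun = "cityblock";
visited = {};
switch (distfun)
  case "euclidean"
    visited{end+1} = "euclidean";
  case "cityblock"
    visited{end+1} = "cityblock";   # only this body executes
  case "chebychev"
    visited{end+1} = "chebychev";
  otherwise
    visited{end+1} = "otherwise";
endswitch
disp (visited)
```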

>>2. The changeset http://hg.savannah.gnu.org/hgweb/octave/rev/b11c31849b44
>>equipped `norm' with the ability to compute column or row norms of a matrix.
>>This can be used to replace your explicit norm expressions like
>>`sqrt (sumsq (diff))' or `(sum ((abs (diff)).^p)).^(1/p)'
>>with
>>`norm (diff, 'cols')' or `norm (diff, p, 'cols')', respectively.
>
> Thank you, I will look into those.
>
>>Similarly for the 1- and Inf- norm.
>>Using the latter will probably be faster (avoids temporary matrices)
>>and will also be robust w.r.t. overflow (i.e. the 20-norm of numbers
>>of order 1e20 won't be Inf).
>
> Sorry, I do not follow you here.  What are the 1- and Inf- norms, and
> what is their relationship with pdist?
The Minkowski 1- and Inf-norms correspond to the cityblock and
chebychev distances, i.e. for a matrix diff,
`sum (abs (diff))' is `norm (diff, 1, 'cols')', and
`max (abs (diff))' is `norm (diff, Inf, 'cols')'.
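
A small sketch checking these equivalences, and the overflow argument, on
made-up data. It assumes Octave >= 3.2.0 for the 'cols' option of norm.

```octave
## Sketch: column norms via norm (..., "cols") versus the explicit forms.
dd = [3 -1 0; 4 2 -5];      # toy differences, one observation pair per column
p = 3;

n2   = norm (dd, "cols");        # same as sqrt (sumsq (dd))           (euclidean)
np   = norm (dd, p, "cols");     # same as (sum (abs (dd).^p)).^(1/p)  (minkowski)
n1   = norm (dd, 1, "cols");     # same as sum (abs (dd))              (cityblock)
ninf = norm (dd, Inf, "cols");   # same as max (abs (dd))              (chebychev)

## Overflow: (1e20)^20 = 1e400 exceeds realmax, so the explicit formula
## returns Inf, while norm scales internally and stays finite.
big = [1e20; 1e20];
naive  = (sum (abs (big).^20)).^(1/20);
scaled = norm (big, 20, "cols");

printf ("naive = %g, scaled = %g\n", naive, scaled);
```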


>
>>This is going to be a feature of 3.2.0, so if you're fine with pdist
>>depending on 3.2.0, I think you may exploit it.
>
> Maybe I can write commented-out code, to be uncommented later on, when
> 3.2 becomes widespread.

Alternatively, you can check the Octave version (e.g. with `ver') and use
the new code if it is >= 3.2.0, or you can even test directly for the
feature using `try'. The advantage is that you will actually get the new
functionality. The old code may be removed at some point in the future.
Your choice, of course.
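
Both approaches might look roughly like this. `dd', `new_enough' and
`has_cols_norm' are hypothetical names used only for illustration;
`compare_versions' and `version' are standard Octave functions.

```octave
## Sketch: use the norm-based code path only when it is available.
dd = [3 0; 4 1];   # toy difference matrix, one pair per column

## Option 1: compare the running Octave version string
new_enough = compare_versions (version (), "3.2.0", ">=");

## Option 2: probe for the feature itself with try/catch
try
  norm (dd, "cols");
  has_cols_norm = true;
catch
  has_cols_norm = false;
end_try_catch

if (has_cols_norm)
  d = norm (dd, "cols");     # new code path (Octave >= 3.2.0)
else
  d = sqrt (sumsq (dd));     # fallback for older versions
endif
disp (d)
```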

regards

-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev
