Gurunath -

The proper way to handle this is to apply the "stats" method to the result
of the "diagonal" slice. It is fast, succinct, and statistically correct:

my ($mean, $stdev) = $my_huge_matrix->diagonal(0, 1)->stats;

In slightly more detail from the PDL shell (ok, from prima-repl, but who's
keeping score?):

> $matrix = pdl q[ 2 2 2 ; 3 1 0 ; 2 5 1 ]
> $diagonal = $matrix->diagonal(0, 1)
> print "Matrix is $matrix and diagonal is $diagonal"
Matrix is
[
 [2 2 2]
 [3 1 0]
 [2 5 1]
]
 and diagonal is [2 1 1]
> ($mean, $rms) = $diagonal->stats
> print "Diagonal's mean is $mean and diagonal's standard deviation is $rms"
Diagonal's mean is 1.33333333333333 and diagonal's standard deviation is
0.577350269189626

David


P. S. There is something important to note about the $rms value returned as
the second argument by stats. A calculation of the standard deviation would
begin by summing the squares of the differences between the mean and the
values:

my $sq_diff_sum = sum( ($piddle - $piddle->average)**2 );

A navie calculation would then divide by the *number of elements*, and take
the square root:

my $st_dev = sqrt( $sq_diff_sum / $sq_diff_sum->nelem);

That is the so-called population standard deviation, which assumes that you
got the average value from the *population*, not the *sample*. For large
data sets, the discrepancy is not important. For small data sets, this is
only accurate if you have the entire population of interest. If you only
have a subset of the population, then the statistically correct value comes
by dividing instead by the *number of elements minus 1*, i.e.

my $st_dev = sqrt( $sq_diff_sum / ($sq_diff_sum->nelem - 1) );

In this case---finding the mean and standard deviation of the elements
along the diagonal---either method is acceptable, and you should report
which value you calculated when you show your results. For sample-based
statistical calculations, where you only take a sample of your whole
population, the latter calculation is correct.

I say all of this just to point out that the second return value of the
stats and statsover methods is the sample RMS deviation from the mean, i.e.
it divides by N-1. (The docs call this the population RMS deviation for
reasons not clear to me.) If you want to use the population RMS, you should
get the 7th argument, like so:

# Perl array slice notation
my ($mean, $rms) = ($diagonal->stats)[0, 6];
# Using undef to discard unwanted returned values:
my ($mean, undef, undef, undef, undef, undef, $rms) = $diagonal->stats;

-- 
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan
_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl

Reply via email to