Re: [GRASS-dev] PCA question

2012-06-27 Thread Nikos Alexandris
[Addendum]

Hamish:
  (not talking about pan-sharpening, but in general,)
  how about the situation where you have a map data 
  which is loudly dominated by a signal, and you want to try and remove that
  loud signal so that you can look at the subtle variations caused by a
  different source that the loud signal had been masking?

Nikos:
 
 Yes, this _can_ be a perfect use-case.
 
 Especially if the presence of the feature in question, is in at least one or
 in some of the input dimensions near/close to zero. This last statement is
 based on Pielou's (flawless explanation of how PCA works) [1] and own
 experiences [2].

If the above holds true (near-zero projection, in one or some dimensions, of 
the features of interest), and in some way, these subtle variations do form 
distinct clusters, I'd be keen to try out non-centered PCA and (re-)confirm 
Pielou and myself (claiming, for example, higher Producer's accuracies as in 
--my-- burned area mapping).

Nikos
 
 A separation/isolation attempt of the feature in question from dominant
 variances will be supported.  The loud signal would be channeled among
 the first few PCs and the subtle variations _could_ then be more evident
 in some of the higher order components.
 
 All in all, one has to look at the numbers -- drawing conclusions from the
 PC images is not safe!
 
[...]
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] PCA question

2012-06-26 Thread Markus Metz
On Tue, Jun 26, 2012 at 12:23 AM, Hamish hamis...@yahoo.com wrote:

FWIW,

 See discussion and GRASS vs. R testing/validation w/Nikos from
 c. March 2009. ( trac ticket #430)

Ticket #430 is fixed in all branches.


 See also other open i.pca tickets:
  i.pca fails to center data prior to analysis
  http://trac.osgeo.org/grass/ticket/576

Ticket #576 is fixed in 7.0 and 6.5, awaiting backport to 6.4.


  i.pca: close map before writing out metadata?
  http://trac.osgeo.org/grass/ticket/511

Ticket #511 is fixed in all branches.


  i.pca metadata truncated
  http://trac.osgeo.org/grass/ticket/1279

Ticket #1279 is fixed in trunk, difficult to backport to 6.x because
of the 80x40 limit of the history.

Markus M
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] PCA question

2012-06-26 Thread Nikos Alexandris
Michael Barton:

  Thanks Marcus,
  Please see my post below, starting with I think I've figured it out...

Markus Metz: 

 Yes, I saw, but the formula below I think I've figured it out...
 looked strange to me.
 
  Since I'm working in GRASS instead of R,
 
 [ should be and, not instead of ;) ]
 
 Here is an example based on on the landsat imagery in mapset landsat,
 North Carolina sample dataset:
 
 i.pca without rescaling of the output:
 i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30
 output_prefix=lsat7_2000_pc rescale=0,0
 
 The eigenvectors are
 0.4494, 0.5128, 0.7315
 0.6726, 0.3447,-0.6548
 0.5879,-0.7863, 0.1901
 
 this is a square 3x3 matrix. R gives the inverse matrix as
 
 0.4493372, 0.6726549, 0.5879235
 0.5128129, 0.3446144,-0.7862660
 0.7315070,-0.6548316, 0.182
 
 Test with band 10:
 the general formula is
 b1' = (ev1-1' * pc1 + ev1-2' * pc2 + ev1-3' * pc3) + mean(b1)
 
 The mean of lsat7_2000_b10 is 79.925.
 r.mapcalc lsat7_2000_10.pc = (lsat7_2000_pc.1 * 0.4493372 +
 lsat7_2000_pc.2 * 0.6726549 + lsat7_2000_pc.3 * 0.5879235) + 79.925
 
 Difference to original:
 r.mapcalc diff = lsat7_2000_10 - lsat7_2000_10.pc
 
 r.univar diff -g
 n=250325
 null_cells=0
 cells=250325
 min=-0.00140021304849824
 max=0.00723340429107111
 range=0.00863361733956935
 mean=-1.8473599814612e-05
 
 That's close enough, given that the eigenvectors are printed by i.pca
 with limited precision (only 4 decimal places).

The math demonstrated above is pretty clear and Correct!  The above 
should/will go to the wiki, i.e. in a separate Inverse PCA page.  Hopefully 
I'll get to that as soon as possible (=next month!).

A general note though:

in a new produced PC-coordinate system, one has to be very careful about its 
interpretation.  There is no prior knowledge on what (new values) is what 
(i.e. relation to land cover) exactly, as in the way we do understand that 
reflectance, for example, can be associated -- and explained -- with specific 
land cover types or other surface features.

We have to keep in mind that the re-distribution of the information contained 
in the input bands, does strictly follow math rules (i.e. achieve a minimal 
RMSE, extract highest variances into first principal components).  It does 
not care about class labels.

Any manipulation of the data, prior to and while aiming at an inverse PCA, 
might mix-up things in a way that the final data are very difficult to 
interpret 
and, especially, to automatically post-process (i.e. use well known Remote 
Sensing algorithms).

In the case of a pan-sharpening attempt, I would thoroughly test.

 
 Without using R, the inverse of a 3x3 matrix can be manually
 calculated as follows:
 
 original matrix:
 a b c
 d e f
 g h k
 
 inverse matrix:
 (ek - fh)   (ch - bk)   (bf - ce)
 (fg - dk)   (ak - cg)   (cd - af)
 (dh - eg)  (gb - ah)   (ae - bd)
 
 each entry in this inverse matrix needs to be multiplied with
 
 1 / (a(ek - fh) - b(kd - fg) + c(dh - eg))
 
 in order to get the transformation matrix to be applied to the pc scores.
 
 The formula is more complex for matrices with more than 3 dimensions,
 e.g. for i.pca with 6 input bands.
 
  So my questions are:
  
  1) Do I have the inversion correct in terms of how it needs to be
  calculated in GRASS?
 I don't think so. Using your formula and testing for the difference gives me
 min=-546.255208609669
 max=174.771853180844
 range=721.027061790513
 mean=79.9249815357317
 
 a bit too much IMHO.

It is a bit more than a bit too much...

  2) Even though the mean of all bands seems to be subtracted from each
  factor score, does inverting the matrix mean that the mean of *each* band
  is added back to its transformed value? Adding either the mean of all
  original bands or 1 original band seems to produce values that are much
  higher than the original, and so need to be rescaled. Maybe this is OK.

 The mean of each band is added back to each recalculated band. See
 above r.mapcalc formula for band 10.
 
  3) I do not normalize or rescale in i.pca. This seems to make it easier to
  do the inverse PCA with fewer calculations. Is there any reason I
  *should* rescale and/or normalize?

 Normalization applies to the input of i.pca and is by default not
 done. It needs to be done for heterogenous input data such as e.g.
 rainfall, temperature, NDVI, etc. Rescaling is automatically applied
 to the output of i.pca unless explicitly disabled with rescale=0,0.

Adding my 1 old drachma:

PCA works on/with global stats.  One would want to normalise exactly when it 
comes to heterogenous input data, speak largely different ranges of input 
data. For example there is an input band which ranges from 0 to 100 and 
another one which ranges from 0 to 10.  Normalisation would help to make 
different inputs comparable.  That is mainly the reason to apply scaling, as 
far as I understand PCA.

Note, though, that a normalisation will swipe out subtle differences between 
input bands of similar range which might 

Re: [GRASS-dev] PCA question

2012-06-26 Thread Nikos Alexandris
Hamish wrote:
 
 FWIW,
 
  See discussion and GRASS vs. R testing/validation w/Nikos from
  c. March 2009. ( trac ticket #430)

Markus M wrote:
 
 Ticket #430 is fixed in all branches.

  See also other open i.pca tickets:
   i.pca fails to center data prior to analysis
   http://trac.osgeo.org/grass/ticket/576

 Ticket #576 is fixed in 7.0 and 6.5, awaiting backport to 6.4.

   i.pca: close map before writing out metadata?
   http://trac.osgeo.org/grass/ticket/511

 Ticket #511 is fixed in all branches.

   i.pca metadata truncated
   http://trac.osgeo.org/grass/ticket/1279

 Ticket #1279 is fixed in trunk, difficult to backport to 6.x because
 of the 80x40 limit of the history.

I, too, noticed, that issues have been fixed. I keep wanting to clean the 
respective wiki-page. I feel it is kind of noisy.  I hope that, with the 
Universe conspiring on my side, I'll jump-in (grass ML) next month again ;-)
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] PCA question

2012-06-26 Thread Nikos Alexandris
Michael Barton:

[...]

 However, my question is about performing an *inverse pca*--getting back to
 the original values from the principal components maps.

Michael, getting back to the original values can _only_ be done if one does 
not touch the data in any of the intermediate steps, i.e. Input  EVD (or 
SVD)  Inverse PCA  Original values.

If one alters the data at any step prior to the Eigenanalysis or SVD, I don't 
think it is possible to land back on level 1. From the moment that global 
stats of a multivariarte dataset, subject to PCA, are changed, one will 
probably jump into a new reality.  This means that it takes an extra effort 
to interpret the new stuff.

 The idea of pca sharpening is that you perform a pca, then do an inverse pca
 substituting the pan band for pc1 to enhance the resolution.

I haven't tried PCA sharpening.  So, apologies for my simplistic question(s), 
I just want to understand the trick here.

Which resolution is to be enhanced?  The geometric?  Is it meant to keep PC1 
and mix it with the rest, or keep the Pan and throw away PC1?

Principal Component 1 will contain the highest variance of your input data -- 
which, in fact, is a composition of different amount of information originated 
from all input bands. If you throw that away you are left with a dataset which 
is likely to be useless!


 The equations I show below seem to work, but what I've read indicates that
 it is not possible to *exactly* get the original values back; you can only
 approximate them.

As Markus' demonstration showed in another post, the results can be close 
enough so the differences can be disregarded. As far as I have understood PCA, 
it depends on how many decimals are taken into account, while doing all the 
math and _not_ effectively altering the data at any of the intermediate steps.

Pff, it's been a while I got my hands dirty with PCA and I might forget 
something here.

[...]

  I'm working on a pan sharpening script that will combine your
  i.fusion.brovey with options to do other pan sharpening methods. An
  IHS transformation is easy. I think that a PCA sharpening is doable
  too if I can find an equation to rotate the PCA channels back into
  unrotated space--since i.pca does provide the eigenvectors. 

[...]

Thanks, Nikos
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] PCA question

2012-06-26 Thread Hamish
Nikos wrote:
 Which resolution is to be enhanced?  The geometric?  Is it meant
 to keep PC1 and mix it with the rest, or keep the Pan and throw
 away PC1?
 
 Principal Component 1 will contain the highest variance of
 your input data -- which, in fact, is a composition of different
 amount of information originated from all input bands. If you
 throw that away you are left with a dataset which is likely to
 be useless!

(not talking about pan-sharpening, but in general,)
how about the situation where you have a map data which is
loudly dominated by a signal, and you want to try and remove
that loud signal so that you can look at the subtle variations
caused by a different source that the loud signal had been
masking?

is removing PC1 then back-inverting a suitable method for that
sort of task? or is there another more suitable method?


thanks,
Hamish
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] PCA question

2012-06-26 Thread Nikos Alexandris
 Nikos:

  Which resolution is to be enhanced?  The geometric?  Is it meant
  to keep PC1 and mix it with the rest, or keep the Pan and throw
  away PC1?

  Principal Component 1 will contain the highest variance of
  your input data -- which, in fact, is a composition of different
  amount of information originated from all input bands. If you
  throw that away you are left with a dataset which is likely to
  be useless!

Hamish:

 (not talking about pan-sharpening, but in general,)
 how about the situation where you have a map data

( we are talking about multi-dimensional data, right? )

 which is loudly dominated by a signal, and you want to try and remove that
 loud signal so that you can look at the subtle variations caused by a
 different source that the loud signal had been masking?

Yes, this _can_ be a perfect use-case.

Especially if the presence of the feature in question, is in at least one or 
in some of the input dimensions near/close to zero. This last statement is 
based on Pielou's (flawless explanation of how PCA works) [1] and own 
experiences [2].

A separation/isolation attempt of the feature in question from dominant 
variances will be supported.  The loud signal would be channeled among the 
first few PCs and the subtle variations _could_ then be more evident in some 
of the higher order components.

All in all, one has to look at the numbers -- drawing conclusions from the PC 
images is not safe!
 
 is removing PC1 then back-inverting a suitable method for that sort of task? 

Short answer: yes, it can be, but back-inverting might not be necessary!

Longer story:  if the subtle variations (featueres of low(er) variance, 
rather homogenous stuff) are, as expected, more evident (read: enhanced as 
compared to the original data set) in some of the higher order components, why 
bother to back-invert?  Supervised classification techniques can directly 
operate on selected PCs and attempt to extract whatever is of your interest.

More on the subject of back-inverting -- quotting from Dr. Koutsias paper:

A critical issue in the back-transformation process is the
amount of information taken from each PC axis. The original
spectral pattern of the satellite image is modified to a degree that
depends on the amount of the information taken from each PC axis.

In this work (mapping burned areas), back-transformation coefficients, in the 
range of 0 to 1, were worked-out in order to 'grep' specific percentages (0 to 
100%) from each of the produced PCs and channel them back (via inverse-PCA) to 
a data set _similar_ to the original one, though different to the extent of the 
removed information (excluded PC).

 or is there another more suitable method?

Dunno more... :-(
Kindest regards, Nikos

---
[1] Book: Pielou, E. C. The interpretation of ecological data: a primer on 
classification and ordination Wiley, New York, 1984

[2] Dissertation: Burned area mapping via non-centered PCA using Public 
Domain Data and Free Open Source Software Institut für Forstökonomie, Fakultät 
für Forst- und Umweltwissenschaften, Albert-Ludwigs-Universität Freiburg, 
2011

[3] Koutsias, N.; Mallinis, G.  Karteris, M. A forward/backward principal 
component analysis of Landsat-7 ETM+ data to enhance the spectral signal of 
burnt surfaces ISPRS Journal of Photogrammetry and Remote Sensing, 2009, 64, 
37
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] PCA question

2012-06-26 Thread Nikos Alexandris
Allow to me clarify the sentence below to let it fit into the frame of the 
current thread!

On Tuesday 26 of June 2012 13:24:03 Nikos Alexandris wrote:
 If one alters the data at any step prior to the Eigenanalysis or SVD, I
 don't  think it is possible to land back on level 1.

I meant:  if the data are manipulated, at any step, prior to the back-
inverting EVD or SVD... 

Thanks, Nikos
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] PCA question

2012-06-26 Thread Michael Barton
Hi Nicos
On Jun 26, 2012, at 4:24 AM, Nikos Alexandris wrote:

 Michael Barton:
 
 [...]
 
 However, my question is about performing an *inverse pca*--getting back to
 the original values from the principal components maps.
 
 Michael, getting back to the original values can _only_ be done if one does 
 not touch the data in any of the intermediate steps, i.e. Input  EVD (or 
 SVD)  Inverse PCA  Original values.
 
 If one alters the data at any step prior to the Eigenanalysis or SVD, I don't 
 think it is possible to land back on level 1. From the moment that global 
 stats of a multivariarte dataset, subject to PCA, are changed, one will 
 probably jump into a new reality.  This means that it takes an extra effort 
 to interpret the new stuff.

Right. That is why I'm not doing any alteration of the data after transforming 
to PC's.

 
 The idea of pca sharpening is that you perform a pca, then do an inverse pca
 substituting the pan band for pc1 to enhance the resolution.
 
 I haven't tried PCA sharpening.  So, apologies for my simplistic question(s), 
 I just want to understand the trick here.
 
 Which resolution is to be enhanced?  The geometric?  Is it meant to keep PC1 
 and mix it with the rest, or keep the Pan and throw away PC1?
 
 Principal Component 1 will contain the highest variance of your input data -- 
 which, in fact, is a composition of different amount of information 
 originated 
 from all input bands. If you throw that away you are left with a dataset 
 which 
 is likely to be useless!

The way this works is to:

1) transform 3 lower resolution bands to 3 principal components
2) replace PC1 with the higher resolution panchromatic band (under the 
reasonable assumption that the pan band will include more of the total spectral 
variability than will any more spectrally limited band). Histogram matching the 
pan band to PC1 is recommended here.
3) do an inverse PCA to get back to the original bands with a similar range of 
spectral response but with higher spatial resolution.

There have been--and continue to be--studies of the performance of different 
pan sharpening algorithms from various perspectives. For myself, pan sharpening 
can help with visually resolving more features in greater detail. But this is 
at the cost of making it considerably more difficult to understand what the 
pixel values of the enhanced bands mean.

 
 
 The equations I show below seem to work, but what I've read indicates that
 it is not possible to *exactly* get the original values back; you can only
 approximate them.
 
 As Markus' demonstration showed in another post, the results can be close 
 enough so the differences can be disregarded. As far as I have understood 
 PCA, 
 it depends on how many decimals are taken into account, while doing all the 
 math and _not_ effectively altering the data at any of the intermediate steps.

Yes. Markus' demo made me more comfortable with the algorithm overall. When you 
replace PC1 with the pan band, of course, you don't get back to the original 
values. But the ranges look pretty good now. 

I'll attach the script here in case you want to try it. Ver. 2 and 3 represent 
different kinds of optimizing for calculation speed. v3 only works with a new 
GRASS python function that Hamish committed to trunk yesterday. V2 should work 
with all current versions of GRASS.

Here are some helpful references:

Amarsaikhan, D.,  Douglas, T. (2004). Data fusion and multisource image 
classification. International Journal of Remote Sensing, 25(17), 3529–3539.
Behnia, P. (2005). Comparison between four methods for data fusion of ETM+ 
multispectral and pan images. Geo-spatial Information Science, 8(2), 98–103. 
doi:10.1007/BF02826847
Du, Q., Younan, N. H., King, R.,  Shah, V. P. (2007). On the Performance 
Evaluation of Pan-Sharpening Techniques. Geoscience and Remote Sensing Letters, 
IEEE, 4(4), 518 –522. doi:10.1109/LGRS.2007.896328
Karathanassi, V., Kolokousis, P.,  Ioannidou, S. (2007). A comparison study on 
fusion methods using evaluation indicators. International Journal of Remote 
Sensing, 28(10), 2309–2341. doi:10.1080/01431160600606890

Michael

 

i.pansharpen2
Description: Binary data


i.pansharpen3
Description: Binary data

 
 Pff, it's been a while I got my hands dirty with PCA and I might forget 
 something here.
 
 [...]
 
 I'm working on a pan sharpening script that will combine your
 i.fusion.brovey with options to do other pan sharpening methods. An
 IHS transformation is easy. I think that a PCA sharpening is doable
 too if I can find an equation to rotate the PCA channels back into
 unrotated space--since i.pca does provide the eigenvectors. 
 
 [...]
 
 Thanks, Nikos

_
C. Michael Barton
Visiting Scientist, Integrated Science Program
National Center for Atmospheric Research 
University Corporation for Atmospheric Research
303-497-2889 (voice)

Director, Center for Social Dynamics  Complexity 
Professor of Anthropology, School of Human 

Re: [GRASS-dev] PCA question

2012-06-26 Thread Nikos Alexandris
Thank you for the details Michael!

(cc-ing to Dr. Koutsias, this discussion might be of his interest)

Michael Barton:

  [...]

  However, my question is about performing an *inverse pca*--getting back
  to the original values from the principal components maps.

Nikos:

  Michael, getting back to the original values can _only_ be done if one
  does not touch the data in any of the intermediate steps, i.e. Input 
  EVD (or SVD)  Inverse PCA  Original values.

  If one alters the data at any step prior to the Eigenanalysis or SVD, I
  don't think it is possible to land back on level 1. From the moment
  that global stats of a multivariarte dataset, subject to PCA, are
  changed, one will probably jump into a new reality.  This means that it
  takes an extra effort to interpret the new stuff.

Michael:
 
 Right. That is why I'm not doing any alteration of the data after
 transforming to PC's.
  The idea of pca sharpening is that you perform a pca, then do an inverse
  pca substituting the pan band for pc1 to enhance the resolution.

Nikos:

  I haven't tried PCA sharpening.  So, apologies for my simplistic
  question(s), I just want to understand the trick here.

  Which resolution is to be enhanced?  The geometric?  Is it meant to keep
  PC1 and mix it with the rest, or keep the Pan and throw away PC1?

  Principal Component 1 will contain the highest variance of your input data
  -- which, in fact, is a composition of different amount of information
  originated from all input bands. If you throw that away you are left with
  a dataset which is likely to be useless!

Michael:
 
 The way this works is to:
 
 1) transform 3 lower resolution bands to 3 principal components

 2) replace PC1 with the higher resolution panchromatic band (under the
 reasonable assumption that the pan band will include more of the total
 spectral variability than will any more spectrally limited band). Histogram
 matching the pan band to PC1 is recommended here.

This assumption is, indeed, necessary and sounds pretty rational. Interesting 
stuff.

 3) do an inverse PCA to get back to the original bands with a similar range
 of spectral response but with higher spatial resolution.

 There have been--and continue to be--studies of the performance of different
 pan sharpening algorithms from various perspectives. For myself, pan
 sharpening can help with visually resolving more features in greater
 detail. But this is at the cost of making it considerably more difficult to
 understand what the pixel values of the enhanced bands mean.

  The equations I show below seem to work, but what I've read indicates
  that it is not possible to *exactly* get the original values back; you
  can only approximate them.

Nikos:

  As Markus' demonstration showed in another post, the results can be close
  enough so the differences can be disregarded. As far as I have understood
  PCA, it depends on how many decimals are taken into account, while doing
  all the math and _not_ effectively altering the data at any of the
  intermediate steps.

Michael:

 Yes. Markus' demo made me more comfortable with the algorithm overall. When
 you replace PC1 with the pan band, of course, you don't get back to the
 original values. But the ranges look pretty good now.
 
 I'll attach the script here in case you want to try it. Ver. 2 and 3
 represent different kinds of optimizing for calculation speed. v3 only
 works with a new GRASS python function that Hamish committed to trunk
 yesterday. V2 should work with all current versions of GRASS.

Thanks a lot. Noted on my ToDo list: Check MichaelB's pan-sharpening scripts 
(after next week).

 Here are some helpful references:

 Amarsaikhan, D.,  Douglas, T. (2004). Data fusion and multisource image
 classification. International Journal of Remote Sensing, 25(17), 3529–3539.

 Behnia, P. (2005). Comparison between four methods for data fusion of ETM+
 multispectral and pan images. Geo-spatial Information Science, 8(2),
 98–103.  doi:  10.1007/BF02826847

 Du, Q., Younan, N. H., King, R.,  Shah, V. P. (2007). On the Performance
 Evaluation of Pan-Sharpening Techniques. Geoscience and Remote Sensing
 Letters, IEEE, 4(4), 518 –522.  doi:  10.1109/LGRS.2007.896328

 Karathanassi, V., Kolokousis, P.,  Ioannidou, S. (2007). A comparison study
 on fusion methods using evaluation indicators. International Journal of
 Remote Sensing, 28(10), 2309–2341.  doi:  10.1080/01431160600606890
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Re: [GRASS-dev] PCA question

2012-06-25 Thread Markus Metz
An inverse PCA can be regarded as the inverse of a transformation
using matrix notation. PC scores are calculated with

b = A a

with A being the transformation matrix composed of the Eigenvectors, a
being the vector of the original values and b the PC scores. What you
now need is inverse of A, A^-1. The original values can then be
retrieved with

A^-1 b = a

A^-1 is the inverse of the transformation matrix A which you can get
in R with solve(A).

For a PCA, the original values are usually shifted to the mean and
optionally scaled to stddev before computing the Eigenvectors. The
mean shift is always performed by i.pca, scaling is optional. That
means that A^-1 b gives the original values shifted to the mean and
maybe scaled, and the mean of each original band needs to be added to
get the original values used as input to i.pca. With scaling applied,
the shifted values need to be multiplied by the stddev for each
original band.

HTH,

Markus M


On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton michael.bar...@asu.edu wrote:
 The constant (i.e., the band mean) was in the pca primer that someone sent me 
 a link to in this discussion. Using the Eigenvectors resulting from i.pca, I 
 can achieve the results of i.pca using my formula below. This is essentially 
 the same as your formula minus the constant--which doesn't really make much 
 (of any) difference in the final result.

 However, my question is about performing an *inverse pca*--getting back to 
 the original values from the principal components maps. The idea of pca 
 sharpening is that you perform a pca, then do an inverse pca substituting the 
 pan band for pc1 to enhance the resolution. The equations I show below seem 
 to work, but what I've read indicates that it is not possible to *exactly* 
 get the original values back; you can only approximate them.

 Michael


 On Jun 17, 2012, at 10:48 AM, Duccio Rocchini wrote:

 Dear all,
 first, sorry for the delay...
 Here are my 2 cents to be added to the discussion. I re-took in my
 hands the John Jensen book.
 Accordingly

 new brightness values1,1,1 = a1,1*BV1,1,1  +a2,1*BV1,1,2. + an,1*BV1,1,m

 where a=eigenvector and BV=original brightness value.

 I found no evidence for the mean term: - ((b1+b2+b3)/3)

 Michael: do you have a proof/reference for that?

 P.S. thanks for involving me in this discussion which is really stimulating!

 Duccio

 2012/6/7 Michael Barton michael.bar...@asu.edu:

 I think I've figured it out.

 If (ev1-1, ev1-2, ev1-3) are the eigenvectors of the first principal 
 component for 3 imagery bands (b1, b2, b3), the corresponding factor scores 
 of the PC1, PC2, and PC3 maps (fs1, fs2, fs3) are calculated as:

 fs1 = (ev1-1*b1) + (ev1-2*b2) + (ev1-3*b3) - ((b1+b2+b3)/3)
 fs2 = (ev2-1*b1) + (ev2-2*b2) + (ev2-3*b3) - ((b1+b2+b3)/3)
 fs3 = (ev3-1*b1) + (ev3-2*b2) + (ev3-3*b3) - ((b1+b2+b3)/3)

 So to do an inverse PCA on the same data you need to do the following:

 b1' = (fs1/ev1-1) + (fs2/ev2-1) + (fs3/ev3-1)
 b2' = (fs1/ev1-2) + (fs2/ev2-2) + (fs3/ev3-2)
 b3' = (fs1/ev1-3) + (fs2/ev2-3) + (fs3/ev3-3)

 Adding the constant back on doesn't really seem to matter because you need 
 to rescale b1' to b1, b2' to b2, and b3' to b3 anyway.

 Michael

 On Jun 7, 2012, at 1:55 AM, Markus Neteler wrote:

 Hi Duccio,

 On Wed, Jun 6, 2012 at 11:39 PM, Michael Barton michael.bar...@asu.edu 
 wrote:
 On Jun 6, 2012, at 2:20 PM, Markus Neteler wrote:
 On Wed, Jun 6, 2012 at 5:09 PM, Michael Barton michael.bar...@asu.edu 
 wrote:
 ...
 I'm working on a pan sharpening script that will combine your 
 i.fusion.brovey with options to do other pan sharpening methods. An IHS 
 transformation is easy. I think that a PCA sharpening is doable too if 
 I can find an equation to rotate the PCA channels back into unrotated 
 space--since i.pca does provide the eigenvectors.

 Maybe there is material in (see m.eigenvector)
 http://grass.osgeo.org/wiki/Principal_Components_Analysis

 This has a lot of good information and ALMOST has what I need. Everything 
 I read suggests that there is a straightforward way to get the original 
 values from the factor scores if you have the eigenvectors. But no one 
 I've yet found provides the equation or algorithm to do it.

 @Duccio: any idea about this by chance?

 thanks
 Markus

 _
 C. Michael Barton
 Visiting Scientist, Integrated Science Program
 National Center for Atmospheric Research 
 University Corporation for Atmospheric Research
 303-497-2889 (voice)

 Director, Center for Social Dynamics  Complexity
 Professor of Anthropology, School of Human Evolution  Social Change
 Arizona State University
 www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu




 --
 Duccio Rocchini, PhD

 http://gis.cri.fmach.it/rocchini/

 Fondazione Edmund Mach
 Research and Innovation Centre
 Department of Biodiversity and Molecular Ecology
 GIS and Remote Sensing Unit
 Via Mach 1, 38010 San Michele all'Adige (TN) - Italy
 Phone +39 0461 615 570
 

Re: [GRASS-dev] PCA question

2012-06-25 Thread Michael Barton
Thanks Marcus,

Please see my post below, starting with I think I've figured it out...

Since I'm working in GRASS instead of R, I need to work back from the values 
given by i.pca. Also, I'm not up on matrix algebra notation. So it would be a 
help to me to see exactly what is meant by an inverted Eigenvector matrix. I 
tried to give how I interpreted this in GRASS map algebra terms below. Is this 
correct? Using my simple algebraic notation, the Eigenvector matrix produced by 
i.pca would be:

ev1-1 ev1-2 ev1-3
ev2-1 ev2-2 ev2-3
ev3-1 ev3-2 ev3-3

Where ev1-2 refers to the eigenvector in the first row and second column

Spot checking the factor scores against original values, with no scaling, seems 
to indicate that it is the mean of all original bands for each pixel that are 
subtracted from the factor scores in i.pca. This is somewhat different from 
what you suggest.  

So my questions are:

1) Do I have the inversion correct in terms of how it needs to be calculated in 
GRASS?
2) Even though the mean of all bands seems to be subtracted from each factor 
score, does inverting the matrix mean that the mean of *each* band is added 
back to its transformed value? Adding either the mean of all original bands or 
1 original band seems to produce values that are much higher than the original, 
and so need to be rescaled. Maybe this is OK. 
3) I do not normalize or rescale in i.pca. This seems to make it easier to do 
the inverse PCA with fewer calculations. Is there any reason I *should* rescale 
and/or normalize?

Thanks much
Michael


On Jun 25, 2012, at 10:11 AM, Markus Metz wrote:

 An inverse PCA can be regarded as the inverse of a transformation
 using matrix notation. PC scores are calculated with
 
 b = A a
 
 with A being the transformation matrix composed of the Eigenvectors, a
 being the vector of the original values and b the PC scores. What you
 now need is inverse of A, A^-1. The original values can then be
 retrieved with
 
 A^-1 b = a
 
 A^-1 is the inverse of the transformation matrix A which you can get
 in R with solve(A).
 
 For a PCA, the original values are usually shifted to the mean and
 optionally scaled to stddev before computing the Eigenvectors. The
 mean shift is always performed by i.pca, scaling is optional. That
 means that A^-1 b gives the original values shifted to the mean and
 maybe scaled, and the mean of each original band needs to be added to
 get the original values used as input to i.pca. With scaling applied,
 the shifted values need to be multiplied by the stddev for each
 original band.
 
 HTH,
 
 Markus M
 
 
 On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton michael.bar...@asu.edu 
 wrote:
 The constant (i.e., the band mean) was in the pca primer that someone sent 
 me a link to in this discussion. Using the Eigenvectors resulting from 
 i.pca, I can achieve the results of i.pca using my formula below. This is 
 essentially the same as your formula minus the constant--which doesn't 
 really make much (of any) difference in the final result.
 
 However, my question is about performing an *inverse pca*--getting back to 
 the original values from the principal components maps. The idea of pca 
 sharpening is that you perform a pca, then do an inverse pca substituting 
 the pan band for pc1 to enhance the resolution. The equations I show below 
 seem to work, but what I've read indicates that it is not possible to 
 *exactly* get the original values back; you can only approximate them.
 
 Michael
 
 
 On Jun 17, 2012, at 10:48 AM, Duccio Rocchini wrote:
 
 Dear all,
 first, sorry for the delay...
 Here are my 2 cents to be added to the discussion. I re-took in my
 hands the John Jensen book.
 Accordingly
 
 new brightness values1,1,1 = a1,1*BV1,1,1  +a2,1*BV1,1,2. + an,1*BV1,1,m
 
 where a=eigenvector and BV=original brightness value.
 
 I found no evidence for the mean term: - ((b1+b2+b3)/3)
 
 Michael: do you have a proof/reference for that?
 
 P.S. thanks for involving me in this discussion which is really stimulating!
 
 Duccio
 
 2012/6/7 Michael Barton michael.bar...@asu.edu:
 
 I think I've figured it out.
 
 If (ev1-1, ev1-2, ev1-3) are the eigenvectors of the first principal 
 component for 3 imagery bands (b1, b2, b3), the corresponding factor 
 scores of the PC1, PC2, and PC3 maps (fs1, fs2, fs3) are calculated as:
 
 fs1 = (ev1-1*b1) + (ev1-2*b2) + (ev1-3*b3) - ((b1+b2+b3)/3)
 fs2 = (ev2-1*b1) + (ev2-2*b2) + (ev2-3*b3) - ((b1+b2+b3)/3)
 fs3 = (ev3-1*b1) + (ev3-2*b2) + (ev3-3*b3) - ((b1+b2+b3)/3)
 
 So to do an inverse PCA on the same data you need to do the following:
 
 b1' = (fs1/ev1-1) + (fs2/ev2-1) + (fs3/ev3-1)
 b2' = (fs1/ev1-2) + (fs2/ev2-2) + (fs3/ev3-2)
 b3' = (fs1/ev1-3) + (fs2/ev2-3) + (fs3/ev3-3)
 
 Adding the constant back on doesn't really seem to matter because you need 
 to rescale b1' to b1, b2' to b2, and b3' to b3 anyway.
 
 Michael
 
 On Jun 7, 2012, at 1:55 AM, Markus Neteler wrote:
 
 Hi Duccio,
 
 On Wed, Jun 6, 2012 at 11:39 

Re: [GRASS-dev] PCA question

2012-06-25 Thread Markus Metz
On Mon, Jun 25, 2012 at 6:47 PM, Michael Barton michael.bar...@asu.edu wrote:
 Thanks Marcus,

 Please see my post below, starting with I think I've figured it out...

Yes, I saw, but the formula below I think I've figured it out...
looked strange to me.

 Since I'm working in GRASS instead of R,
[ should be and, not instead of ;) ]

Here is an example based on on the landsat imagery in mapset landsat,
North Carolina sample dataset:

i.pca without rescaling of the output:
i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30
output_prefix=lsat7_2000_pc rescale=0,0

The eigenvectors are
0.4494, 0.5128, 0.7315
0.6726, 0.3447,-0.6548
0.5879,-0.7863, 0.1901

this is a square 3x3 matrix. R gives the inverse matrix as

0.4493372, 0.6726549, 0.5879235
0.5128129, 0.3446144,-0.7862660
0.7315070,-0.6548316, 0.182

Test with band 10:
the general formula is
b1' = (ev1-1' * pc1 + ev1-2' * pc2 + ev1-3' * pc3) + mean(b1)

The mean of lsat7_2000_b10 is 79.925.
r.mapcalc lsat7_2000_10.pc = (lsat7_2000_pc.1 * 0.4493372 +
lsat7_2000_pc.2 * 0.6726549 + lsat7_2000_pc.3 * 0.5879235) + 79.925

Difference to original:
r.mapcalc diff = lsat7_2000_10 - lsat7_2000_10.pc

r.univar diff -g
n=250325
null_cells=0
cells=250325
min=-0.00140021304849824
max=0.00723340429107111
range=0.00863361733956935
mean=-1.8473599814612e-05

That's close enough, given that the eigenvectors are printed by i.pca
with limited precision (only 4 decimal places).

Without using R, the inverse of a 3x3 matrix can be manually
calculated as follows:

original matrix:
a b c
d e f
g h k

inverse matrix:
(ek - fh)   (ch - bk)   (bf - ce)
(fg - dk)   (ak - cg)   (cd - af)
(dh - eg)  (gb - ah)   (ae - bd)

each entry in this inverse matrix needs to be multiplied with

1 / (a(ek - fh) - b(kd - fg) + c(dh - eg))

in order to get the transformation matrix to be applied to the pc scores.

The formula is more complex for matrices with more than 3 dimensions,
e.g. for i.pca with 6 input bands.


 So my questions are:

 1) Do I have the inversion correct in terms of how it needs to be calculated 
 in GRASS?

I don't think so. Using your formula and testing for the difference gives me
min=-546.255208609669
max=174.771853180844
range=721.027061790513
mean=79.9249815357317

a bit too much IMHO.

 2) Even though the mean of all bands seems to be subtracted from each factor 
 score, does inverting the matrix mean that the mean of *each* band is added 
 back to its transformed value? Adding either the mean of all original bands 
 or 1 original band seems to produce values that are much higher than the 
 original, and so need to be rescaled. Maybe this is OK.

The mean of each band is added back to each recalculated band. See
above r.mapcalc formula for band 10.

 3) I do not normalize or rescale in i.pca. This seems to make it easier to do 
 the inverse PCA with fewer calculations. Is there any reason I *should* 
 rescale and/or normalize?

Normalization applies to the input of i.pca and is by default not
done. It needs to be done for heterogenous input data such as e.g.
rainfall, temperature, NDVI, etc. Rescaling is automatically applied
to the output of i.pca unless explicitly disabled with rescale=0,0.

Markus M


 Thanks much
 Michael


 On Jun 25, 2012, at 10:11 AM, Markus Metz wrote:

 An inverse PCA can be regarded as the inverse of a transformation
 using matrix notation. PC scores are calculated with

 b = A a

 with A being the transformation matrix composed of the Eigenvectors, a
 being the vector of the original values and b the PC scores. What you
 now need is inverse of A, A^-1. The original values can then be
 retrieved with

 A^-1 b = a

 A^-1 is the inverse of the transformation matrix A which you can get
 in R with solve(A).

 For a PCA, the original values are usually shifted to the mean and
 optionally scaled to stddev before computing the Eigenvectors. The
 mean shift is always performed by i.pca, scaling is optional. That
 means that A^-1 b gives the original values shifted to the mean and
 maybe scaled, and the mean of each original band needs to be added to
 get the original values used as input to i.pca. With scaling applied,
 the shifted values need to be multiplied by the stddev for each
 original band.

 HTH,

 Markus M


 On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton michael.bar...@asu.edu 
 wrote:
 The constant (i.e., the band mean) was in the pca primer that someone sent 
 me a link to in this discussion. Using the Eigenvectors resulting from 
 i.pca, I can achieve the results of i.pca using my formula below. This is 
 essentially the same as your formula minus the constant--which doesn't 
 really make much (of any) difference in the final result.

 However, my question is about performing an *inverse pca*--getting back to 
 the original values from the principal components maps. The idea of pca 
 sharpening is that you perform a pca, then do an inverse pca substituting 
 the pan band for pc1 to enhance the resolution. The equations I show 

Re: [GRASS-dev] PCA question

2012-06-25 Thread Michael Barton
Markus,

This is very helpful. See below for a couple additional questions.

Thanks much
Michael

On Jun 25, 2012, at 12:03 PM, Markus Metz wrote:

 On Mon, Jun 25, 2012 at 6:47 PM, Michael Barton michael.bar...@asu.edu 
 wrote:
 Thanks Marcus,
 
 Please see my post below, starting with I think I've figured it out...
 
 Yes, I saw, but the formula below I think I've figured it out...
 looked strange to me.
 
 Since I'm working in GRASS instead of R,
 [ should be and, not instead of ;) ]
 
 Here is an example based on on the landsat imagery in mapset landsat,
 North Carolina sample dataset:
 
 i.pca without rescaling of the output:
 i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30
 output_prefix=lsat7_2000_pc rescale=0,0
 
 The eigenvectors are
 0.4494, 0.5128, 0.7315
 0.6726, 0.3447,-0.6548
 0.5879,-0.7863, 0.1901
 
 this is a square 3x3 matrix. R gives the inverse matrix as
 
 0.4493372, 0.6726549, 0.5879235
 0.5128129, 0.3446144,-0.7862660
 0.7315070,-0.6548316, 0.182
 
 Test with band 10:
 the general formula is
 b1' = (ev1-1' * pc1 + ev1-2' * pc2 + ev1-3' * pc3) + mean(b1)
 
 The mean of lsat7_2000_b10 is 79.925.
 r.mapcalc lsat7_2000_10.pc = (lsat7_2000_pc.1 * 0.4493372 +
 lsat7_2000_pc.2 * 0.6726549 + lsat7_2000_pc.3 * 0.5879235) + 79.925
 
 Difference to original:
 r.mapcalc diff = lsat7_2000_10 - lsat7_2000_10.pc
 
 r.univar diff -g
 n=250325
 null_cells=0
 cells=250325
 min=-0.00140021304849824
 max=0.00723340429107111
 range=0.00863361733956935
 mean=-1.8473599814612e-05
 
 That's close enough, given that the eigenvectors are printed by i.pca
 with limited precision (only 4 decimal places).
 
 Without using R, the inverse of a 3x3 matrix can be manually
 calculated as follows:
 
 original matrix:
 a b c
 d e f
 g h k
 
 inverse matrix:
 (ek - fh)   (ch - bk)   (bf - ce)
 (fg - dk)   (ak - cg)   (cd - af)
 (dh - eg)  (gb - ah)   (ae - bd)

Your example from R above *looks* like the inverse of matrix...

a b c
d e f
g h k

is (given rounding errors) ...

a d g
b e h
c f k

That is, transpose rows and columns.
Is this just a coincidence or a reasonable approximation?

 
 each entry in this inverse matrix needs to be multiplied with
 
 1 / (a(ek - fh) - b(kd - fg) + c(dh - eg))

by a(ek - fh)... , you mean a = factor score 1 of a cell, b = factor score 2 of 
a cell, c = factor score 3 of a cell (not eigenvectors a,b,c or the matrix), 
right?

 
 in order to get the transformation matrix to be applied to the pc scores.
 
 The formula is more complex for matrices with more than 3 dimensions,
 e.g. for i.pca with 6 input bands.
 
 
 So my questions are:
 
 1) Do I have the inversion correct in terms of how it needs to be calculated 
 in GRASS?
 
 I don't think so. Using your formula and testing for the difference gives me
 min=-546.255208609669
 max=174.771853180844
 range=721.027061790513
 mean=79.9249815357317
 
 a bit too much IMHO.

Right. Looks like I'll need to use r.univar to calculate the mean of each band 
to add back in. Thanks again. I will try this out manually and test before 
updating the pan sharpening routine.

 
 2) Even though the mean of all bands seems to be subtracted from each factor 
 score, does inverting the matrix mean that the mean of *each* band is added 
 back to its transformed value? Adding either the mean of all original bands 
 or 1 original band seems to produce values that are much higher than the 
 original, and so need to be rescaled. Maybe this is OK.
 
 The mean of each band is added back to each recalculated band. See
 above r.mapcalc formula for band 10.
 
 3) I do not normalize or rescale in i.pca. This seems to make it easier to 
 do the inverse PCA with fewer calculations. Is there any reason I *should* 
 rescale and/or normalize?
 
 Normalization applies to the input of i.pca and is by default not
 done. It needs to be done for heterogenous input data such as e.g.
 rainfall, temperature, NDVI, etc. Rescaling is automatically applied
 to the output of i.pca unless explicitly disabled with rescale=0,0.

Thanks. That is what I thought.

Michael

 
 Markus M
 
 
 Thanks much
 Michael
 
 
 On Jun 25, 2012, at 10:11 AM, Markus Metz wrote:
 
 An inverse PCA can be regarded as the inverse of a transformation
 using matrix notation. PC scores are calculated with
 
 b = A a
 
 with A being the transformation matrix composed of the Eigenvectors, a
 being the vector of the original values and b the PC scores. What you
 now need is inverse of A, A^-1. The original values can then be
 retrieved with
 
 A^-1 b = a
 
 A^-1 is the inverse of the transformation matrix A which you can get
 in R with solve(A).
 
 For a PCA, the original values are usually shifted to the mean and
 optionally scaled to stddev before computing the Eigenvectors. The
 mean shift is always performed by i.pca, scaling is optional. That
 means that A^-1 b gives the original values shifted to the mean and
 maybe scaled, and the mean of each original band needs to be added to
 get the original 

Re: [GRASS-dev] PCA question

2012-06-25 Thread Markus Metz
On Mon, Jun 25, 2012 at 8:21 PM, Michael Barton michael.bar...@asu.edu wrote:
 On Jun 25, 2012, at 12:03 PM, Markus Metz wrote:

 Here is an example based on on the landsat imagery in mapset landsat,
 North Carolina sample dataset:

 i.pca without rescaling of the output:
 i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30
 output_prefix=lsat7_2000_pc rescale=0,0

 The eigenvectors are
 0.4494, 0.5128, 0.7315
 0.6726, 0.3447,-0.6548
 0.5879,-0.7863, 0.1901

 this is a square 3x3 matrix. R gives the inverse matrix as

 0.4493372, 0.6726549, 0.5879235
 0.5128129, 0.3446144,-0.7862660
 0.7315070,-0.6548316, 0.182

[...]

 Without using R, the inverse of a 3x3 matrix can be manually
 calculated as follows:

 original matrix:
 a b c
 d e f
 g h k

 inverse matrix:
 (ek - fh)   (ch - bk)   (bf - ce)
 (fg - dk)   (ak - cg)   (cd - af)
 (dh - eg)  (gb - ah)   (ae - bd)

 Your example from R above *looks* like the inverse of matrix...

 a b c
 d e f
 g h k

 is (given rounding errors) ...

 a d g
 b e h
 c f k

 That is, transpose rows and columns.
 Is this just a coincidence or a reasonable approximation?

Oops, right. A PCA uses an orthogonal transformation, in which case
the inverse of the transformation matrix is identical to the transpose
of the transformation matrix. Easy solution.


 each entry in this inverse matrix needs to be multiplied with

 1 / (a(ek - fh) - b(kd - fg) + c(dh - eg))

 by a(ek - fh)... , you mean a = factor score 1 of a cell, b = factor score 2 
 of a cell, c = factor score 3 of a cell (not eigenvectors a,b,c or the 
 matrix), right?

No, a,b,c are here Eigenvector values. But this formula is the general
case for the inverse of a 3x3 matrix and not needed here, because the
inverse and the transpose are the same.

Markus M



 in order to get the transformation matrix to be applied to the pc scores.

 The formula is more complex for matrices with more than 3 dimensions,
 e.g. for i.pca with 6 input bands.


 So my questions are:

 1) Do I have the inversion correct in terms of how it needs to be 
 calculated in GRASS?

 I don't think so. Using your formula and testing for the difference gives me
 min=-546.255208609669
 max=174.771853180844
 range=721.027061790513
 mean=79.9249815357317

 a bit too much IMHO.

 Right. Looks like I'll need to use r.univar to calculate the mean of each 
 band to add back in. Thanks again. I will try this out manually and test 
 before updating the pan sharpening routine.


 2) Even though the mean of all bands seems to be subtracted from each 
 factor score, does inverting the matrix mean that the mean of *each* band 
 is added back to its transformed value? Adding either the mean of all 
 original bands or 1 original band seems to produce values that are much 
 higher than the original, and so need to be rescaled. Maybe this is OK.

 The mean of each band is added back to each recalculated band. See
 above r.mapcalc formula for band 10.

 3) I do not normalize or rescale in i.pca. This seems to make it easier to 
 do the inverse PCA with fewer calculations. Is there any reason I *should* 
 rescale and/or normalize?

 Normalization applies to the input of i.pca and is by default not
 done. It needs to be done for heterogenous input data such as e.g.
 rainfall, temperature, NDVI, etc. Rescaling is automatically applied
 to the output of i.pca unless explicitly disabled with rescale=0,0.

 Thanks. That is what I thought.

 Michael


 Markus M


 Thanks much
 Michael


 On Jun 25, 2012, at 10:11 AM, Markus Metz wrote:

 An inverse PCA can be regarded as the inverse of a transformation
 using matrix notation. PC scores are calculated with

 b = A a

 with A being the transformation matrix composed of the Eigenvectors, a
 being the vector of the original values and b the PC scores. What you
 now need is inverse of A, A^-1. The original values can then be
 retrieved with

 A^-1 b = a

 A^-1 is the inverse of the transformation matrix A which you can get
 in R with solve(A).

 For a PCA, the original values are usually shifted to the mean and
 optionally scaled to stddev before computing the Eigenvectors. The
 mean shift is always performed by i.pca, scaling is optional. That
 means that A^-1 b gives the original values shifted to the mean and
 maybe scaled, and the mean of each original band needs to be added to
 get the original values used as input to i.pca. With scaling applied,
 the shifted values need to be multiplied by the stddev for each
 original band.

 HTH,

 Markus M


 On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton michael.bar...@asu.edu 
 wrote:
 The constant (i.e., the band mean) was in the pca primer that someone 
 sent me a link to in this discussion. Using the Eigenvectors resulting 
 from i.pca, I can achieve the results of i.pca using my formula below. 
 This is essentially the same as your formula minus the constant--which 
 doesn't really make much (of any) difference in the final result.

 However, my question is about performing 

Re: [GRASS-dev] PCA question

2012-06-25 Thread Michael Barton
Great news. Thanks.

Michael

On Jun 25, 2012, at 12:51 PM, Markus Metz wrote:

 On Mon, Jun 25, 2012 at 8:21 PM, Michael Barton michael.bar...@asu.edu 
 wrote:
 On Jun 25, 2012, at 12:03 PM, Markus Metz wrote:
 
 Here is an example based on on the landsat imagery in mapset landsat,
 North Carolina sample dataset:
 
 i.pca without rescaling of the output:
 i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30
 output_prefix=lsat7_2000_pc rescale=0,0
 
 The eigenvectors are
 0.4494, 0.5128, 0.7315
 0.6726, 0.3447,-0.6548
 0.5879,-0.7863, 0.1901
 
 this is a square 3x3 matrix. R gives the inverse matrix as
 
 0.4493372, 0.6726549, 0.5879235
 0.5128129, 0.3446144,-0.7862660
 0.7315070,-0.6548316, 0.182
 
 [...]
 
 Without using R, the inverse of a 3x3 matrix can be manually
 calculated as follows:
 
 original matrix:
 a b c
 d e f
 g h k
 
 inverse matrix:
 (ek - fh)   (ch - bk)   (bf - ce)
 (fg - dk)   (ak - cg)   (cd - af)
 (dh - eg)  (gb - ah)   (ae - bd)
 
 Your example from R above *looks* like the inverse of matrix...
 
 a b c
 d e f
 g h k
 
 is (given rounding errors) ...
 
 a d g
 b e h
 c f k
 
 That is, transpose rows and columns.
 Is this just a coincidence or a reasonable approximation?
 
 Oops, right. A PCA uses an orthogonal transformation, in which case
 the inverse of the transformation matrix is identical to the transpose
 of the transformation matrix. Easy solution.
 
 
 each entry in this inverse matrix needs to be multiplied with
 
 1 / (a(ek - fh) - b(kd - fg) + c(dh - eg))
 
 by a(ek - fh)... , you mean a = factor score 1 of a cell, b = factor score 2 
 of a cell, c = factor score 3 of a cell (not eigenvectors a,b,c or the 
 matrix), right?
 
 No, a,b,c are here Eigenvector values. But this formula is the general
 case for the inverse of a 3x3 matrix and not needed here, because the
 inverse and the transpose are the same.
 
 Markus M
 
 
 
 in order to get the transformation matrix to be applied to the pc scores.
 
 The formula is more complex for matrices with more than 3 dimensions,
 e.g. for i.pca with 6 input bands.
 
 
 So my questions are:
 
 1) Do I have the inversion correct in terms of how it needs to be 
 calculated in GRASS?
 
 I don't think so. Using your formula and testing for the difference gives me
 min=-546.255208609669
 max=174.771853180844
 range=721.027061790513
 mean=79.9249815357317
 
 a bit too much IMHO.
 
 Right. Looks like I'll need to use r.univar to calculate the mean of each 
 band to add back in. Thanks again. I will try this out manually and test 
 before updating the pan sharpening routine.
 
 
 2) Even though the mean of all bands seems to be subtracted from each 
 factor score, does inverting the matrix mean that the mean of *each* band 
 is added back to its transformed value? Adding either the mean of all 
 original bands or 1 original band seems to produce values that are much 
 higher than the original, and so need to be rescaled. Maybe this is OK.
 
 The mean of each band is added back to each recalculated band. See
 above r.mapcalc formula for band 10.
 
 3) I do not normalize or rescale in i.pca. This seems to make it easier to 
 do the inverse PCA with fewer calculations. Is there any reason I *should* 
 rescale and/or normalize?
 
 Normalization applies to the input of i.pca and is by default not
 done. It needs to be done for heterogenous input data such as e.g.
 rainfall, temperature, NDVI, etc. Rescaling is automatically applied
 to the output of i.pca unless explicitly disabled with rescale=0,0.
 
 Thanks. That is what I thought.
 
 Michael
 
 
 Markus M
 
 
 Thanks much
 Michael
 
 
 On Jun 25, 2012, at 10:11 AM, Markus Metz wrote:
 
 An inverse PCA can be regarded as the inverse of a transformation
 using matrix notation. PC scores are calculated with
 
 b = A a
 
 with A being the transformation matrix composed of the Eigenvectors, a
 being the vector of the original values and b the PC scores. What you
 now need is inverse of A, A^-1. The original values can then be
 retrieved with
 
 A^-1 b = a
 
 A^-1 is the inverse of the transformation matrix A which you can get
 in R with solve(A).
 
 For a PCA, the original values are usually shifted to the mean and
 optionally scaled to stddev before computing the Eigenvectors. The
 mean shift is always performed by i.pca, scaling is optional. That
 means that A^-1 b gives the original values shifted to the mean and
 maybe scaled, and the mean of each original band needs to be added to
 get the original values used as input to i.pca. With scaling applied,
 the shifted values need to be multiplied by the stddev for each
 original band.
 
 HTH,
 
 Markus M
 
 
 On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton michael.bar...@asu.edu 
 wrote:
 The constant (i.e., the band mean) was in the pca primer that someone 
 sent me a link to in this discussion. Using the Eigenvectors resulting 
 from i.pca, I can achieve the results of i.pca using my formula below. 
 This is essentially the same as your 

Re: [GRASS-dev] PCA question

2012-06-25 Thread Hamish
Hi,

FWIW, it's a known feature that GRASS's i.pca and R's version
give transposed output. It's just cosmetic and the row and column
headers tell you what's what. GRASS lists the principal
components row-wise, R presents them column-wise.

See discussion and GRASS vs. R testing/validation w/Nikos from
c. March 2009. ( trac ticket #430)


See also other open i.pca tickets:
  i.pca fails to center data prior to analysis
  http://trac.osgeo.org/grass/ticket/576

  i.pca: close map before writing out metadata?
  http://trac.osgeo.org/grass/ticket/511

  i.pca metadata truncated
  http://trac.osgeo.org/grass/ticket/1279


Hamish
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] PCA question

2012-06-25 Thread Hamish
Hamish:
 See discussion and GRASS vs. R testing/validation w/Nikos
 from c. March 2009. ( trac ticket #430)

and the (colorized) discussion here:

http://grass.osgeo.org/wiki/Principal_Components_Analysis#SVD_solution


Hamish
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] PCA question

2012-06-25 Thread Michael Barton
Fortunately, those are not issues for pan sharpening.

Michael

On Jun 25, 2012, at 4:23 PM, Hamish wrote:

 Hi,
 
 FWIW, it's a known feature that GRASS's i.pca and R's version
 give transposed output. It's just cosmetic and the row and column
 headers tell you what's what. GRASS lists the principal
 components row-wise, R presents them column-wise.
 
 See discussion and GRASS vs. R testing/validation w/Nikos from
 c. March 2009. ( trac ticket #430)
 
 
 See also other open i.pca tickets:
  i.pca fails to center data prior to analysis
  http://trac.osgeo.org/grass/ticket/576
 
  i.pca: close map before writing out metadata?
  http://trac.osgeo.org/grass/ticket/511
 
  i.pca metadata truncated
  http://trac.osgeo.org/grass/ticket/1279
 
 
 Hamish

_
C. Michael Barton
Visiting Scientist, Integrated Science Program
National Center for Atmospheric Research 
University Corporation for Atmospheric Research
303-497-2889 (voice)

Director, Center for Social Dynamics  Complexity 
Professor of Anthropology, School of Human Evolution  Social Change
Arizona State University
www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu

___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


Re: [GRASS-dev] PCA question

2012-06-18 Thread Michael Barton
The constant (i.e., the band mean) was in the pca primer that someone sent me a 
link to in this discussion. Using the Eigenvectors resulting from i.pca, I can 
achieve the results of i.pca using my formula below. This is essentially the 
same as your formula minus the constant--which doesn't really make much (of 
any) difference in the final result.

However, my question is about performing an *inverse pca*--getting back to the 
original values from the principal components maps. The idea of pca sharpening 
is that you perform a pca, then do an inverse pca substituting the pan band for 
pc1 to enhance the resolution. The equations I show below seem to work, but 
what I've read indicates that it is not possible to *exactly* get the original 
values back; you can only approximate them.

Michael


On Jun 17, 2012, at 10:48 AM, Duccio Rocchini wrote:

 Dear all,
 first, sorry for the delay...
 Here are my 2 cents to be added to the discussion. I re-took in my
 hands the John Jensen book.
 Accordingly
 
 new brightness values1,1,1 = a1,1*BV1,1,1  +a2,1*BV1,1,2. + an,1*BV1,1,m
 
 where a=eigenvector and BV=original brightness value.
 
 I found no evidence for the mean term: - ((b1+b2+b3)/3)
 
 Michael: do you have a proof/reference for that?
 
 P.S. thanks for involving me in this discussion which is really stimulating!
 
 Duccio
 
 2012/6/7 Michael Barton michael.bar...@asu.edu:
 
 I think I've figured it out.
 
 If (ev1-1, ev1-2, ev1-3) are the eigenvectors of the first principal 
 component for 3 imagery bands (b1, b2, b3), the corresponding factor scores 
 of the PC1, PC2, and PC3 maps (fs1, fs2, fs3) are calculated as:
 
 fs1 = (ev1-1*b1) + (ev1-2*b2) + (ev1-3*b3) - ((b1+b2+b3)/3)
 fs2 = (ev2-1*b1) + (ev2-2*b2) + (ev2-3*b3) - ((b1+b2+b3)/3)
 fs3 = (ev3-1*b1) + (ev3-2*b2) + (ev3-3*b3) - ((b1+b2+b3)/3)
 
 So to do an inverse PCA on the same data you need to do the following:
 
 b1' = (fs1/ev1-1) + (fs2/ev2-1) + (fs3/ev3-1)
 b2' = (fs1/ev1-2) + (fs2/ev2-2) + (fs3/ev3-2)
 b3' = (fs1/ev1-3) + (fs2/ev2-3) + (fs3/ev3-3)
 
 Adding the constant back on doesn't really seem to matter because you need 
 to rescale b1' to b1, b2' to b2, and b3' to b3 anyway.
 
 Michael
 
 On Jun 7, 2012, at 1:55 AM, Markus Neteler wrote:
 
 Hi Duccio,
 
 On Wed, Jun 6, 2012 at 11:39 PM, Michael Barton michael.bar...@asu.edu 
 wrote:
 On Jun 6, 2012, at 2:20 PM, Markus Neteler wrote:
 On Wed, Jun 6, 2012 at 5:09 PM, Michael Barton michael.bar...@asu.edu 
 wrote:
 ...
 I'm working on a pan sharpening script that will combine your 
 i.fusion.brovey with options to do other pan sharpening methods. An IHS 
 transformation is easy. I think that a PCA sharpening is doable too if I 
 can find an equation to rotate the PCA channels back into unrotated 
 space--since i.pca does provide the eigenvectors.
 
 Maybe there is material in (see m.eigenvector)
 http://grass.osgeo.org/wiki/Principal_Components_Analysis
 
 This has a lot of good information and ALMOST has what I need. Everything 
 I read suggests that there is a straightforward way to get the original 
 values from the factor scores if you have the eigenvectors. But no one 
 I've yet found provides the equation or algorithm to do it.
 
 @Duccio: any idea about this by chance?
 
 thanks
 Markus
 
 _
 C. Michael Barton
 Visiting Scientist, Integrated Science Program
 National Center for Atmospheric Research 
 University Corporation for Atmospheric Research
 303-497-2889 (voice)
 
 Director, Center for Social Dynamics  Complexity
 Professor of Anthropology, School of Human Evolution  Social Change
 Arizona State University
 www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu
 
 
 
 
 -- 
 Duccio Rocchini, PhD
 
 http://gis.cri.fmach.it/rocchini/
 
 Fondazione Edmund Mach
 Research and Innovation Centre
 Department of Biodiversity and Molecular Ecology
 GIS and Remote Sensing Unit
 Via Mach 1, 38010 San Michele all'Adige (TN) - Italy
 Phone +39 0461 615 570
 ducciorocch...@gmail.com
 duccio.rocch...@fmach.it
 skype: duccio.rocchini

_
C. Michael Barton
Visiting Scientist, Integrated Science Program
National Center for Atmospheric Research 
University Consortium for Atmospheric Research
303-497-2889 (voice)

Director, Center for Social Dynamics  Complexity 
Professor of Anthropology, School of Human Evolution  Social Change
Arizona State University
www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu





___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev