Re: [GRASS-dev] PCA question
[Addendum] Hamish: (not talking about pan-sharpening, but in general,) how about the situation where you have a map data which is loudly dominated by a signal, and you want to try and remove that loud signal so that you can look at the subtle variations caused by a different source that the loud signal had been masking? Nikos: Yes, this _can_ be a perfect use-case. Especially if the presence of the feature in question, is in at least one or in some of the input dimensions near/close to zero. This last statement is based on Pielou's (flawless explanation of how PCA works) [1] and own experiences [2]. If the above holds true (near-zero projection, in one or some dimensions, of the features of interest), and in some way, these subtle variations do form distinct clusters, I'd be keen to try out non-centered PCA and (re-)confirm Pielou and myself (claiming, for example, higher Producer's accuracies as in --my-- burned area mapping). Nikos A separation/isolation attempt of the feature in question from dominant variances will be supported. The loud signal would be channeled among the first few PCs and the subtle variations _could_ then be more evident in some of the higher order components. All in all, one has to look at the numbers -- drawing conclusions from the PC images is not safe! [...] ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] PCA question
On Tue, Jun 26, 2012 at 12:23 AM, Hamish hamis...@yahoo.com wrote: FWIW, See discussion and GRASS vs. R testing/validation w/Nikos from c. March 2009. ( trac ticket #430) Ticket #430 is fixed in all branches. See also other open i.pca tickets: i.pca fails to center data prior to analysis http://trac.osgeo.org/grass/ticket/576 Ticket #576 is fixed in 7.0 and 6.5, awaiting backport to 6.4. i.pca: close map before writing out metadata? http://trac.osgeo.org/grass/ticket/511 Ticket #511 is fixed in all branches. i.pca metadata truncated http://trac.osgeo.org/grass/ticket/1279 Ticket #1279 is fixed in trunk, difficult to backport to 6.x because of the 80x40 limit of the history. Markus M ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] PCA question
Michael Barton: Thanks Marcus, Please see my post below, starting with I think I've figured it out... Markus Metz: Yes, I saw, but the formula below I think I've figured it out... looked strange to me. Since I'm working in GRASS instead of R, [ should be and, not instead of ;) ] Here is an example based on on the landsat imagery in mapset landsat, North Carolina sample dataset: i.pca without rescaling of the output: i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30 output_prefix=lsat7_2000_pc rescale=0,0 The eigenvectors are 0.4494, 0.5128, 0.7315 0.6726, 0.3447,-0.6548 0.5879,-0.7863, 0.1901 this is a square 3x3 matrix. R gives the inverse matrix as 0.4493372, 0.6726549, 0.5879235 0.5128129, 0.3446144,-0.7862660 0.7315070,-0.6548316, 0.182 Test with band 10: the general formula is b1' = (ev1-1' * pc1 + ev1-2' * pc2 + ev1-3' * pc3) + mean(b1) The mean of lsat7_2000_b10 is 79.925. r.mapcalc lsat7_2000_10.pc = (lsat7_2000_pc.1 * 0.4493372 + lsat7_2000_pc.2 * 0.6726549 + lsat7_2000_pc.3 * 0.5879235) + 79.925 Difference to original: r.mapcalc diff = lsat7_2000_10 - lsat7_2000_10.pc r.univar diff -g n=250325 null_cells=0 cells=250325 min=-0.00140021304849824 max=0.00723340429107111 range=0.00863361733956935 mean=-1.8473599814612e-05 That's close enough, given that the eigenvectors are printed by i.pca with limited precision (only 4 decimal places). The math demonstrated above is pretty clear and Correct! The above should/will go to the wiki, i.e. in a separate Inverse PCA page. Hopefully I'll get to that as soon as possible (=next month!). A general note though: in a new produced PC-coordinate system, one has to be very careful about its interpretation. There is no prior knowledge on what (new values) is what (i.e. relation to land cover) exactly, as in the way we do understand that reflectance, for example, can be associated -- and explained -- with specific land cover types or other surface features. We have to keep in mind that the re-distribution of the information contained in the input bands, does strictly follow math rules (i.e. achieve a minimal RMSE, extract highest variances into first principal components). It does not care about class labels. Any manipulation of the data, prior to and while aiming at an inverse PCA, might mix-up things in a way that the final data are very difficult to interpret and, especially, to automatically post-process (i.e. use well known Remote Sensing algorithms). In the case of a pan-sharpening attempt, I would thoroughly test. Without using R, the inverse of a 3x3 matrix can be manually calculated as follows: original matrix: a b c d e f g h k inverse matrix: (ek - fh) (ch - bk) (bf - ce) (fg - dk) (ak - cg) (cd - af) (dh - eg) (gb - ah) (ae - bd) each entry in this inverse matrix needs to be multiplied with 1 / (a(ek - fh) - b(kd - fg) + c(dh - eg)) in order to get the transformation matrix to be applied to the pc scores. The formula is more complex for matrices with more than 3 dimensions, e.g. for i.pca with 6 input bands. So my questions are: 1) Do I have the inversion correct in terms of how it needs to be calculated in GRASS? I don't think so. Using your formula and testing for the difference gives me min=-546.255208609669 max=174.771853180844 range=721.027061790513 mean=79.9249815357317 a bit too much IMHO. It is a bit more than a bit too much... 2) Even though the mean of all bands seems to be subtracted from each factor score, does inverting the matrix mean that the mean of *each* band is added back to its transformed value? Adding either the mean of all original bands or 1 original band seems to produce values that are much higher than the original, and so need to be rescaled. Maybe this is OK. The mean of each band is added back to each recalculated band. See above r.mapcalc formula for band 10. 3) I do not normalize or rescale in i.pca. This seems to make it easier to do the inverse PCA with fewer calculations. Is there any reason I *should* rescale and/or normalize? Normalization applies to the input of i.pca and is by default not done. It needs to be done for heterogenous input data such as e.g. rainfall, temperature, NDVI, etc. Rescaling is automatically applied to the output of i.pca unless explicitly disabled with rescale=0,0. Adding my 1 old drachma: PCA works on/with global stats. One would want to normalise exactly when it comes to heterogenous input data, speak largely different ranges of input data. For example there is an input band which ranges from 0 to 100 and another one which ranges from 0 to 10. Normalisation would help to make different inputs comparable. That is mainly the reason to apply scaling, as far as I understand PCA. Note, though, that a normalisation will swipe out subtle differences between input bands of similar range which might
Re: [GRASS-dev] PCA question
Hamish wrote: FWIW, See discussion and GRASS vs. R testing/validation w/Nikos from c. March 2009. ( trac ticket #430) Markus M wrote: Ticket #430 is fixed in all branches. See also other open i.pca tickets: i.pca fails to center data prior to analysis http://trac.osgeo.org/grass/ticket/576 Ticket #576 is fixed in 7.0 and 6.5, awaiting backport to 6.4. i.pca: close map before writing out metadata? http://trac.osgeo.org/grass/ticket/511 Ticket #511 is fixed in all branches. i.pca metadata truncated http://trac.osgeo.org/grass/ticket/1279 Ticket #1279 is fixed in trunk, difficult to backport to 6.x because of the 80x40 limit of the history. I, too, noticed, that issues have been fixed. I keep wanting to clean the respective wiki-page. I feel it is kind of noisy. I hope that, with the Universe conspiring on my side, I'll jump-in (grass ML) next month again ;-) ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] PCA question
Michael Barton: [...] However, my question is about performing an *inverse pca*--getting back to the original values from the principal components maps. Michael, getting back to the original values can _only_ be done if one does not touch the data in any of the intermediate steps, i.e. Input EVD (or SVD) Inverse PCA Original values. If one alters the data at any step prior to the Eigenanalysis or SVD, I don't think it is possible to land back on level 1. From the moment that global stats of a multivariarte dataset, subject to PCA, are changed, one will probably jump into a new reality. This means that it takes an extra effort to interpret the new stuff. The idea of pca sharpening is that you perform a pca, then do an inverse pca substituting the pan band for pc1 to enhance the resolution. I haven't tried PCA sharpening. So, apologies for my simplistic question(s), I just want to understand the trick here. Which resolution is to be enhanced? The geometric? Is it meant to keep PC1 and mix it with the rest, or keep the Pan and throw away PC1? Principal Component 1 will contain the highest variance of your input data -- which, in fact, is a composition of different amount of information originated from all input bands. If you throw that away you are left with a dataset which is likely to be useless! The equations I show below seem to work, but what I've read indicates that it is not possible to *exactly* get the original values back; you can only approximate them. As Markus' demonstration showed in another post, the results can be close enough so the differences can be disregarded. As far as I have understood PCA, it depends on how many decimals are taken into account, while doing all the math and _not_ effectively altering the data at any of the intermediate steps. Pff, it's been a while I got my hands dirty with PCA and I might forget something here. [...] I'm working on a pan sharpening script that will combine your i.fusion.brovey with options to do other pan sharpening methods. An IHS transformation is easy. I think that a PCA sharpening is doable too if I can find an equation to rotate the PCA channels back into unrotated space--since i.pca does provide the eigenvectors. [...] Thanks, Nikos ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] PCA question
Nikos wrote: Which resolution is to be enhanced? The geometric? Is it meant to keep PC1 and mix it with the rest, or keep the Pan and throw away PC1? Principal Component 1 will contain the highest variance of your input data -- which, in fact, is a composition of different amount of information originated from all input bands. If you throw that away you are left with a dataset which is likely to be useless! (not talking about pan-sharpening, but in general,) how about the situation where you have a map data which is loudly dominated by a signal, and you want to try and remove that loud signal so that you can look at the subtle variations caused by a different source that the loud signal had been masking? is removing PC1 then back-inverting a suitable method for that sort of task? or is there another more suitable method? thanks, Hamish ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] PCA question
Nikos: Which resolution is to be enhanced? The geometric? Is it meant to keep PC1 and mix it with the rest, or keep the Pan and throw away PC1? Principal Component 1 will contain the highest variance of your input data -- which, in fact, is a composition of different amount of information originated from all input bands. If you throw that away you are left with a dataset which is likely to be useless! Hamish: (not talking about pan-sharpening, but in general,) how about the situation where you have a map data ( we are talking about multi-dimensional data, right? ) which is loudly dominated by a signal, and you want to try and remove that loud signal so that you can look at the subtle variations caused by a different source that the loud signal had been masking? Yes, this _can_ be a perfect use-case. Especially if the presence of the feature in question, is in at least one or in some of the input dimensions near/close to zero. This last statement is based on Pielou's (flawless explanation of how PCA works) [1] and own experiences [2]. A separation/isolation attempt of the feature in question from dominant variances will be supported. The loud signal would be channeled among the first few PCs and the subtle variations _could_ then be more evident in some of the higher order components. All in all, one has to look at the numbers -- drawing conclusions from the PC images is not safe! is removing PC1 then back-inverting a suitable method for that sort of task? Short answer: yes, it can be, but back-inverting might not be necessary! Longer story: if the subtle variations (featueres of low(er) variance, rather homogenous stuff) are, as expected, more evident (read: enhanced as compared to the original data set) in some of the higher order components, why bother to back-invert? Supervised classification techniques can directly operate on selected PCs and attempt to extract whatever is of your interest. More on the subject of back-inverting -- quotting from Dr. Koutsias paper: A critical issue in the back-transformation process is the amount of information taken from each PC axis. The original spectral pattern of the satellite image is modified to a degree that depends on the amount of the information taken from each PC axis. In this work (mapping burned areas), back-transformation coefficients, in the range of 0 to 1, were worked-out in order to 'grep' specific percentages (0 to 100%) from each of the produced PCs and channel them back (via inverse-PCA) to a data set _similar_ to the original one, though different to the extent of the removed information (excluded PC). or is there another more suitable method? Dunno more... :-( Kindest regards, Nikos --- [1] Book: Pielou, E. C. The interpretation of ecological data: a primer on classification and ordination Wiley, New York, 1984 [2] Dissertation: Burned area mapping via non-centered PCA using Public Domain Data and Free Open Source Software Institut für Forstökonomie, Fakultät für Forst- und Umweltwissenschaften, Albert-Ludwigs-Universität Freiburg, 2011 [3] Koutsias, N.; Mallinis, G. Karteris, M. A forward/backward principal component analysis of Landsat-7 ETM+ data to enhance the spectral signal of burnt surfaces ISPRS Journal of Photogrammetry and Remote Sensing, 2009, 64, 37 ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] PCA question
Allow to me clarify the sentence below to let it fit into the frame of the current thread! On Tuesday 26 of June 2012 13:24:03 Nikos Alexandris wrote: If one alters the data at any step prior to the Eigenanalysis or SVD, I don't think it is possible to land back on level 1. I meant: if the data are manipulated, at any step, prior to the back- inverting EVD or SVD... Thanks, Nikos ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] PCA question
Hi Nicos On Jun 26, 2012, at 4:24 AM, Nikos Alexandris wrote: Michael Barton: [...] However, my question is about performing an *inverse pca*--getting back to the original values from the principal components maps. Michael, getting back to the original values can _only_ be done if one does not touch the data in any of the intermediate steps, i.e. Input EVD (or SVD) Inverse PCA Original values. If one alters the data at any step prior to the Eigenanalysis or SVD, I don't think it is possible to land back on level 1. From the moment that global stats of a multivariarte dataset, subject to PCA, are changed, one will probably jump into a new reality. This means that it takes an extra effort to interpret the new stuff. Right. That is why I'm not doing any alteration of the data after transforming to PC's. The idea of pca sharpening is that you perform a pca, then do an inverse pca substituting the pan band for pc1 to enhance the resolution. I haven't tried PCA sharpening. So, apologies for my simplistic question(s), I just want to understand the trick here. Which resolution is to be enhanced? The geometric? Is it meant to keep PC1 and mix it with the rest, or keep the Pan and throw away PC1? Principal Component 1 will contain the highest variance of your input data -- which, in fact, is a composition of different amount of information originated from all input bands. If you throw that away you are left with a dataset which is likely to be useless! The way this works is to: 1) transform 3 lower resolution bands to 3 principal components 2) replace PC1 with the higher resolution panchromatic band (under the reasonable assumption that the pan band will include more of the total spectral variability than will any more spectrally limited band). Histogram matching the pan band to PC1 is recommended here. 3) do an inverse PCA to get back to the original bands with a similar range of spectral response but with higher spatial resolution. There have been--and continue to be--studies of the performance of different pan sharpening algorithms from various perspectives. For myself, pan sharpening can help with visually resolving more features in greater detail. But this is at the cost of making it considerably more difficult to understand what the pixel values of the enhanced bands mean. The equations I show below seem to work, but what I've read indicates that it is not possible to *exactly* get the original values back; you can only approximate them. As Markus' demonstration showed in another post, the results can be close enough so the differences can be disregarded. As far as I have understood PCA, it depends on how many decimals are taken into account, while doing all the math and _not_ effectively altering the data at any of the intermediate steps. Yes. Markus' demo made me more comfortable with the algorithm overall. When you replace PC1 with the pan band, of course, you don't get back to the original values. But the ranges look pretty good now. I'll attach the script here in case you want to try it. Ver. 2 and 3 represent different kinds of optimizing for calculation speed. v3 only works with a new GRASS python function that Hamish committed to trunk yesterday. V2 should work with all current versions of GRASS. Here are some helpful references: Amarsaikhan, D., Douglas, T. (2004). Data fusion and multisource image classification. International Journal of Remote Sensing, 25(17), 3529–3539. Behnia, P. (2005). Comparison between four methods for data fusion of ETM+ multispectral and pan images. Geo-spatial Information Science, 8(2), 98–103. doi:10.1007/BF02826847 Du, Q., Younan, N. H., King, R., Shah, V. P. (2007). On the Performance Evaluation of Pan-Sharpening Techniques. Geoscience and Remote Sensing Letters, IEEE, 4(4), 518 –522. doi:10.1109/LGRS.2007.896328 Karathanassi, V., Kolokousis, P., Ioannidou, S. (2007). A comparison study on fusion methods using evaluation indicators. International Journal of Remote Sensing, 28(10), 2309–2341. doi:10.1080/01431160600606890 Michael i.pansharpen2 Description: Binary data i.pansharpen3 Description: Binary data Pff, it's been a while I got my hands dirty with PCA and I might forget something here. [...] I'm working on a pan sharpening script that will combine your i.fusion.brovey with options to do other pan sharpening methods. An IHS transformation is easy. I think that a PCA sharpening is doable too if I can find an equation to rotate the PCA channels back into unrotated space--since i.pca does provide the eigenvectors. [...] Thanks, Nikos _ C. Michael Barton Visiting Scientist, Integrated Science Program National Center for Atmospheric Research University Corporation for Atmospheric Research 303-497-2889 (voice) Director, Center for Social Dynamics Complexity Professor of Anthropology, School of Human
Re: [GRASS-dev] PCA question
Thank you for the details Michael! (cc-ing to Dr. Koutsias, this discussion might be of his interest) Michael Barton: [...] However, my question is about performing an *inverse pca*--getting back to the original values from the principal components maps. Nikos: Michael, getting back to the original values can _only_ be done if one does not touch the data in any of the intermediate steps, i.e. Input EVD (or SVD) Inverse PCA Original values. If one alters the data at any step prior to the Eigenanalysis or SVD, I don't think it is possible to land back on level 1. From the moment that global stats of a multivariarte dataset, subject to PCA, are changed, one will probably jump into a new reality. This means that it takes an extra effort to interpret the new stuff. Michael: Right. That is why I'm not doing any alteration of the data after transforming to PC's. The idea of pca sharpening is that you perform a pca, then do an inverse pca substituting the pan band for pc1 to enhance the resolution. Nikos: I haven't tried PCA sharpening. So, apologies for my simplistic question(s), I just want to understand the trick here. Which resolution is to be enhanced? The geometric? Is it meant to keep PC1 and mix it with the rest, or keep the Pan and throw away PC1? Principal Component 1 will contain the highest variance of your input data -- which, in fact, is a composition of different amount of information originated from all input bands. If you throw that away you are left with a dataset which is likely to be useless! Michael: The way this works is to: 1) transform 3 lower resolution bands to 3 principal components 2) replace PC1 with the higher resolution panchromatic band (under the reasonable assumption that the pan band will include more of the total spectral variability than will any more spectrally limited band). Histogram matching the pan band to PC1 is recommended here. This assumption is, indeed, necessary and sounds pretty rational. Interesting stuff. 3) do an inverse PCA to get back to the original bands with a similar range of spectral response but with higher spatial resolution. There have been--and continue to be--studies of the performance of different pan sharpening algorithms from various perspectives. For myself, pan sharpening can help with visually resolving more features in greater detail. But this is at the cost of making it considerably more difficult to understand what the pixel values of the enhanced bands mean. The equations I show below seem to work, but what I've read indicates that it is not possible to *exactly* get the original values back; you can only approximate them. Nikos: As Markus' demonstration showed in another post, the results can be close enough so the differences can be disregarded. As far as I have understood PCA, it depends on how many decimals are taken into account, while doing all the math and _not_ effectively altering the data at any of the intermediate steps. Michael: Yes. Markus' demo made me more comfortable with the algorithm overall. When you replace PC1 with the pan band, of course, you don't get back to the original values. But the ranges look pretty good now. I'll attach the script here in case you want to try it. Ver. 2 and 3 represent different kinds of optimizing for calculation speed. v3 only works with a new GRASS python function that Hamish committed to trunk yesterday. V2 should work with all current versions of GRASS. Thanks a lot. Noted on my ToDo list: Check MichaelB's pan-sharpening scripts (after next week). Here are some helpful references: Amarsaikhan, D., Douglas, T. (2004). Data fusion and multisource image classification. International Journal of Remote Sensing, 25(17), 3529–3539. Behnia, P. (2005). Comparison between four methods for data fusion of ETM+ multispectral and pan images. Geo-spatial Information Science, 8(2), 98–103. doi: 10.1007/BF02826847 Du, Q., Younan, N. H., King, R., Shah, V. P. (2007). On the Performance Evaluation of Pan-Sharpening Techniques. Geoscience and Remote Sensing Letters, IEEE, 4(4), 518 –522. doi: 10.1109/LGRS.2007.896328 Karathanassi, V., Kolokousis, P., Ioannidou, S. (2007). A comparison study on fusion methods using evaluation indicators. International Journal of Remote Sensing, 28(10), 2309–2341. doi: 10.1080/01431160600606890 ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] PCA question
An inverse PCA can be regarded as the inverse of a transformation using matrix notation. PC scores are calculated with b = A a with A being the transformation matrix composed of the Eigenvectors, a being the vector of the original values and b the PC scores. What you now need is inverse of A, A^-1. The original values can then be retrieved with A^-1 b = a A^-1 is the inverse of the transformation matrix A which you can get in R with solve(A). For a PCA, the original values are usually shifted to the mean and optionally scaled to stddev before computing the Eigenvectors. The mean shift is always performed by i.pca, scaling is optional. That means that A^-1 b gives the original values shifted to the mean and maybe scaled, and the mean of each original band needs to be added to get the original values used as input to i.pca. With scaling applied, the shifted values need to be multiplied by the stddev for each original band. HTH, Markus M On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton michael.bar...@asu.edu wrote: The constant (i.e., the band mean) was in the pca primer that someone sent me a link to in this discussion. Using the Eigenvectors resulting from i.pca, I can achieve the results of i.pca using my formula below. This is essentially the same as your formula minus the constant--which doesn't really make much (of any) difference in the final result. However, my question is about performing an *inverse pca*--getting back to the original values from the principal components maps. The idea of pca sharpening is that you perform a pca, then do an inverse pca substituting the pan band for pc1 to enhance the resolution. The equations I show below seem to work, but what I've read indicates that it is not possible to *exactly* get the original values back; you can only approximate them. Michael On Jun 17, 2012, at 10:48 AM, Duccio Rocchini wrote: Dear all, first, sorry for the delay... Here are my 2 cents to be added to the discussion. I re-took in my hands the John Jensen book. Accordingly new brightness values1,1,1 = a1,1*BV1,1,1 +a2,1*BV1,1,2. + an,1*BV1,1,m where a=eigenvector and BV=original brightness value. I found no evidence for the mean term: - ((b1+b2+b3)/3) Michael: do you have a proof/reference for that? P.S. thanks for involving me in this discussion which is really stimulating! Duccio 2012/6/7 Michael Barton michael.bar...@asu.edu: I think I've figured it out. If (ev1-1, ev1-2, ev1-3) are the eigenvectors of the first principal component for 3 imagery bands (b1, b2, b3), the corresponding factor scores of the PC1, PC2, and PC3 maps (fs1, fs2, fs3) are calculated as: fs1 = (ev1-1*b1) + (ev1-2*b2) + (ev1-3*b3) - ((b1+b2+b3)/3) fs2 = (ev2-1*b1) + (ev2-2*b2) + (ev2-3*b3) - ((b1+b2+b3)/3) fs3 = (ev3-1*b1) + (ev3-2*b2) + (ev3-3*b3) - ((b1+b2+b3)/3) So to do an inverse PCA on the same data you need to do the following: b1' = (fs1/ev1-1) + (fs2/ev2-1) + (fs3/ev3-1) b2' = (fs1/ev1-2) + (fs2/ev2-2) + (fs3/ev3-2) b3' = (fs1/ev1-3) + (fs2/ev2-3) + (fs3/ev3-3) Adding the constant back on doesn't really seem to matter because you need to rescale b1' to b1, b2' to b2, and b3' to b3 anyway. Michael On Jun 7, 2012, at 1:55 AM, Markus Neteler wrote: Hi Duccio, On Wed, Jun 6, 2012 at 11:39 PM, Michael Barton michael.bar...@asu.edu wrote: On Jun 6, 2012, at 2:20 PM, Markus Neteler wrote: On Wed, Jun 6, 2012 at 5:09 PM, Michael Barton michael.bar...@asu.edu wrote: ... I'm working on a pan sharpening script that will combine your i.fusion.brovey with options to do other pan sharpening methods. An IHS transformation is easy. I think that a PCA sharpening is doable too if I can find an equation to rotate the PCA channels back into unrotated space--since i.pca does provide the eigenvectors. Maybe there is material in (see m.eigenvector) http://grass.osgeo.org/wiki/Principal_Components_Analysis This has a lot of good information and ALMOST has what I need. Everything I read suggests that there is a straightforward way to get the original values from the factor scores if you have the eigenvectors. But no one I've yet found provides the equation or algorithm to do it. @Duccio: any idea about this by chance? thanks Markus _ C. Michael Barton Visiting Scientist, Integrated Science Program National Center for Atmospheric Research University Corporation for Atmospheric Research 303-497-2889 (voice) Director, Center for Social Dynamics Complexity Professor of Anthropology, School of Human Evolution Social Change Arizona State University www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu -- Duccio Rocchini, PhD http://gis.cri.fmach.it/rocchini/ Fondazione Edmund Mach Research and Innovation Centre Department of Biodiversity and Molecular Ecology GIS and Remote Sensing Unit Via Mach 1, 38010 San Michele all'Adige (TN) - Italy Phone +39 0461 615 570
Re: [GRASS-dev] PCA question
Thanks Marcus, Please see my post below, starting with I think I've figured it out... Since I'm working in GRASS instead of R, I need to work back from the values given by i.pca. Also, I'm not up on matrix algebra notation. So it would be a help to me to see exactly what is meant by an inverted Eigenvector matrix. I tried to give how I interpreted this in GRASS map algebra terms below. Is this correct? Using my simple algebraic notation, the Eigenvector matrix produced by i.pca would be: ev1-1 ev1-2 ev1-3 ev2-1 ev2-2 ev2-3 ev3-1 ev3-2 ev3-3 Where ev1-2 refers to the eigenvector in the first row and second column Spot checking the factor scores against original values, with no scaling, seems to indicate that it is the mean of all original bands for each pixel that are subtracted from the factor scores in i.pca. This is somewhat different from what you suggest. So my questions are: 1) Do I have the inversion correct in terms of how it needs to be calculated in GRASS? 2) Even though the mean of all bands seems to be subtracted from each factor score, does inverting the matrix mean that the mean of *each* band is added back to its transformed value? Adding either the mean of all original bands or 1 original band seems to produce values that are much higher than the original, and so need to be rescaled. Maybe this is OK. 3) I do not normalize or rescale in i.pca. This seems to make it easier to do the inverse PCA with fewer calculations. Is there any reason I *should* rescale and/or normalize? Thanks much Michael On Jun 25, 2012, at 10:11 AM, Markus Metz wrote: An inverse PCA can be regarded as the inverse of a transformation using matrix notation. PC scores are calculated with b = A a with A being the transformation matrix composed of the Eigenvectors, a being the vector of the original values and b the PC scores. What you now need is inverse of A, A^-1. The original values can then be retrieved with A^-1 b = a A^-1 is the inverse of the transformation matrix A which you can get in R with solve(A). For a PCA, the original values are usually shifted to the mean and optionally scaled to stddev before computing the Eigenvectors. The mean shift is always performed by i.pca, scaling is optional. That means that A^-1 b gives the original values shifted to the mean and maybe scaled, and the mean of each original band needs to be added to get the original values used as input to i.pca. With scaling applied, the shifted values need to be multiplied by the stddev for each original band. HTH, Markus M On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton michael.bar...@asu.edu wrote: The constant (i.e., the band mean) was in the pca primer that someone sent me a link to in this discussion. Using the Eigenvectors resulting from i.pca, I can achieve the results of i.pca using my formula below. This is essentially the same as your formula minus the constant--which doesn't really make much (of any) difference in the final result. However, my question is about performing an *inverse pca*--getting back to the original values from the principal components maps. The idea of pca sharpening is that you perform a pca, then do an inverse pca substituting the pan band for pc1 to enhance the resolution. The equations I show below seem to work, but what I've read indicates that it is not possible to *exactly* get the original values back; you can only approximate them. Michael On Jun 17, 2012, at 10:48 AM, Duccio Rocchini wrote: Dear all, first, sorry for the delay... Here are my 2 cents to be added to the discussion. I re-took in my hands the John Jensen book. Accordingly new brightness values1,1,1 = a1,1*BV1,1,1 +a2,1*BV1,1,2. + an,1*BV1,1,m where a=eigenvector and BV=original brightness value. I found no evidence for the mean term: - ((b1+b2+b3)/3) Michael: do you have a proof/reference for that? P.S. thanks for involving me in this discussion which is really stimulating! Duccio 2012/6/7 Michael Barton michael.bar...@asu.edu: I think I've figured it out. If (ev1-1, ev1-2, ev1-3) are the eigenvectors of the first principal component for 3 imagery bands (b1, b2, b3), the corresponding factor scores of the PC1, PC2, and PC3 maps (fs1, fs2, fs3) are calculated as: fs1 = (ev1-1*b1) + (ev1-2*b2) + (ev1-3*b3) - ((b1+b2+b3)/3) fs2 = (ev2-1*b1) + (ev2-2*b2) + (ev2-3*b3) - ((b1+b2+b3)/3) fs3 = (ev3-1*b1) + (ev3-2*b2) + (ev3-3*b3) - ((b1+b2+b3)/3) So to do an inverse PCA on the same data you need to do the following: b1' = (fs1/ev1-1) + (fs2/ev2-1) + (fs3/ev3-1) b2' = (fs1/ev1-2) + (fs2/ev2-2) + (fs3/ev3-2) b3' = (fs1/ev1-3) + (fs2/ev2-3) + (fs3/ev3-3) Adding the constant back on doesn't really seem to matter because you need to rescale b1' to b1, b2' to b2, and b3' to b3 anyway. Michael On Jun 7, 2012, at 1:55 AM, Markus Neteler wrote: Hi Duccio, On Wed, Jun 6, 2012 at 11:39
Re: [GRASS-dev] PCA question
On Mon, Jun 25, 2012 at 6:47 PM, Michael Barton michael.bar...@asu.edu wrote: Thanks Marcus, Please see my post below, starting with I think I've figured it out... Yes, I saw, but the formula below I think I've figured it out... looked strange to me. Since I'm working in GRASS instead of R, [ should be and, not instead of ;) ] Here is an example based on on the landsat imagery in mapset landsat, North Carolina sample dataset: i.pca without rescaling of the output: i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30 output_prefix=lsat7_2000_pc rescale=0,0 The eigenvectors are 0.4494, 0.5128, 0.7315 0.6726, 0.3447,-0.6548 0.5879,-0.7863, 0.1901 this is a square 3x3 matrix. R gives the inverse matrix as 0.4493372, 0.6726549, 0.5879235 0.5128129, 0.3446144,-0.7862660 0.7315070,-0.6548316, 0.182 Test with band 10: the general formula is b1' = (ev1-1' * pc1 + ev1-2' * pc2 + ev1-3' * pc3) + mean(b1) The mean of lsat7_2000_b10 is 79.925. r.mapcalc lsat7_2000_10.pc = (lsat7_2000_pc.1 * 0.4493372 + lsat7_2000_pc.2 * 0.6726549 + lsat7_2000_pc.3 * 0.5879235) + 79.925 Difference to original: r.mapcalc diff = lsat7_2000_10 - lsat7_2000_10.pc r.univar diff -g n=250325 null_cells=0 cells=250325 min=-0.00140021304849824 max=0.00723340429107111 range=0.00863361733956935 mean=-1.8473599814612e-05 That's close enough, given that the eigenvectors are printed by i.pca with limited precision (only 4 decimal places). Without using R, the inverse of a 3x3 matrix can be manually calculated as follows: original matrix: a b c d e f g h k inverse matrix: (ek - fh) (ch - bk) (bf - ce) (fg - dk) (ak - cg) (cd - af) (dh - eg) (gb - ah) (ae - bd) each entry in this inverse matrix needs to be multiplied with 1 / (a(ek - fh) - b(kd - fg) + c(dh - eg)) in order to get the transformation matrix to be applied to the pc scores. The formula is more complex for matrices with more than 3 dimensions, e.g. for i.pca with 6 input bands. So my questions are: 1) Do I have the inversion correct in terms of how it needs to be calculated in GRASS? I don't think so. Using your formula and testing for the difference gives me min=-546.255208609669 max=174.771853180844 range=721.027061790513 mean=79.9249815357317 a bit too much IMHO. 2) Even though the mean of all bands seems to be subtracted from each factor score, does inverting the matrix mean that the mean of *each* band is added back to its transformed value? Adding either the mean of all original bands or 1 original band seems to produce values that are much higher than the original, and so need to be rescaled. Maybe this is OK. The mean of each band is added back to each recalculated band. See above r.mapcalc formula for band 10. 3) I do not normalize or rescale in i.pca. This seems to make it easier to do the inverse PCA with fewer calculations. Is there any reason I *should* rescale and/or normalize? Normalization applies to the input of i.pca and is by default not done. It needs to be done for heterogenous input data such as e.g. rainfall, temperature, NDVI, etc. Rescaling is automatically applied to the output of i.pca unless explicitly disabled with rescale=0,0. Markus M Thanks much Michael On Jun 25, 2012, at 10:11 AM, Markus Metz wrote: An inverse PCA can be regarded as the inverse of a transformation using matrix notation. PC scores are calculated with b = A a with A being the transformation matrix composed of the Eigenvectors, a being the vector of the original values and b the PC scores. What you now need is inverse of A, A^-1. The original values can then be retrieved with A^-1 b = a A^-1 is the inverse of the transformation matrix A which you can get in R with solve(A). For a PCA, the original values are usually shifted to the mean and optionally scaled to stddev before computing the Eigenvectors. The mean shift is always performed by i.pca, scaling is optional. That means that A^-1 b gives the original values shifted to the mean and maybe scaled, and the mean of each original band needs to be added to get the original values used as input to i.pca. With scaling applied, the shifted values need to be multiplied by the stddev for each original band. HTH, Markus M On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton michael.bar...@asu.edu wrote: The constant (i.e., the band mean) was in the pca primer that someone sent me a link to in this discussion. Using the Eigenvectors resulting from i.pca, I can achieve the results of i.pca using my formula below. This is essentially the same as your formula minus the constant--which doesn't really make much (of any) difference in the final result. However, my question is about performing an *inverse pca*--getting back to the original values from the principal components maps. The idea of pca sharpening is that you perform a pca, then do an inverse pca substituting the pan band for pc1 to enhance the resolution. The equations I show
Re: [GRASS-dev] PCA question
Markus, This is very helpful. See below for a couple additional questions. Thanks much Michael On Jun 25, 2012, at 12:03 PM, Markus Metz wrote: On Mon, Jun 25, 2012 at 6:47 PM, Michael Barton michael.bar...@asu.edu wrote: Thanks Marcus, Please see my post below, starting with I think I've figured it out... Yes, I saw, but the formula below I think I've figured it out... looked strange to me. Since I'm working in GRASS instead of R, [ should be and, not instead of ;) ] Here is an example based on on the landsat imagery in mapset landsat, North Carolina sample dataset: i.pca without rescaling of the output: i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30 output_prefix=lsat7_2000_pc rescale=0,0 The eigenvectors are 0.4494, 0.5128, 0.7315 0.6726, 0.3447,-0.6548 0.5879,-0.7863, 0.1901 this is a square 3x3 matrix. R gives the inverse matrix as 0.4493372, 0.6726549, 0.5879235 0.5128129, 0.3446144,-0.7862660 0.7315070,-0.6548316, 0.182 Test with band 10: the general formula is b1' = (ev1-1' * pc1 + ev1-2' * pc2 + ev1-3' * pc3) + mean(b1) The mean of lsat7_2000_b10 is 79.925. r.mapcalc lsat7_2000_10.pc = (lsat7_2000_pc.1 * 0.4493372 + lsat7_2000_pc.2 * 0.6726549 + lsat7_2000_pc.3 * 0.5879235) + 79.925 Difference to original: r.mapcalc diff = lsat7_2000_10 - lsat7_2000_10.pc r.univar diff -g n=250325 null_cells=0 cells=250325 min=-0.00140021304849824 max=0.00723340429107111 range=0.00863361733956935 mean=-1.8473599814612e-05 That's close enough, given that the eigenvectors are printed by i.pca with limited precision (only 4 decimal places). Without using R, the inverse of a 3x3 matrix can be manually calculated as follows: original matrix: a b c d e f g h k inverse matrix: (ek - fh) (ch - bk) (bf - ce) (fg - dk) (ak - cg) (cd - af) (dh - eg) (gb - ah) (ae - bd) Your example from R above *looks* like the inverse of matrix... a b c d e f g h k is (given rounding errors) ... a d g b e h c f k That is, transpose rows and columns. Is this just a coincidence or a reasonable approximation? each entry in this inverse matrix needs to be multiplied with 1 / (a(ek - fh) - b(kd - fg) + c(dh - eg)) by a(ek - fh)... , you mean a = factor score 1 of a cell, b = factor score 2 of a cell, c = factor score 3 of a cell (not eigenvectors a,b,c or the matrix), right? in order to get the transformation matrix to be applied to the pc scores. The formula is more complex for matrices with more than 3 dimensions, e.g. for i.pca with 6 input bands. So my questions are: 1) Do I have the inversion correct in terms of how it needs to be calculated in GRASS? I don't think so. Using your formula and testing for the difference gives me min=-546.255208609669 max=174.771853180844 range=721.027061790513 mean=79.9249815357317 a bit too much IMHO. Right. Looks like I'll need to use r.univar to calculate the mean of each band to add back in. Thanks again. I will try this out manually and test before updating the pan sharpening routine. 2) Even though the mean of all bands seems to be subtracted from each factor score, does inverting the matrix mean that the mean of *each* band is added back to its transformed value? Adding either the mean of all original bands or 1 original band seems to produce values that are much higher than the original, and so need to be rescaled. Maybe this is OK. The mean of each band is added back to each recalculated band. See above r.mapcalc formula for band 10. 3) I do not normalize or rescale in i.pca. This seems to make it easier to do the inverse PCA with fewer calculations. Is there any reason I *should* rescale and/or normalize? Normalization applies to the input of i.pca and is by default not done. It needs to be done for heterogenous input data such as e.g. rainfall, temperature, NDVI, etc. Rescaling is automatically applied to the output of i.pca unless explicitly disabled with rescale=0,0. Thanks. That is what I thought. Michael Markus M Thanks much Michael On Jun 25, 2012, at 10:11 AM, Markus Metz wrote: An inverse PCA can be regarded as the inverse of a transformation using matrix notation. PC scores are calculated with b = A a with A being the transformation matrix composed of the Eigenvectors, a being the vector of the original values and b the PC scores. What you now need is inverse of A, A^-1. The original values can then be retrieved with A^-1 b = a A^-1 is the inverse of the transformation matrix A which you can get in R with solve(A). For a PCA, the original values are usually shifted to the mean and optionally scaled to stddev before computing the Eigenvectors. The mean shift is always performed by i.pca, scaling is optional. That means that A^-1 b gives the original values shifted to the mean and maybe scaled, and the mean of each original band needs to be added to get the original
Re: [GRASS-dev] PCA question
On Mon, Jun 25, 2012 at 8:21 PM, Michael Barton michael.bar...@asu.edu wrote: On Jun 25, 2012, at 12:03 PM, Markus Metz wrote: Here is an example based on on the landsat imagery in mapset landsat, North Carolina sample dataset: i.pca without rescaling of the output: i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30 output_prefix=lsat7_2000_pc rescale=0,0 The eigenvectors are 0.4494, 0.5128, 0.7315 0.6726, 0.3447,-0.6548 0.5879,-0.7863, 0.1901 this is a square 3x3 matrix. R gives the inverse matrix as 0.4493372, 0.6726549, 0.5879235 0.5128129, 0.3446144,-0.7862660 0.7315070,-0.6548316, 0.182 [...] Without using R, the inverse of a 3x3 matrix can be manually calculated as follows: original matrix: a b c d e f g h k inverse matrix: (ek - fh) (ch - bk) (bf - ce) (fg - dk) (ak - cg) (cd - af) (dh - eg) (gb - ah) (ae - bd) Your example from R above *looks* like the inverse of matrix... a b c d e f g h k is (given rounding errors) ... a d g b e h c f k That is, transpose rows and columns. Is this just a coincidence or a reasonable approximation? Oops, right. A PCA uses an orthogonal transformation, in which case the inverse of the transformation matrix is identical to the transpose of the transformation matrix. Easy solution. each entry in this inverse matrix needs to be multiplied with 1 / (a(ek - fh) - b(kd - fg) + c(dh - eg)) by a(ek - fh)... , you mean a = factor score 1 of a cell, b = factor score 2 of a cell, c = factor score 3 of a cell (not eigenvectors a,b,c or the matrix), right? No, a,b,c are here Eigenvector values. But this formula is the general case for the inverse of a 3x3 matrix and not needed here, because the inverse and the transpose are the same. Markus M in order to get the transformation matrix to be applied to the pc scores. The formula is more complex for matrices with more than 3 dimensions, e.g. for i.pca with 6 input bands. So my questions are: 1) Do I have the inversion correct in terms of how it needs to be calculated in GRASS? I don't think so. Using your formula and testing for the difference gives me min=-546.255208609669 max=174.771853180844 range=721.027061790513 mean=79.9249815357317 a bit too much IMHO. Right. Looks like I'll need to use r.univar to calculate the mean of each band to add back in. Thanks again. I will try this out manually and test before updating the pan sharpening routine. 2) Even though the mean of all bands seems to be subtracted from each factor score, does inverting the matrix mean that the mean of *each* band is added back to its transformed value? Adding either the mean of all original bands or 1 original band seems to produce values that are much higher than the original, and so need to be rescaled. Maybe this is OK. The mean of each band is added back to each recalculated band. See above r.mapcalc formula for band 10. 3) I do not normalize or rescale in i.pca. This seems to make it easier to do the inverse PCA with fewer calculations. Is there any reason I *should* rescale and/or normalize? Normalization applies to the input of i.pca and is by default not done. It needs to be done for heterogenous input data such as e.g. rainfall, temperature, NDVI, etc. Rescaling is automatically applied to the output of i.pca unless explicitly disabled with rescale=0,0. Thanks. That is what I thought. Michael Markus M Thanks much Michael On Jun 25, 2012, at 10:11 AM, Markus Metz wrote: An inverse PCA can be regarded as the inverse of a transformation using matrix notation. PC scores are calculated with b = A a with A being the transformation matrix composed of the Eigenvectors, a being the vector of the original values and b the PC scores. What you now need is inverse of A, A^-1. The original values can then be retrieved with A^-1 b = a A^-1 is the inverse of the transformation matrix A which you can get in R with solve(A). For a PCA, the original values are usually shifted to the mean and optionally scaled to stddev before computing the Eigenvectors. The mean shift is always performed by i.pca, scaling is optional. That means that A^-1 b gives the original values shifted to the mean and maybe scaled, and the mean of each original band needs to be added to get the original values used as input to i.pca. With scaling applied, the shifted values need to be multiplied by the stddev for each original band. HTH, Markus M On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton michael.bar...@asu.edu wrote: The constant (i.e., the band mean) was in the pca primer that someone sent me a link to in this discussion. Using the Eigenvectors resulting from i.pca, I can achieve the results of i.pca using my formula below. This is essentially the same as your formula minus the constant--which doesn't really make much (of any) difference in the final result. However, my question is about performing
Re: [GRASS-dev] PCA question
Great news. Thanks. Michael On Jun 25, 2012, at 12:51 PM, Markus Metz wrote: On Mon, Jun 25, 2012 at 8:21 PM, Michael Barton michael.bar...@asu.edu wrote: On Jun 25, 2012, at 12:03 PM, Markus Metz wrote: Here is an example based on on the landsat imagery in mapset landsat, North Carolina sample dataset: i.pca without rescaling of the output: i.pca input=lsat7_2000_10,lsat7_2000_20,lsat7_2000_30 output_prefix=lsat7_2000_pc rescale=0,0 The eigenvectors are 0.4494, 0.5128, 0.7315 0.6726, 0.3447,-0.6548 0.5879,-0.7863, 0.1901 this is a square 3x3 matrix. R gives the inverse matrix as 0.4493372, 0.6726549, 0.5879235 0.5128129, 0.3446144,-0.7862660 0.7315070,-0.6548316, 0.182 [...] Without using R, the inverse of a 3x3 matrix can be manually calculated as follows: original matrix: a b c d e f g h k inverse matrix: (ek - fh) (ch - bk) (bf - ce) (fg - dk) (ak - cg) (cd - af) (dh - eg) (gb - ah) (ae - bd) Your example from R above *looks* like the inverse of matrix... a b c d e f g h k is (given rounding errors) ... a d g b e h c f k That is, transpose rows and columns. Is this just a coincidence or a reasonable approximation? Oops, right. A PCA uses an orthogonal transformation, in which case the inverse of the transformation matrix is identical to the transpose of the transformation matrix. Easy solution. each entry in this inverse matrix needs to be multiplied with 1 / (a(ek - fh) - b(kd - fg) + c(dh - eg)) by a(ek - fh)... , you mean a = factor score 1 of a cell, b = factor score 2 of a cell, c = factor score 3 of a cell (not eigenvectors a,b,c or the matrix), right? No, a,b,c are here Eigenvector values. But this formula is the general case for the inverse of a 3x3 matrix and not needed here, because the inverse and the transpose are the same. Markus M in order to get the transformation matrix to be applied to the pc scores. The formula is more complex for matrices with more than 3 dimensions, e.g. for i.pca with 6 input bands. So my questions are: 1) Do I have the inversion correct in terms of how it needs to be calculated in GRASS? I don't think so. Using your formula and testing for the difference gives me min=-546.255208609669 max=174.771853180844 range=721.027061790513 mean=79.9249815357317 a bit too much IMHO. Right. Looks like I'll need to use r.univar to calculate the mean of each band to add back in. Thanks again. I will try this out manually and test before updating the pan sharpening routine. 2) Even though the mean of all bands seems to be subtracted from each factor score, does inverting the matrix mean that the mean of *each* band is added back to its transformed value? Adding either the mean of all original bands or 1 original band seems to produce values that are much higher than the original, and so need to be rescaled. Maybe this is OK. The mean of each band is added back to each recalculated band. See above r.mapcalc formula for band 10. 3) I do not normalize or rescale in i.pca. This seems to make it easier to do the inverse PCA with fewer calculations. Is there any reason I *should* rescale and/or normalize? Normalization applies to the input of i.pca and is by default not done. It needs to be done for heterogenous input data such as e.g. rainfall, temperature, NDVI, etc. Rescaling is automatically applied to the output of i.pca unless explicitly disabled with rescale=0,0. Thanks. That is what I thought. Michael Markus M Thanks much Michael On Jun 25, 2012, at 10:11 AM, Markus Metz wrote: An inverse PCA can be regarded as the inverse of a transformation using matrix notation. PC scores are calculated with b = A a with A being the transformation matrix composed of the Eigenvectors, a being the vector of the original values and b the PC scores. What you now need is inverse of A, A^-1. The original values can then be retrieved with A^-1 b = a A^-1 is the inverse of the transformation matrix A which you can get in R with solve(A). For a PCA, the original values are usually shifted to the mean and optionally scaled to stddev before computing the Eigenvectors. The mean shift is always performed by i.pca, scaling is optional. That means that A^-1 b gives the original values shifted to the mean and maybe scaled, and the mean of each original band needs to be added to get the original values used as input to i.pca. With scaling applied, the shifted values need to be multiplied by the stddev for each original band. HTH, Markus M On Tue, Jun 19, 2012 at 12:46 AM, Michael Barton michael.bar...@asu.edu wrote: The constant (i.e., the band mean) was in the pca primer that someone sent me a link to in this discussion. Using the Eigenvectors resulting from i.pca, I can achieve the results of i.pca using my formula below. This is essentially the same as your
Re: [GRASS-dev] PCA question
Hi, FWIW, it's a known feature that GRASS's i.pca and R's version give transposed output. It's just cosmetic and the row and column headers tell you what's what. GRASS lists the principal components row-wise, R presents them column-wise. See discussion and GRASS vs. R testing/validation w/Nikos from c. March 2009. ( trac ticket #430) See also other open i.pca tickets: i.pca fails to center data prior to analysis http://trac.osgeo.org/grass/ticket/576 i.pca: close map before writing out metadata? http://trac.osgeo.org/grass/ticket/511 i.pca metadata truncated http://trac.osgeo.org/grass/ticket/1279 Hamish ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] PCA question
Hamish: See discussion and GRASS vs. R testing/validation w/Nikos from c. March 2009. ( trac ticket #430) and the (colorized) discussion here: http://grass.osgeo.org/wiki/Principal_Components_Analysis#SVD_solution Hamish ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] PCA question
Fortunately, those are not issues for pan sharpening. Michael On Jun 25, 2012, at 4:23 PM, Hamish wrote: Hi, FWIW, it's a known feature that GRASS's i.pca and R's version give transposed output. It's just cosmetic and the row and column headers tell you what's what. GRASS lists the principal components row-wise, R presents them column-wise. See discussion and GRASS vs. R testing/validation w/Nikos from c. March 2009. ( trac ticket #430) See also other open i.pca tickets: i.pca fails to center data prior to analysis http://trac.osgeo.org/grass/ticket/576 i.pca: close map before writing out metadata? http://trac.osgeo.org/grass/ticket/511 i.pca metadata truncated http://trac.osgeo.org/grass/ticket/1279 Hamish _ C. Michael Barton Visiting Scientist, Integrated Science Program National Center for Atmospheric Research University Corporation for Atmospheric Research 303-497-2889 (voice) Director, Center for Social Dynamics Complexity Professor of Anthropology, School of Human Evolution Social Change Arizona State University www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] PCA question
The constant (i.e., the band mean) was in the pca primer that someone sent me a link to in this discussion. Using the Eigenvectors resulting from i.pca, I can achieve the results of i.pca using my formula below. This is essentially the same as your formula minus the constant--which doesn't really make much (of any) difference in the final result. However, my question is about performing an *inverse pca*--getting back to the original values from the principal components maps. The idea of pca sharpening is that you perform a pca, then do an inverse pca substituting the pan band for pc1 to enhance the resolution. The equations I show below seem to work, but what I've read indicates that it is not possible to *exactly* get the original values back; you can only approximate them. Michael On Jun 17, 2012, at 10:48 AM, Duccio Rocchini wrote: Dear all, first, sorry for the delay... Here are my 2 cents to be added to the discussion. I re-took in my hands the John Jensen book. Accordingly new brightness values1,1,1 = a1,1*BV1,1,1 +a2,1*BV1,1,2. + an,1*BV1,1,m where a=eigenvector and BV=original brightness value. I found no evidence for the mean term: - ((b1+b2+b3)/3) Michael: do you have a proof/reference for that? P.S. thanks for involving me in this discussion which is really stimulating! Duccio 2012/6/7 Michael Barton michael.bar...@asu.edu: I think I've figured it out. If (ev1-1, ev1-2, ev1-3) are the eigenvectors of the first principal component for 3 imagery bands (b1, b2, b3), the corresponding factor scores of the PC1, PC2, and PC3 maps (fs1, fs2, fs3) are calculated as: fs1 = (ev1-1*b1) + (ev1-2*b2) + (ev1-3*b3) - ((b1+b2+b3)/3) fs2 = (ev2-1*b1) + (ev2-2*b2) + (ev2-3*b3) - ((b1+b2+b3)/3) fs3 = (ev3-1*b1) + (ev3-2*b2) + (ev3-3*b3) - ((b1+b2+b3)/3) So to do an inverse PCA on the same data you need to do the following: b1' = (fs1/ev1-1) + (fs2/ev2-1) + (fs3/ev3-1) b2' = (fs1/ev1-2) + (fs2/ev2-2) + (fs3/ev3-2) b3' = (fs1/ev1-3) + (fs2/ev2-3) + (fs3/ev3-3) Adding the constant back on doesn't really seem to matter because you need to rescale b1' to b1, b2' to b2, and b3' to b3 anyway. Michael On Jun 7, 2012, at 1:55 AM, Markus Neteler wrote: Hi Duccio, On Wed, Jun 6, 2012 at 11:39 PM, Michael Barton michael.bar...@asu.edu wrote: On Jun 6, 2012, at 2:20 PM, Markus Neteler wrote: On Wed, Jun 6, 2012 at 5:09 PM, Michael Barton michael.bar...@asu.edu wrote: ... I'm working on a pan sharpening script that will combine your i.fusion.brovey with options to do other pan sharpening methods. An IHS transformation is easy. I think that a PCA sharpening is doable too if I can find an equation to rotate the PCA channels back into unrotated space--since i.pca does provide the eigenvectors. Maybe there is material in (see m.eigenvector) http://grass.osgeo.org/wiki/Principal_Components_Analysis This has a lot of good information and ALMOST has what I need. Everything I read suggests that there is a straightforward way to get the original values from the factor scores if you have the eigenvectors. But no one I've yet found provides the equation or algorithm to do it. @Duccio: any idea about this by chance? thanks Markus _ C. Michael Barton Visiting Scientist, Integrated Science Program National Center for Atmospheric Research University Corporation for Atmospheric Research 303-497-2889 (voice) Director, Center for Social Dynamics Complexity Professor of Anthropology, School of Human Evolution Social Change Arizona State University www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu -- Duccio Rocchini, PhD http://gis.cri.fmach.it/rocchini/ Fondazione Edmund Mach Research and Innovation Centre Department of Biodiversity and Molecular Ecology GIS and Remote Sensing Unit Via Mach 1, 38010 San Michele all'Adige (TN) - Italy Phone +39 0461 615 570 ducciorocch...@gmail.com duccio.rocch...@fmach.it skype: duccio.rocchini _ C. Michael Barton Visiting Scientist, Integrated Science Program National Center for Atmospheric Research University Consortium for Atmospheric Research 303-497-2889 (voice) Director, Center for Social Dynamics Complexity Professor of Anthropology, School of Human Evolution Social Change Arizona State University www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev