On Sat, 23 Jan 2010, Riccardo (Jack) Lucchetti wrote:

> On Fri, 22 Jan 2010, Henrique wrote:
>
> > Dear Gretl community,
> >
> > I want to create an index using five variables using gretl's principal
> > components analysis and I would like to know if I'm doing it properly.
> > I'll describe my steps:
> >
> > Step 1: Compute principal components (Main window, View -> Principal
> > components);
> > Step 2: Save all components (PC1, ..., PC5);
> > Step 3: index = PC1 + ... + PC5.
>
> Sorry, Henrique, maybe I'm missing the point, but what are you trying to
> do? Principal components are useful exactly because they are orthogonal
> (uncorrelated if you prefer) to each other, so they carry non-overlapping
> information. If you want an index that contains the maximum possible
> amount of information that one single variable can contain, take the first
> PC (the one associated with the highest eigenvalue) and you're ok. If you
> take their sum you end up with a variable that contains LESS information,
> not more.
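
The orthogonality point is easy to check in gretl itself. What follows is only a minimal sketch: it assumes the data7-12 sample file and the five car characteristics used further down in this message, not Henrique's own data, and it relies on the --save-all option to store the components as PC1, ..., PC5.

<script>
# assumption: sample data and variable list borrowed from the example
# later in this message, not from Henrique's dataset
open data7-12
# compute the principal components and save all five as series
pca wbase length width height weight --save-all
# correlation matrix of the saved components: the off-diagonal
# entries should be zero up to rounding error
corr PC1 PC2 PC3 PC4 PC5
</script>

If the off-diagonal correlations come out as (numerically) zero, that is the "non-overlapping information" property Jack describes.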

I gather via Google that there's a literature out there on constructing
index variables of one sort or another on the basis of PCA. I suppose if
you add up all the components you get some sort of weighted sum of the
original series, which might perhaps be useful in certain contexts
(supposing that you _want_ to lose information relative to the original
set of data).

This untutored example doesn't prove anything at all but I found it
moderately interesting:

<script>
open data7-12
# principal components of five car characteristics, saved as PC1..PC5
pca wbase length width height weight --save-all
# sum of all the components versus plain sum of the raw characteristics
series idx = PC1 + PC2 + PC3 + PC4 + PC5
series xsum = wbase + length + width + height + weight
# compare how well each candidate regressor predicts price
ols price 0 idx
ols price 0 xsum
ols price 0 PC1
ols price 0 weight
</script>

The sum of PCs of cars' characteristics predicts price better than the
sum of characteristics (which is hardly meaningful), and a bit better
than PC1 alone, but it doesn't do nearly as well as the most relevant of
the original variables, namely the weight of the car.

Allin.
