Re: [R] prcomp - arbitrary direction of the returned principal components
Dear Aaron,

Many thanks for your reply. Please allow me to illustrate my query a bit.

I take some data, pass it to prcomp and extract the x matrix from the result.

From ?prcomp:

       x: if ‘retx’ is true the value of the rotated data (the centred
          (and scaled if requested) data multiplied by the ‘rotation’
          matrix) is returned.  Hence, ‘cov(x)’ is the diagonal matrix
          ‘diag(sdev^2)’.  For the formula method, ‘napredict()’ is
          applied to handle the treatment of values omitted by the
          ‘na.action’.

I consider x[,1] as my index. This makes sense, as x[,1] is the projection of
the data on the FIRST principal component. Now this x[,1] can be a high
positive number or a low negative number. I can't ignore the sign.

If I ignore the sign by taking the absolute value, the HIGH / LOW stress
values will be indistinguishable. Hence I do not think using absolute values
of x[,1] is the solution. Yes, it will make the results REPRODUCIBLE, but that
will be at the cost of losing information.

Any other idea?

Many thanks,
Ashim

On Wed, Oct 12, 2022 at 5:23 PM Ebert,Timothy Aaron wrote:
>
> Use absolute value
>
> Tim
>
> -----Original Message-----
> From: R-help On Behalf Of Ashim Kapoor
> Sent: Wednesday, October 12, 2022 7:48 AM
> To: R Help
> Subject: [R] prcomp - arbitrary direction of the returned principal components
>
> Dear R experts,
>
> From ?prcomp,
>
> --- snip ---
> Note:
>
>      The signs of the columns of the rotation matrix are arbitrary, and
>      so may differ between different programs for PCA, and even between
>      different builds of R.
> --- snip ---
>
> My problem is that I am building an index based on Principal Components
> Analysis. When the index is high it should indicate stress in the market.
> Due to the arbitrary sign, sometimes I get an index which is HIGH when
> there is stress and sometimes I get the OPPOSITE - an index which is LOW
> when there is stress. This program is shared with other people who may
> have a different build of R.
>
> I can forcefully use a NEGATIVE sign to FLIP the index when it is LOW.
> That works.
>
> Now my query is: just like we do set.seed(1234) to force the pattern of
> random number generation and make it REPRODUCIBLE, can I do something
> like:
>
> set.direction.for.vector.in.pca(1234)
>
> Now each time I do prcomp it should choose the SAME (high or low)
> direction of the principal component on ANY computer having ANY version
> of R installed.
>
> That's what I want. I don't want the returned principal component to be
> HIGH(LOW) on my computer and LOW(HIGH) on someone else's computer.
> That would confuse the people the code is shared with.
>
> Is this possible? How do people deal with this?
>
> Many thanks,
> Ashim

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
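The passage quoted from ?prcomp in the reply above can be checked numerically. The following is only a minimal sketch: the built-in USArrests data set and the variable names are assumptions chosen for illustration and are not part of the thread.

```r
# Built-in data set used purely for illustration (not the poster's data).
dat <- USArrests

pca <- prcomp(dat, center = TRUE, scale. = TRUE)

# Rebuild the scores by hand: centre and scale the data, then multiply
# by the 'rotation' matrix, as described in ?prcomp.
scores_by_hand <- scale(dat, center = TRUE, scale = TRUE) %*% pca$rotation

# The hand-built scores should agree with pca$x up to floating-point error.
scores_match <- isTRUE(all.equal(unname(scores_by_hand), unname(pca$x)))

# cov(x) should be the diagonal matrix diag(sdev^2).
cov_match <- isTRUE(all.equal(unname(cov(pca$x)), diag(pca$sdev^2)))

print(c(scores_match = scores_match, cov_match = cov_match))
```

Because the index x[,1] is just the data multiplied by the first column of 'rotation', whatever sign that column happens to get is inherited by the index.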
Re: [R] Reading Text files from UK Met Office into R again...
First one needs to remove the extraneous line-ends that you created by using
an editor that inserts those line-ends (or perhaps it was your mail-client
that added them because you failed to post in plain-text). I removed those
line-ends "by hand" and then created a text "file".

txt <- "2015-01-01 00:00, 03002, WMO, SYNOP, 1, 12, 1011, 4, 7, 200, 18, 82, , , 8, , , , , 100, 450, 1005.4, 5, , 102, 4, , 129, , , , , , , , 8.7, 7.5, 8.1,1003.6, , , , , , , 1, 1, 1, , , 1, , , , , 1, 1, 1, 1, 1, 1, , 1, , 1, 1, , , , , , , , , , 1, , , , , 2014-12-31 23:53, 0, , , , , , , , , , , , K, , , , , 91.7, A, , , ,
2015-01-01 00:00, 03005, WMO, SYNOP, 1, 9, 1011, 4, 1, 210, 26, 62, 8, 6, ,8, 8, , , 8, 30, 700, 1006, 1, 8, 54, 7, 6, 105, , , , , , , , 8.6, 7.3, 8, 996.1, , 01, , , , , 1, 1, 1, 1, 1, 1, 1, , , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, , , , , , , , 1, , , , , 2014-12-31 23:55, 0, , , , , , , , , , , , K, , , , , 91.7, A, , , 0, 1
2015-01-01 00:00, 03006, WMO, SYNOP, 1, 10, 1011, 4, 6, 210, 23, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 1, 1, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 2014-12-31 23:53, 0, , , , , , , , , , , , , , , , , , A, , , ,
2015-01-01 00:00, 03010, WMO, SYNOP, 1, 17, 1011, 4, 6, 230, 21, , , , , , , , , , , 1006.1, , , , , , , , , , , , , , 9.4, 6.2, 7.9, , , , , , , , 1, 1, , , , , , , , , , , 1, 1, 1, 1, , , , , , , , , , , , , , , , , , , ,"

# Then use `count.fields`
# (called without sep = ",", so it counts whitespace-separated fields)

count.fields(file=textConnection(txt))
[1] 104 106 105  81

# So I'm guessing you arbitrarily snipped in the middle of one of the text lines

dat <- read.table(text=txt, sep=",", fill=TRUE, row.names=NULL, header=FALSE)
str(dat)
'data.frame':   4 obs. of  105 variables:
 $ V1  : chr  "2015-01-01 00:00" "2015-01-01 00:00" "2015-01-01 00:00" "2015-01-01 00:00"
 $ V2  : int  3002 3005 3006 3010
 $ V3  : chr  " WMO" " WMO" " WMO" " WMO"
 $ V4  : chr  " SYNOP" " SYNOP" " SYNOP" " SYNOP"
 $ V5  : int  1 1 1 1
 $ V6  : int  12 9 10 17
 $ V7  : int  1011 1011 1011 1011
 $ V8  : int  4 4 4 4
 $ V9  : int  7 1 6 6
 $ V10 : int  200 210 210 230
 $ V11 : int  18 26 23 21
 $ V12 : int  82 62 NA NA
 $ V13 : int  NA 8 NA NA
 $ V14 : int  NA 6 NA NA
 $ V15 : int  8 NA NA NA
 $ V16 : int  NA 8 NA NA
 $ V17 : int  NA 8 NA NA
 $ V18 : logi  NA NA NA NA
 $ V19 : logi  NA NA NA NA
 $ V20 : int  100 8 NA NA
#snipped about 80 lines ...
 $ V99 : num  91.7 NA NA NA
  [list output truncated]

ALWAYS use a programming editor and always post in plain-text.

-- 
David.

> On Oct 9, 2022, at 4:50 PM, Ivan Krylov wrote:
>
> On Sun, 9 Oct 2022 12:01:27 +0100
> Nick Wray wrote:
>
>> Error in read.table("midas_wxhrly_201501-201512.txt", fill = T) :
>>   duplicate 'row.names' are not allowed
>
> Since you don't pass the `header` argument, I think that the automatic
> header detection is at play here. This is what ?read.table has to say
> about row names:
>
>>> If there is a header and the first row contains one fewer field than
>>> the number of columns, the first column in the input is used for the
>>> row names. Otherwise if ‘row.names’ is missing, the rows are
>>> numbered.
>
> Perhaps the "one fewer field in the header than the number of columns"
> condition is true for files after 2010? I'm too lazy to sign up for a
> CEDA account and I'm not sure I'd be given access to hourly datasets
> anyway.
>
> If this is the reason for the failure (first column used as rownames()
> and turns out to be non-unique), there's an easy way to fix that:
>
>>> Using ‘row.names = NULL’ forces row numbering.
>
> I don't see a header in your example. If there's actually no header
> containing column names, passing `header = FALSE` will both prevent the
> error and avoid eating the first line of the file.
>
> --
> Best regards,
> Ivan
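The advice in this message and in the quoted reply (inspect the field counts, then read with `header = FALSE` and `row.names = NULL`) can be tried on a small stand-in. The sketch below is illustrative only: the three records are made up and much shorter than real Met Office MIDAS lines, and passing `sep = ","` to `count.fields()`, padding with `col.names`, and `strip.white = TRUE` are choices made for this sketch, not something taken from the thread.

```r
# Three made-up comma-separated records with differing numbers of fields
# (a stand-in, not real Met Office data).
txt <- "2015-01-01 00:00, 03002, WMO, 1, 12, 8.7
2015-01-01 00:00, 03005, WMO, 1, 9
2015-01-01 00:00, 03006, WMO, 1, 10, 8.6, 7.3"

# Count comma-separated fields per line. Without sep = ",", count.fields()
# would split on white space instead.
tc <- textConnection(txt)
n_fields <- count.fields(tc, sep = ",")
close(tc)
print(n_fields)

# header = FALSE keeps the first record as data, row.names = NULL forces
# row numbering, and fill = TRUE pads short records with NA.
dat <- read.table(text = txt, sep = ",", header = FALSE, fill = TRUE,
                  row.names = NULL, strip.white = TRUE,
                  col.names = paste0("V", seq_len(max(n_fields))))
str(dat)
```

Note that the station identifier in the second column is read as an integer, so its leading zero is dropped; supplying `colClasses` would be one way to keep it as text.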
Re: [R] prcomp - arbitrary direction of the returned principal components
Use absolute value

Tim

-----Original Message-----
From: R-help On Behalf Of Ashim Kapoor
Sent: Wednesday, October 12, 2022 7:48 AM
To: R Help
Subject: [R] prcomp - arbitrary direction of the returned principal components

Dear R experts,

From ?prcomp,

--- snip ---
Note:

     The signs of the columns of the rotation matrix are arbitrary, and
     so may differ between different programs for PCA, and even between
     different builds of R.
--- snip ---

My problem is that I am building an index based on Principal Components
Analysis. When the index is high it should indicate stress in the market.
Due to the arbitrary sign, sometimes I get an index which is HIGH when there
is stress and sometimes I get the OPPOSITE - an index which is LOW when there
is stress. This program is shared with other people who may have a different
build of R.

I can forcefully use a NEGATIVE sign to FLIP the index when it is LOW.
That works.

Now my query is: just like we do set.seed(1234) to force the pattern of
random number generation and make it REPRODUCIBLE, can I do something like:

set.direction.for.vector.in.pca(1234)

Now each time I do prcomp it should choose the SAME (high or low) direction
of the principal component on ANY computer having ANY version of R installed.

That's what I want. I don't want the returned principal component to be
HIGH(LOW) on my computer and LOW(HIGH) on someone else's computer.
That would confuse the people the code is shared with.

Is this possible? How do people deal with this?

Many thanks,
Ashim
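To see what taking absolute values does to first-component scores, here is a small simulated sketch. Everything in it is hypothetical: the latent `stress` series, the three indicators `a`, `b`, `c`, and the sample size are invented for illustration and have nothing to do with the poster's market data.

```r
set.seed(1)  # only to make the simulated data repeatable

# Hypothetical data: three indicators driven by one latent "stress" factor.
stress <- rnorm(200)
dat <- cbind(a = stress + rnorm(200, sd = 0.3),
             b = stress + rnorm(200, sd = 0.3),
             c = stress + rnorm(200, sd = 0.3))

pc1 <- prcomp(dat, scale. = TRUE)$x[, 1]

# abs() gives the same values whichever sign the build of R happened to pick.
abs_is_sign_free <- identical(abs(pc1), abs(-pc1))

# The raw score tracks the latent factor closely (correlation near +1 or -1,
# depending on the arbitrary sign) ...
cor_raw <- cor(stress, pc1)

# ... whereas abs() folds the high-stress and low-stress tails together,
# so the relationship with the latent factor largely disappears.
cor_abs <- cor(stress, abs(pc1))

print(c(cor_raw = cor_raw, cor_abs = cor_abs))
```

In this toy setting the absolute value is reproducible across sign flips, but it no longer separates high-stress from low-stress observations.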
[R] prcomp - arbitrary direction of the returned principal components
Dear R experts,

From ?prcomp,

--- snip ---
Note:

     The signs of the columns of the rotation matrix are arbitrary, and
     so may differ between different programs for PCA, and even between
     different builds of R.
--- snip ---

My problem is that I am building an index based on Principal Components
Analysis. When the index is high it should indicate stress in the market.
Due to the arbitrary sign, sometimes I get an index which is HIGH when there
is stress and sometimes I get the OPPOSITE - an index which is LOW when there
is stress. This program is shared with other people who may have a different
build of R.

I can forcefully use a NEGATIVE sign to FLIP the index when it is LOW.
That works.

Now my query is: just like we do set.seed(1234) to force the pattern of
random number generation and make it REPRODUCIBLE, can I do something like:

set.direction.for.vector.in.pca(1234)

Now each time I do prcomp it should choose the SAME (high or low) direction
of the principal component on ANY computer having ANY version of R installed.

That's what I want. I don't want the returned principal component to be
HIGH(LOW) on my computer and LOW(HIGH) on someone else's computer.
That would confuse the people the code is shared with.

Is this possible? How do people deal with this?

Many thanks,
Ashim
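As far as I know, base R has no setting comparable to set.seed() for the sign of a principal component. One common workaround is to fix the sign after the fact with an explicit convention, for example by requiring that a chosen reference variable has a non-negative loading. The sketch below assumes that convention; `orient_prcomp()` is a hypothetical helper written for this illustration (it is not part of R or of any package), and USArrests with "Murder" as the reference variable is only a stand-in for the poster's data.

```r
# Hypothetical helper (not part of base R or any package): flip the chosen
# components so that the loading of 'reference' is non-negative. Loadings
# and scores are flipped together, so the result is still a valid PCA.
orient_prcomp <- function(pca, reference, components = 1) {
  for (k in components) {
    if (pca$rotation[reference, k] < 0) {
      pca$rotation[, k] <- -pca$rotation[, k]
      if (!is.null(pca$x)) pca$x[, k] <- -pca$x[, k]
    }
  }
  pca
}

# Stand-in data: the convention says a high index goes with high "Murder".
pca   <- orient_prcomp(prcomp(USArrests, scale. = TRUE), reference = "Murder")
index <- pca$x[, 1]

# Simulate another build of R that returned the opposite sign for PC1.
other <- prcomp(USArrests, scale. = TRUE)
other$rotation[, 1] <- -other$rotation[, 1]
other$x[, 1]        <- -other$x[, 1]
other <- orient_prcomp(other, reference = "Murder")

# After orientation both runs should give the same index.
same_index <- isTRUE(all.equal(index, other$x[, 1]))
print(same_index)
```

The reference variable should be one that is known, on subject-matter grounds, to rise with stress and that loads clearly on the first component; a loading close to zero would make the convention fragile.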