Re: [R] prcomp - arbitrary direction of the returned principal components

2022-10-12 Thread Ashim Kapoor
Dear Aaron,

Many thanks for your reply.

Please allow me to illustrate my query a bit.

I take some data, pass it to prcomp and extract the x matrix from the result.

From ?prcomp:

   x: if ‘retx’ is true the value of the rotated data (the centred
  (and scaled if requested) data multiplied by the ‘rotation’
  matrix) is returned.  Hence, ‘cov(x)’ is the diagonal matrix
  ‘diag(sdev^2)’.  For the formula method, ‘napredict()’ is
  applied to handle the treatment of values omitted by the
  ‘na.action’.
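
To make that concrete, here is a small check of the two statements in the help text. It is only a sketch: the built-in USArrests data stands in for my market data, and scale. = TRUE is just my choice for the illustration.

```r
# USArrests is a stand-in data set (assumption); any numeric data would do.
p <- prcomp(USArrests, scale. = TRUE)

# x is the centred and scaled data multiplied by the rotation matrix.
x_manual <- scale(USArrests) %*% p$rotation
same_scores <- all.equal(unname(x_manual), unname(p$x))

# cov(x) is diagonal, with the component variances sdev^2 on the diagonal.
same_cov <- all.equal(unname(cov(p$x)), diag(p$sdev^2))

print(c(scores = isTRUE(same_scores), covariance = isTRUE(same_cov)))
```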

I treat x[,1] as my index. This makes sense because x[,1] is the
projection of the data onto the FIRST principal component.
x[,1] can be a large positive number or a large negative number, so I
can't ignore the sign.

If I ignore the sign by taking the absolute value, HIGH and LOW
stress values become indistinguishable.

Hence I do not think using absolute values of x[,1] is the solution.
Yes it will make the results REPRODUCIBLE but that will be at the cost
of losing information.
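
A short sketch of what I mean, again with USArrests as a stand-in for my data (an assumption for illustration only). The two extreme observations sit on opposite sides of zero, and abs() throws that away, whereas multiplying the whole component by -1 only reverses the ranking.

```r
p <- prcomp(USArrests, scale. = TRUE)   # stand-in data (assumption)
index <- p$x[, 1]

# The scores are centred, so the extremes sit on opposite sides of zero.
hi <- which.max(index)
lo <- which.min(index)
print(index[c(hi, lo)])

# After abs() both extremes are just "large"; which side they were on is lost.
print(abs(index)[c(hi, lo)])

# Multiplying the whole component by -1 keeps the ranking information;
# it only reverses it.
reversed_only <- all(rank(-index) == length(index) + 1 - rank(index))
print(reversed_only)
```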

Any other ideas?

Many thanks,
Ashim

On Wed, Oct 12, 2022 at 5:23 PM Ebert,Timothy Aaron wrote:
>
> Use absolute value
>
> Tim
>
> -Original Message-
> From: R-help  On Behalf Of Ashim Kapoor
> Sent: Wednesday, October 12, 2022 7:48 AM
> To: R Help 
> Subject: [R] prcomp - arbitrary direction of the returned principal components
>
> Dear R experts,
>
> From ?prcomp,
>
>  snip -
> Note:
>
>  The signs of the columns of the rotation matrix are arbitrary, and
>  so may differ between different programs for PCA, and even between
>  different builds of R.
>  snip --
>
> My problem is that I am building an index based on Principal Components
> Analysis.
> When the index is high it should indicate stress in the market. Due to the
> arbitrary sign, sometimes I get an index which is HIGH when there is stress
> and sometimes I get the OPPOSITE: an index which is LOW when there is
> stress.
> This program is shared with other people who may have a different build of R.
>
> I can forcefully use a NEGATIVE sign to FLIP the index when it is LOW.
> That works.
>
> Now my query is: just like we do set.seed(1234) to force the pattern of
> random number generation and make it REPRODUCIBLE, can I do something like:
>
> set.direction.for.vector.in.pca(1234)
>
> Now each time I do prcomp it should choose the SAME (high or low) direction
> of the principal component on ANY computer having ANY version of R installed.
>
> That's what I want. I don't want the returned principal component to be
> HIGH(LOW) on my computer and LOW(HIGH) on someone else's computer.
> That would confuse the people the code is shared with.
>
> Is this possible? How do people deal with this?
>
> Many thanks,
> Ashim

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading Text files from UK Met Office into R again...

2022-10-12 Thread David Winsemius
First one needs to remove the extraneous line-ends that you created by using an
editor that inserts those line-ends (or perhaps it was your mail-client that
added them because you failed to post in plain-text). I removed those line-ends
"by hand" and then created a text "file".

txt <- "2015-01-01 00:00, 03002, WMO, SYNOP, 1, 12, 1011, 4, 7, 200, 18, 82, , , 8, , , , , 100, 450, 1005.4, 5, , 102, 4, , 129, , , , , , , , 8.7, 7.5, 8.1,1003.6, , , , , , , 1, 1, 1, , , 1, , , , , 1, 1, 1, 1, 1, 1, , 1, , 1, 1, , , , , , , , , , 1, , , , , 2014-12-31 23:53, 0, , , , , , , , , , , , K, , , , , 91.7, A, , , ,
2015-01-01 00:00, 03005, WMO, SYNOP, 1, 9, 1011, 4, 1, 210, 26, 62, 8, 6, ,8, 8, , , 8, 30, 700, 1006, 1, 8, 54, 7, 6, 105, , , , , , , , 8.6, 7.3, 8, 996.1, , 01, , , , , 1, 1, 1, 1, 1, 1, 1, , , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, , , , , , , , 1, , , , , 2014-12-31 23:55, 0, , , , , , , , , , , , K, , , , , 91.7, A, , , 0, 1
2015-01-01 00:00, 03006, WMO, SYNOP, 1, 10, 1011, 4, 6, 210, 23, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 1, 1, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 2014-12-31 23:53, 0, , , , , , , , , , , , , , , , , , A, , , ,
2015-01-01 00:00, 03010, WMO, SYNOP, 1, 17, 1011, 4, 6, 230, 21, , , , , , , , , , , 1006.1, , , , , , , , , , , , , , 9.4, 6.2, 7.9, , , , , , , , 1, 1, , , , , , , , , , 1, 1, 1, 1, , , , , , , , , , , , , , , , , , , ,"

# Then use `count.fields`
count.fields(file=textConnection(txt))
[1] 104 106 105  81

# So I'm guessing you arbitrarily snipped in the middle of one of the text lines
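
Note that count.fields splits on white space by default. As a sketch on made-up data (the three records below are invented for illustration and are not taken from the Met Office file), passing sep = "," counts the comma-separated fields instead and makes a short record easy to spot:

```r
# Made-up example text: the third record is deliberately cut short.
demo <- paste0(
  "2015-01-01 00:00, 03002, WMO, SYNOP\n",
  "2015-01-01 00:00, 03005, WMO, SYNOP\n",
  "2015-01-01 00:00, 03006\n"
)

con <- textConnection(demo)
n_fields <- count.fields(con, sep = ",")
close(con)

print(n_fields)                          # 4 4 2
print(which(n_fields < max(n_fields)))   # 3
```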

dat <- read.table(text=txt, sep=",", fill=TRUE, row.names=NULL, header=FALSE)
str(dat)
'data.frame':   4 obs. of  105 variables:
 $ V1  : chr  "2015-01-01 00:00" "2015-01-01 00:00" "2015-01-01 00:00" "2015-01-01 00:00"
 $ V2  : int  3002 3005 3006 3010
 $ V3  : chr  " WMO" " WMO" " WMO" " WMO"
 $ V4  : chr  " SYNOP" " SYNOP" " SYNOP" " SYNOP"
 $ V5  : int  1 1 1 1
 $ V6  : int  12 9 10 17
 $ V7  : int  1011 1011 1011 1011
 $ V8  : int  4 4 4 4
 $ V9  : int  7 1 6 6
 $ V10 : int  200 210 210 230
 $ V11 : int  18 26 23 21
 $ V12 : int  82 62 NA NA
 $ V13 : int  NA 8 NA NA
 $ V14 : int  NA 6 NA NA
 $ V15 : int  8 NA NA NA
 $ V16 : int  NA 8 NA NA
 $ V17 : int  NA 8 NA NA
 $ V18 : logi  NA NA NA NA
 $ V19 : logi  NA NA NA NA
 $ V20 : int  100 8 NA NA
 #snipped about 80 lines ...
 $ V99 : num  91.7 NA NA NA
  [list output truncated]
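
For what it's worth, the row.names=NULL and header=FALSE arguments are what sidestep the "duplicate 'row.names'" error quoted below. Here is a sketch on invented data (not the Met Office file): when the first line has one field fewer than the other lines, read.table treats it as a header and takes the first column as row names, which fails if those values repeat.

```r
# Invented data: the first line has 2 fields, the others have 3, and the
# first column repeats the value "A".
demo <- "x, y\nA, 1, 2\nA, 3, 4\n"

# Default call: the header is auto-detected and column 1 becomes row names.
msg <- tryCatch(
  suppressWarnings(read.table(text = demo, sep = ",")),
  error = function(e) conditionMessage(e)
)
print(msg)

# Explicit header = FALSE and row.names = NULL: every line is kept as data.
ok <- read.table(text = demo, sep = ",", header = FALSE,
                 row.names = NULL, fill = TRUE)
print(dim(ok))
```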


ALWAYS use a programming editor and always post in plain-text.

-- David.

> On Oct 9, 2022, at 4:50 PM, Ivan Krylov  wrote:
> 
> On Sun, 9 Oct 2022 12:01:27 +0100
> Nick Wray  wrote:
> 
>> Error in read.table("midas_wxhrly_201501-201512.txt", fill = T) :
>>  duplicate 'row.names' are not allowed
> 
> Since you don't pass the `header` argument, I think that the automatic
> header detection is here at play. This is what ?read.table has to say
> about row names:
> 
>>> If there is a header and the first row contains one fewer field than
>>> the number of columns, the first column in the input is used for the
>>> row names.  Otherwise if ‘row.names’ is missing, the rows are
>>> numbered.
> 
> Perhaps the "one fewer field in the header than the number of columns"
> condition is true for files after 2010? I'm too lazy to sign up for a
> CEDA account and I'm not sure I'd be given access to hourly datasets
> anyway.
> 
> If this is the reason for the failure (first column used as rownames()
> and turns out to be non-unique), there's an easy way to fix that:
> 
>>> Using ‘row.names = NULL’ forces row numbering.
> 
> I don't see a header in your example. If there's actually no header
> containing column names, passing `header = FALSE` will both prevent the
> error and avoid eating the first line of the file.
> 
> -- 
> Best regards,
> Ivan


Re: [R] prcomp - arbitrary direction of the returned principal components

2022-10-12 Thread Ebert,Timothy Aaron
Use absolute value

Tim


[R] prcomp - arbitrary direction of the returned principal components

2022-10-12 Thread Ashim Kapoor
Dear R experts,

From ?prcomp,

 snip -
Note:

 The signs of the columns of the rotation matrix are arbitrary, and
 so may differ between different programs for PCA, and even between
 different builds of R.
 snip --
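
As I understand it, the ambiguity comes from the decomposition itself: ?prcomp says the calculation is done by a singular value decomposition, and flipping the sign of a matching pair of left and right singular vectors gives back exactly the same data matrix. A small sketch, with the built-in USArrests data as a stand-in for my data (an assumption for illustration):

```r
X <- scale(USArrests)            # centred and scaled stand-in data (assumption)
s <- svd(X)

# Flip the sign of the first left and right singular vectors only.
flip <- c(-1, rep(1, length(s$d) - 1))
u2 <- sweep(s$u, 2, flip, `*`)
v2 <- sweep(s$v, 2, flip, `*`)

# Both factorizations rebuild the same matrix, so neither sign is "the" right one.
rebuilt_original <- s$u %*% diag(s$d) %*% t(s$v)
rebuilt_flipped  <- u2 %*% diag(s$d) %*% t(v2)
same_product <- all.equal(rebuilt_original, rebuilt_flipped)
same_as_data <- all.equal(unname(X), rebuilt_original, check.attributes = FALSE)
print(c(isTRUE(same_product), isTRUE(same_as_data)))
```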

My problem is that I am building an index based on Principal
Components Analysis.
When the index is high it should indicate stress in the market. Due to
the arbitrary sign, sometimes I get an index which is HIGH when there
is stress and sometimes I get the OPPOSITE: an index which is LOW
when there is stress.
This program is shared with other people who may have a different build of R.

I can forcefully use a NEGATIVE sign to FLIP the index when it is LOW.
That works.
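
To be concrete, the manual flip I mean looks roughly like the sketch below. It is only an illustration under assumptions: USArrests stands in for my market data, "Murder" stands in for a variable I know rises with stress, and flip_to_reference() is a helper name I made up, not something prcomp provides.

```r
# Hypothetical helper: orient every component so that the loading of a
# chosen reference variable is non-negative.
flip_to_reference <- function(p, ref) {
  s <- sign(p$rotation[ref, ])
  s[s == 0] <- 1                              # leave zero loadings alone
  p$rotation <- sweep(p$rotation, 2, s, `*`)
  if (!is.null(p$x)) p$x <- sweep(p$x, 2, s, `*`)
  p
}

p <- prcomp(USArrests, scale. = TRUE)

# Mimic another build of R that returns PC1 with the opposite sign.
q <- p
q$rotation[, 1] <- -q$rotation[, 1]
q$x[, 1] <- -q$x[, 1]

a <- flip_to_reference(p, "Murder")
b <- flip_to_reference(q, "Murder")
same_index <- all.equal(a$x, b$x)   # same scores from both starting points
print(isTRUE(same_index))
print(all(a$rotation["Murder", ] >= 0))
```

The flipped rotation and x still satisfy x = data %*% rotation, so nothing else in the result changes.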

Now my query is: just like we do set.seed(1234) to force the pattern
of random number generation and make it REPRODUCIBLE, can I do
something like:

set.direction.for.vector.in.pca(1234)

Now each time I do prcomp it should choose the SAME (high or low)
direction of the principal component on ANY computer having ANY
version of R installed.

That's what I want. I don't want the returned principal component
to be HIGH(LOW) on my computer and LOW(HIGH) on someone else's
computer.
That would confuse the people the code is shared with.

Is this possible? How do people deal with this?

Many thanks,
Ashim
