Re: [R] the large dataset problem
Hi Eric,

I'm facing a similar problem. Looking over the list of packages, I came across:

R.huge: Methods for accessing huge amounts of data
http://cran.r-project.org/src/contrib/Descriptions/R.huge.html

I haven't installed it yet, so I don't know how well it works. I probably won't have time to look at it until next week at the earliest. I would be interested in hearing your feedback if you do try it.

- Bruce

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Eric Doviak
Sent: Saturday, July 28, 2007 2:08 PM
To: r-help@stat.math.ethz.ch
Subject: [R] the large dataset problem

Dear useRs,

I recently began a job at a very large and heavily bureaucratic organization. We're setting up a research office and statistical analysis will form the backbone of our work. We'll be working with large datasets such as the SIPP as well as our own administrative data.

Due to the bureaucracy, it will take some time to get the licenses for proprietary software like Stata. Right now, R is the only statistical software package on my computer.

This, of course, is a huge limitation because R loads data directly into RAM, making it difficult (if not impossible) to work with large datasets. My computer only has 1000 MB of RAM, of which Microsucks Winblows devours 400 MB. To make my memory issues even worse, my computer has a virus scanner that runs every day and I do not have the administrative rights to turn the damn thing off.

I need to find some way to overcome these constraints and work with large datasets. Does anyone have any suggestions?

I've read that I should "carefully vectorize my code." What does that mean?

The "Introduction to R" manual suggests modifying input files with Perl. Any tips on how to get started? Would Perl Data Language (PDL) be a good choice?
http://pdl.perl.org/index_en.html

I wrote a script which loads large datasets a few lines at a time, writes the dozen or so variables of interest to a CSV file, removes the loaded data and then (via a "for" loop) loads the next few lines. I managed to get it to work with one of the SIPP core files, but it's SLOW. Worse, if I discover later that I omitted a relevant variable, then I'll have to run the whole script all over again.

Any suggestions?

Thanks,
- Eric

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
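A rough sketch of how the "keep only the variables of interest" step might be done in base R without Perl. Everything specific here is an assumption for illustration: the demo file, its column names (id, age, income, notes) and the chunk size of 4 rows are made up and have nothing to do with the real SIPP layout. The first part skips unwanted columns while reading by setting their colClasses entry to "NULL"; the second part reads the file in chunks from an open connection, so only one chunk is held in memory at a time.

```r
# Build a small demo file so the sketch is self-contained (hypothetical data).
demo_file <- tempfile(fileext = ".csv")
demo <- data.frame(id = 1:10, age = 21:30,
                   income = seq(1000, 10000, by = 1000),
                   notes = letters[1:10])
write.csv(demo, demo_file, row.names = FALSE)

# Read one row only, just to learn the column names.
header <- names(read.csv(demo_file, nrows = 1))

# "NULL" tells read.csv to skip a column; NA means "guess the type".
keep <- c("id", "income")
classes <- rep("NULL", length(header))
classes[header %in% keep] <- NA
small <- read.csv(demo_file, colClasses = classes)

# Chunked variant: keep a running total without loading the whole file.
con <- file(demo_file, open = "r")
invisible(readLines(con, n = 1))   # consume the header line
total <- 0
repeat {
  chunk <- tryCatch(
    read.csv(con, header = FALSE, nrows = 4,
             col.names = header, colClasses = classes),
    error = function(e) NULL)      # read.csv errors once the input is exhausted
  if (is.null(chunk) || nrow(chunk) == 0) break
  total <- total + sum(chunk$income)
}
close(con)
```

If a variable turns out to be missing later, only the keep vector needs to change; the file still has to be re-read, but the skipped columns are never stored in memory.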
Re: [R] apply & incompatible dimensions error
Thanks Benilton, I know what I want to do, just not sure how to do it using R. The help documentation is not very clear. What I am trying to do is calculate correlations on a row against row basis: mat1 row1 x mat2 row1, mat1 row1 x mat2 row2, ... mat1 row1 x mat2 row-n, mat1 row-n, mat2 row-n - Bruce -Original Message- From: Benilton Carvalho [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 24, 2007 11:31 AM To: Bernzweig, Bruce (Consultant) Cc: r-help@stat.math.ethz.ch Subject: Re: [R] apply & incompatible dimensions error are you positive that your function is doing what you expect it to do? it looks like you want something like: sapply(1:10, function(i) cor(mat1[i,], mat2[i,])) b On Jul 24, 2007, at 11:05 AM, Bernzweig, Bruce ((Consultant)) wrote: > Hi, > > I've created the following two matrices (mat1 and mat2) and a function > (f) to calculate the correlations between the two on a row by row > basis. > > mat1 <- matrix(sample(1:500,50), ncol = 5, > dimnames=list(paste("row", 1:10, sep=""), > paste("col", 1:5, sep=""))) > > mat2 <- matrix(sample(501:1000,50), ncol = 5, > dimnames=list(paste("row", 1:10, sep=""), > paste("col", 1:5, sep=""))) > > f <- function(x,y) cor(x,y) > > When the matrices are squares (# rows = # columns) I have no problems. > > However, when they are not (as in the example above with 5 columns and > 10 rows), I get the following error: > >> apply(mat1, 1, f, y=mat2) > Error in cor(x, y, na.method, method == "kendall") : > incompatible dimensions > > Any help would be appreciated. Thanks! > > - Bruce > > > > ** > Please be aware that, notwithstanding the fact that the pers... > {{dropped}} > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. 
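To make the difference between the two readings of "row against row" concrete, here is a small sketch. It assumes the 10 x 5 matrices from the thread (the seed is added only so the run is repeatable). cor(t(mat1), t(mat2)) gives every row of mat1 against every row of mat2, while the sapply() call suggested above gives only the matched pairs, which are the diagonal of the full result.

```r
set.seed(123)  # seed added for repeatability; not in the original post
mat1 <- matrix(sample(1:500, 50), ncol = 5,
               dimnames = list(paste("row", 1:10, sep = ""),
                               paste("col", 1:5, sep = "")))
mat2 <- matrix(sample(501:1000, 50), ncol = 5,
               dimnames = list(paste("row", 1:10, sep = ""),
                               paste("col", 1:5, sep = "")))

# All pairs: entry [i, j] is cor(mat1[i, ], mat2[j, ]); a 10 x 10 matrix.
all_pairs <- cor(t(mat1), t(mat2))

# Matched pairs only: row i of mat1 against row i of mat2.
matched <- sapply(1:10, function(i) cor(mat1[i, ], mat2[i, ]))
```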
[R] Calculating subsets of row pairs using something faster than a for loop.
Hi all,

Situation:

- I have two matrices, each with 4 rows and 20 columns.

  mat1 <- matrix(sample(1:500,80), ncol = 20,
                 dimnames=list(paste("mat1row", 1:4, sep=""),
                               paste("mat1col", 1:20, sep="")))

  mat2 <- matrix(sample(501:1000,80), ncol = 20,
                 dimnames=list(paste("mat2row", 1:4, sep=""),
                               paste("mat2col", 1:20, sep="")))

- Each column represents a value in a time series.

Q: What do I want?

Calculate moving average correlations for each row x row pair. For each row x row pair I want 10 values representing moving average correlations for 10 sets of time-values:

  cor(mat1[1,1:10], mat2[1,1:10])
  cor(mat1[1,2:11], mat2[1,2:11])
  ...
  cor(mat1[1,11:20], mat2[1,11:20])
  cor(mat1[1,1:10], mat2[2,1:10])
  ...
  cor(mat1[4,11:20], mat2[4,11:20])

The result would be a 16 (rows) x 10 (col) matrix, matMA:

  ma1, ma2, ..., ma10 for (mat1 row1) x (mat2 row1)
  ma1, ma2, ..., ma10 for (mat1 row1) x (mat2 row2)
  ...
  ma1, ma2, ..., ma10 for (mat1 row4) x (mat2 row3)
  ma1, ma2, ..., ma10 for (mat1 row4) x (mat2 row4)

I would like to be able to do this without using a for loop, due to the slowness of that method. Is it possible to iterate through subsets without using a for loop?

Thanks,

- Bruce
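One possible approach, offered as a sketch rather than a definitive answer: let cor() handle all 16 row pairs at once for each window, and iterate only over the windows. Note that windows 1:10 through 11:20 over 20 columns give 11 windows, not 10, so the result below is 16 x 11. The names width, starts and roll are made up for this example, and sapply() is still a loop internally; the gain is that it runs 11 times instead of once per pair per window.

```r
set.seed(1)  # seed added for repeatability
mat1 <- matrix(sample(1:500, 80), ncol = 20,
               dimnames = list(paste("mat1row", 1:4, sep = ""),
                               paste("mat1col", 1:20, sep = "")))
mat2 <- matrix(sample(501:1000, 80), ncol = 20,
               dimnames = list(paste("mat2row", 1:4, sep = ""),
                               paste("mat2col", 1:20, sep = "")))

width  <- 10
starts <- seq_len(ncol(mat1) - width + 1)   # 1, 2, ..., 11

# One column per window; within a column the pairs are ordered
# (mat1row1 x mat2row1), (mat1row1 x mat2row2), ..., (mat1row4 x mat2row4).
roll <- sapply(starts, function(s) {
  w <- s:(s + width - 1)
  as.vector(t(cor(t(mat1[, w]), t(mat2[, w]))))
})

# Label each row with the pair it represents (mat2 varies fastest).
pairs <- expand.grid(mat2 = rownames(mat2), mat1 = rownames(mat1))
rownames(roll) <- paste(pairs$mat1, pairs$mat2, sep = " x ")
colnames(roll) <- paste("ma", starts, sep = "")
```

For example, roll["mat1row1 x mat2row2", "ma3"] should equal cor(mat1[1, 3:12], mat2[2, 3:12]).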
Re: [R] cor inside/outside a function has different output
Sorry. I looked up t() after writing the previous email and realized that was what I was looking for!

-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 24, 2007 11:48 AM
To: Bernzweig, Bruce (Consultant)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] cor inside/outside a function has different output

I think this is really answered already in my previous post, but just in case, try this:

> res1 <- t(apply(mat1, 1, cor, t(mat2)))
> res2 <- cor(t(mat1), t(mat2))
> all.equal(res1, res2, check.attributes = FALSE)
[1] TRUE

On 7/24/07, Bernzweig, Bruce (Consultant) <[EMAIL PROTECTED]> wrote:
> I'm calculating correlations between two matrices
>
> mat1 <- matrix(sample(1:500,25), ncol = 5,
>                dimnames=list(paste("mat1row", 1:5, sep=""),
>                              paste("mat1col", 1:5, sep="")))
>
> mat2 <- matrix(sample(501:1000,25), ncol = 5,
>                dimnames=list(paste("mat2row", 1:5, sep=""),
>                              paste("mat2col", 1:5, sep="")))
>
> using what would seem to be two similar methods:
>
> Method 1:
>
> > f <- function(x,y) cor(x,y)
> > apply(mat1, 1, f, y=mat2)
>
> Method 2:
>
> > cor(mat1, mat2)
>
> However, the results (see below) are different:
>
> > apply(mat1, 1, f, y=mat2)
>      mat1row1    mat1row2    mat1row3    mat1row4    mat1row5
> [1,] -0.27601028 -0.1352143   0.03538690 -0.03084075 -0.60171704
> [2,] -0.01595532 -0.3881197  -0.43663982  0.49081806  0.33291995
> [3,]  0.35969624 -0.0582948   0.57462169  0.09926796 -0.02948423
> [4,] -0.41435920 -0.7164638  -0.21213496 -0.55183934 -0.25341790
> [5,]  0.33802803  0.5371508   0.05219095  0.83533575  0.17850291
>
> > cor(mat1, mat2)
>          mat2col1    mat2col2    mat2col3   mat2col4   mat2col5
> mat1col1 -0.84077496 -0.01538414 -0.6078933 -0.2263840 -0.1421335
> mat1col2  0.23074421  0.54606286 -0.2354733  0.5214255 -0.2129077
> mat1col3 -0.8528      0.19550100 -0.5920509 -0.8694040  0.6853990
> mat1col4  0.08050976 -0.55449840  0.6225666  0.6187971 -0.8971584
> mat1col5 -0.10199564 -0.43854767 -0.5803077 -0.5100285  0.2848351
>
> Also, for method 2, the calculations are done on a column x column
> basis. Is there any way to do this on a row by row basis? Looking at
> the help page for cor, I don't see any parameters that could be used to
> do this.
>
> Thanks,
>
> - Bruce
Re: [R] apply & incompatible dimensions error
Thanks for the explanation.

As for the rows/columns thing, the data I receive is given to me in that way. I currently read it in using read.csv. Is there a function I should look at that can take that and transpose it, or should I just process the data first outside of R?

Thanks,

- Bruce

-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 24, 2007 11:43 AM
To: Bernzweig, Bruce (Consultant)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] apply & incompatible dimensions error

Then try this:

cor(t(mat1), t(mat2))

Also note:

1. The above implies that mat1 and mat2 must have the same number of columns, since if x and y are vectors cor(x,y) only makes sense if they have the same length.

2. The usual convention is that variables are stored as columns and that rows correspond to cases, so typically you would have (in terms of your mat1 and mat2):

Mat1 <- t(mat1)
Mat2 <- t(mat2)

and then use Mat1 and Mat2, e.g. cor(Mat1, Mat2)

On 7/24/07, Bernzweig, Bruce (Consultant) <[EMAIL PROTECTED]> wrote:
> Thanks Gabor!
>
> You state that my apply is taking rows of mat1 with columns of mat2.
>
> Is this because I have the y=mat2 parameter?
>
> > apply(mat1, 1, f, y=mat2)
>
> Actually, what I would like is to run the correlations on a row against
> row basis: mat1 row1 x mat2 row1, etc.
>
> Thanks again,
>
> - Bruce
>
> -Original Message-
> From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 24, 2007 11:31 AM
> To: Bernzweig, Bruce (Consultant)
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] apply & incompatible dimensions error
>
> Your apply is trying to take the correlations of the rows of mat1 with the
> columns of mat2 which, of course, does not work if they have different
> numbers of columns. I think you mean to take the correlations of the columns
> of mat1 with the columns of mat2.
For example, to take the correlations > of the 5 columns of mat1 with the first 4 columns of mat2 try: > > > cor(mat1, mat2[,1:4]) >col1 col2 col3 col4 > col1 -0.34624254 -0.2669519 -0.2705077 0.2183249 > col2 -0.26553255 -0.2687643 -0.0865895 0.1819025 > col3 0.19474613 -0.2334986 0.1746522 0.2326915 > col4 0.09328338 0.5117784 0.2413143 -0.3374916 > col5 0.27519716 0.1605331 -0.4057137 0.3282105 > > > On 7/24/07, Bernzweig, Bruce (Consultant) <[EMAIL PROTECTED]> wrote: > > Hi, > > > > I've created the following two matrices (mat1 and mat2) and a function > > (f) to calculate the correlations between the two on a row by row > basis. > > > >mat1 <- matrix(sample(1:500,50), ncol = 5, > >dimnames=list(paste("row", 1:10, sep=""), > >paste("col", 1:5, sep=""))) > > > >mat2 <- matrix(sample(501:1000,50), ncol = 5, > >dimnames=list(paste("row", 1:10, sep=""), > >paste("col", 1:5, sep=""))) > > > >f <- function(x,y) cor(x,y) > > > > When the matrices are squares (# rows = # columns) I have no problems. > > > > However, when they are not (as in the example above with 5 columns and > > 10 rows), I get the following error: > > > > > apply(mat1, 1, f, y=mat2) > > Error in cor(x, y, na.method, method == "kendall") : > >incompatible dimensions > > > > Any help would be appreciated. Thanks! > > > > - Bruce > > > > > > > > ** > > Please be aware that, notwithstanding the fact that the > pers...{{dropped}} > > > > __ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > > ** > Please be aware that, notwithstanding the fact that the person sending > this communication has an address in Bear Stearns' e-mail system, this > person is not an employee, agent or representative of Bear Stearns. 
> Accordingly, this person has no power or authority to represent, make
> any recommendation, solicitation, offer or statements or disclose
> information on behalf of or in any way bind Bear Stearns or any of its
> affiliates.
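On the question of transposing data that arrives with one series per row: t() can do this inside R after read.csv, so pre-processing outside R is probably not needed. The sketch below uses a made-up four-line file (the series names a, b, c and the time labels t1 to t4 are hypothetical). It assumes the first column holds the series names, which is why row.names = 1 is used.

```r
# Hypothetical input: one time series per row, first column is the series name.
csv <- tempfile(fileext = ".csv")
writeLines(c("series,t1,t2,t3,t4",
             "a,1,2,3,4",
             "b,2,4,6,9",
             "c,4,3,2,1"), csv)

wide <- read.csv(csv, row.names = 1)   # data frame, 3 rows x 4 columns

# as.matrix() first, so t() returns a numeric matrix with variables in columns.
Mat <- t(as.matrix(wide))              # 4 rows (time points) x 3 columns (series)

# cor() now works column against column, which is its usual convention.
cor(Mat)
```

Here series a and c move in exactly opposite directions, so their correlation is -1.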
Re: [R] apply & incompatible dimensions error
Thanks Gabor! You state that my apply is taking rows of mat1 with columns of mat2. Is this because I have the y=mat2 parameter? > apply(mat1, 1, f, y=mat2) Actually, what I would like is to run the correlations on a row against row basis: mat1 row1 x mat2 row1, etc. Thanks again, - Bruce -Original Message- From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 24, 2007 11:31 AM To: Bernzweig, Bruce (Consultant) Cc: r-help@stat.math.ethz.ch Subject: Re: [R] apply & incompatible dimensions error Your apply is trying to take the correlations of the rows of mat1 with the columns of mat2 which, of course, does not work if they have different numbers of columns. I think you mean to take the correlations of the columns of mat1 with the columns of mat2. For example, to take the correlations of the 5 columns of mat1 with the first 4 columns of mat2 try: > cor(mat1, mat2[,1:4]) col1 col2 col3 col4 col1 -0.34624254 -0.2669519 -0.2705077 0.2183249 col2 -0.26553255 -0.2687643 -0.0865895 0.1819025 col3 0.19474613 -0.2334986 0.1746522 0.2326915 col4 0.09328338 0.5117784 0.2413143 -0.3374916 col5 0.27519716 0.1605331 -0.4057137 0.3282105 On 7/24/07, Bernzweig, Bruce (Consultant) <[EMAIL PROTECTED]> wrote: > Hi, > > I've created the following two matrices (mat1 and mat2) and a function > (f) to calculate the correlations between the two on a row by row basis. > >mat1 <- matrix(sample(1:500,50), ncol = 5, >dimnames=list(paste("row", 1:10, sep=""), >paste("col", 1:5, sep=""))) > >mat2 <- matrix(sample(501:1000,50), ncol = 5, >dimnames=list(paste("row", 1:10, sep=""), >paste("col", 1:5, sep=""))) > >f <- function(x,y) cor(x,y) > > When the matrices are squares (# rows = # columns) I have no problems. > > However, when they are not (as in the example above with 5 columns and > 10 rows), I get the following error: > > > apply(mat1, 1, f, y=mat2) > Error in cor(x, y, na.method, method == "kendall") : >incompatible dimensions > > Any help would be appreciated. Thanks! 
>
> - Bruce
[R] cor inside/outside a function has different output
I'm calculating correlations between two matrices

mat1 <- matrix(sample(1:500,25), ncol = 5,
               dimnames=list(paste("mat1row", 1:5, sep=""),
                             paste("mat1col", 1:5, sep="")))

mat2 <- matrix(sample(501:1000,25), ncol = 5,
               dimnames=list(paste("mat2row", 1:5, sep=""),
                             paste("mat2col", 1:5, sep="")))

using what would seem to be two similar methods:

Method 1:

> f <- function(x,y) cor(x,y)
> apply(mat1, 1, f, y=mat2)

Method 2:

> cor(mat1, mat2)

However, the results (see below) are different:

> apply(mat1, 1, f, y=mat2)
     mat1row1    mat1row2    mat1row3    mat1row4    mat1row5
[1,] -0.27601028 -0.1352143   0.03538690 -0.03084075 -0.60171704
[2,] -0.01595532 -0.3881197  -0.43663982  0.49081806  0.33291995
[3,]  0.35969624 -0.0582948   0.57462169  0.09926796 -0.02948423
[4,] -0.41435920 -0.7164638  -0.21213496 -0.55183934 -0.25341790
[5,]  0.33802803  0.5371508   0.05219095  0.83533575  0.17850291

> cor(mat1, mat2)
         mat2col1    mat2col2    mat2col3   mat2col4   mat2col5
mat1col1 -0.84077496 -0.01538414 -0.6078933 -0.2263840 -0.1421335
mat1col2  0.23074421  0.54606286 -0.2354733  0.5214255 -0.2129077
mat1col3 -0.8528      0.19550100 -0.5920509 -0.8694040  0.6853990
mat1col4  0.08050976 -0.55449840  0.6225666  0.6187971 -0.8971584
mat1col5 -0.10199564 -0.43854767 -0.5803077 -0.5100285  0.2848351

Also, for method 2, the calculations are done on a column x column basis. Is there any way to do this on a row by row basis? Looking at the help page for cor, I don't see any parameters that could be used to do this.

Thanks,

- Bruce
[R] apply & incompatible dimensions error
Hi,

I've created the following two matrices (mat1 and mat2) and a function (f) to calculate the correlations between the two on a row by row basis.

mat1 <- matrix(sample(1:500,50), ncol = 5,
               dimnames=list(paste("row", 1:10, sep=""),
                             paste("col", 1:5, sep="")))

mat2 <- matrix(sample(501:1000,50), ncol = 5,
               dimnames=list(paste("row", 1:10, sep=""),
                             paste("col", 1:5, sep="")))

f <- function(x,y) cor(x,y)

When the matrices are squares (# rows = # columns) I have no problems.

However, when they are not (as in the example above with 5 columns and 10 rows), I get the following error:

> apply(mat1, 1, f, y=mat2)
Error in cor(x, y, na.method, method == "kendall") :
  incompatible dimensions

Any help would be appreciated. Thanks!

- Bruce
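A short sketch of why the call fails for non-square matrices, using the same 10 x 5 shapes (the seed is added only for repeatability). apply(mat1, 1, ...) hands each row of mat1 to f as a vector of length 5, and cor(x, y) with a matrix y needs length(x) to equal nrow(y), which is 10 here. With square matrices the two numbers happen to agree, so the call runs but correlates rows of mat1 with columns of mat2.

```r
set.seed(42)  # seed added for repeatability
mat1 <- matrix(sample(1:500, 50), ncol = 5)
mat2 <- matrix(sample(501:1000, 50), ncol = 5)

x <- mat1[1, ]   # what apply() passes to f on the first call
length(x)        # 5 observations in x
nrow(mat2)       # 10 observations per column of mat2

# Capture the error instead of stopping, so the mismatch can be inspected.
res <- tryCatch(cor(x, mat2), error = function(e) conditionMessage(e))

# Transposing mat2 makes the lengths agree: one row of mat1 against
# every row of mat2, returned as a 1 x 10 matrix.
ok <- cor(x, t(mat2))
```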
Re: [R] tagging results of "apply"
Thanks for the clarification and help! -Original Message- From: Stephen Tucker [mailto:[EMAIL PROTECTED] Sent: Sunday, July 22, 2007 6:08 AM To: Bernzweig, Bruce (Consultant); r-help Subject: Re: [R] tagging results of "apply" Actually if you want to tag both column and row, this might also help: ## Give dimension labels to both matrices mat1 <- matrix(sample(1:500, 25), ncol = 5, dimnames=list(paste("mat1row",1:5,sep=""), paste("mat1col",1:5,sep=""))) mat2 <- matrix(sample(501:1000, 25), ncol = 5, dimnames=list(paste("mat2row",1:5,sep=""), paste("mat2col",1:5,sep=""))) cor(mat1[1,],mat2) mat2col1 mat2col2 mat2col3 mat2col4 mat2col5 [1,] -0.06313535 -0.4679927 -0.5147084 -0.797748 -0.001457972 The column labels are there but are lost when returned from apply(), as it says in ?apply: "In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set" > as.vector(cor(mat1[1,],mat2)) [1] -0.063135353 -0.467992672 -0.514708392 -0.797748010 -0.001457972 You lose the dimension labels in this case, so one option is to guard against this in the following way: > as.vector(as.data.frame(cor(mat1[1,],mat2))) mat2col1 mat2col2 mat2col3 mat2col4 mat2col5 1 -0.06313535 -0.4679927 -0.5147084 -0.797748 -0.001457972 Unfortunately, if you use 'as.data.frame()' in 'function(x)', apply will return a list - but you can bind the rows of the output: > f <- function(x,y) as.data.frame(cor(x,y)) > do.call(rbind, apply(mat1,1,f,y=mat2)) mat2col1 mat2col2mat2col3 mat2col4 mat2col5 mat1row1 -0.06313535 -0.4679927 -0.51470839 -0.7977480 -0.001457972 mat1row2 -0.28750363 0.1681777 0.14671484 0.8139768 0.039982028 mat1row3 -0.62017387 -0.6932731 -0.72263865 -0.7929604 0.427366680 mat1row4 0.06441894 0.1707946 -0.11444747 -0.8213577 0.526239013 mat1row5 -0.09849051 0.7024540 -0.01997228 0.3712480 0.439037838 The result is a data frame, not a matrix, and note that the columns/rows are transposed in relation to the output of apply(mat1,1,f,y=mat2) An 
alternative is to convert each row of mat1 into a list element [by transposing it with t() and then feeding it to as.data.frame()] and then use sapply(): > sapply(as.data.frame(t(mat1)),f,y=mat2) mat1row1 mat1row2 mat1row3 mat1row4 mat1row5 mat2col1 -0.06313535 -0.2875036 -0.6201739 0.06441894 -0.0984905 mat2col2 -0.4679927 0.1681777 -0.6932731 0.1707946 0.702454 mat2col3 -0.5147084 0.1467148 -0.7226387 -0.1144475 -0.01997228 mat2col4 -0.7977480.8139768 -0.7929604 -0.8213577 0.371248 mat2col5 -0.001457972 0.03998203 0.4273667 0.526239 0.4390378 --- Stephen Tucker <[EMAIL PROTECTED]> wrote: > Dear Bruce, > In your functions, you need to use your bound variable, 'x' [not mat1] in > your anonymous function [function(x)] as the argument to cor(). > > For instance, you wrote: > apply(mat1, 1, function(x) cor(mat1, mat2[1,])) > apply(mat1, 1, function(x) cor(mat1, mat2)) > > They should be > apply(mat1, 1, function(x) cor(x, mat2[1,])) > apply(mat1, 1, function(x) cor(x, mat2)) > > or > f <- function(x,y) cor(x, y) > apply(mat1, 1, f, y=mat2[1,]) > apply(mat1, 1, f, y=mat2) > > Then from the ?apply documentation - under section, 'Value' - the following > statement will help you predict its behavior in this case: > "If each call to FUN returns a vector of length n, then apply returns an > array of dimension c(n, dim(X)[MARGIN]) if n > 1." > > [each column of your output is the output from cor(mat1[i,],mat2) in > Scenario > 2]. 
As for tagging, you can try adding dimension labels [to the object > which > is passed as the 'X' argument to apply()]: > > mat1 <- matrix(sample(1:500, 25), ncol = 5, >dimnames=list(paste("row",1:5,sep=""), > paste("col",1:5,sep=""))) > mat2 <- matrix(sample(501:1000, 25), ncol = 5) > > > apply(mat1, 1, function(x,y) cor(x, y), y=mat2) > row1 row2 row3row4row5 > [1,] 0.39412464 -0.6241649 0.7423724 0.48391875 0.27085386 > [2,] -0.22912466 -0.4123714 0.2857004 -0.52447327 0.06971423 > [3,] -0.51027247 0.3256587 -0.6195050 -0.48309737 0.01699978 > [4,] 0.26353316 -0.1873564 0.2121154 0.88784766 -0.02257890 > [5,] -0.03771225 -0.4250040 0.3795558 -0.03372794 -0.05874675 > > Hope this helps, > > Stephen > > --- "Bernzweig, Bruce (Consultant)" <[EMAIL PROTECTED]> wrote: > > > In trying to get a better understanding of vectorizat
Re: [R] tagging results of "apply"
Thanks! I'll take a look at this. -Original Message- From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] Sent: Sunday, July 22, 2007 7:24 AM To: Bernzweig, Bruce (Consultant) Cc: r-help Subject: Re: [R] tagging results of "apply" You don't need apply at all here. cor can already do that and it automatically labels the rows and columns too. Using the builtin dataset anscombe whose first 4 columns are labelled x1,x2,x3,x4 and whose next 4 columns are labelled y1,y2,y3,y4 we have: > cor(anscombe[1:4], anscombe[5:8]) y1 y2 y3 y4 x1 0.8164205 0.8162365 0.8162867 -0.3140467 x2 0.8164205 0.8162365 0.8162867 -0.3140467 x3 0.8164205 0.8162365 0.8162867 -0.3140467 x4 -0.5290927 -0.7184365 -0.3446610 0.8165214 cor works the same with matrices too. On 7/20/07, Bernzweig, Bruce (Consultant) <[EMAIL PROTECTED]> wrote: > In trying to get a better understanding of vectorization I wrote the > following code: > > My objective is to take two sets of time series and calculate the > correlations for each combination of time series. > > mat1 <- matrix(sample(1:500, 25), ncol = 5) > mat2 <- matrix(sample(501:1000, 25), ncol = 5) > > Scenario 1: > apply(mat1, 1, function(x) cor(mat1, mat2[1,])) > > Scenario 2: > apply(mat1, 1, function(x) cor(mat1, mat2)) > > Using scenario 1, (output below) I can see that correlations are > calculated for just the first row of mat2 against each individual row of > mat1. > > Using scenario 2, (output below) I can see that correlations are > calculated for each row of mat2 against each individual row of mat1. > > Q1: The output of scenario2 consists of 25 rows of data. Are the first > five rows mat1 against mat2[1,], the next five rows mat1 against > mat2[2,], ... last five rows mat1 against mat2[5,]? > > Q2: I assign the output of scenario 2 to a new matrix > >matC <- apply(mat1, 1, function(x) cor(mat1, mat2)) > >However, I need a way to identify each row in matC as a pairing of > rows from mat1 and mat2. 
Is there a parameter I can add to apply to do > this? > > Scenario 1: > > apply(mat1, 1, function(x) cor(mat1, mat2[1,])) > [,1] [,2] [,3] [,4] [,5] > [1,] -0.4626122 -0.4626122 -0.4626122 -0.4626122 -0.4626122 > [2,] -0.9031543 -0.9031543 -0.9031543 -0.9031543 -0.9031543 > [3,] 0.0735273 0.0735273 0.0735273 0.0735273 0.0735273 > [4,] 0.7401259 0.7401259 0.7401259 0.7401259 0.7401259 > [5,] -0.4548582 -0.4548582 -0.4548582 -0.4548582 -0.4548582 > > Scenario 2: > > apply(mat1, 1, function(x) cor(mat1, mat2)) > [,1][,2][,3][,4][,5] > [1,] 0.19394126 0.19394126 0.19394126 0.19394126 0.19394126 > [2,] 0.26402400 0.26402400 0.26402400 0.26402400 0.26402400 > [3,] 0.12923842 0.12923842 0.12923842 0.12923842 0.12923842 > [4,] -0.74549676 -0.74549676 -0.74549676 -0.74549676 -0.74549676 > [5,] 0.64074122 0.64074122 0.64074122 0.64074122 0.64074122 > [6,] 0.26931986 0.26931986 0.26931986 0.26931986 0.26931986 > [7,] 0.08527921 0.08527921 0.08527921 0.08527921 0.08527921 > [8,] -0.28034079 -0.28034079 -0.28034079 -0.28034079 -0.28034079 > [9,] -0.15251915 -0.15251915 -0.15251915 -0.15251915 -0.15251915 > [10,] 0.19542415 0.19542415 0.19542415 0.19542415 0.19542415 > [11,] 0.75107032 0.75107032 0.75107032 0.75107032 0.75107032 > [12,] 0.53042767 0.53042767 0.53042767 0.53042767 0.53042767 > [13,] -0.51163612 -0.51163612 -0.51163612 -0.51163612 -0.51163612 > [14,] -0.44396048 -0.44396048 -0.44396048 -0.44396048 -0.44396048 > [15,] 0.57018745 0.57018745 0.57018745 0.57018745 0.57018745 > [16,] 0.70480284 0.70480284 0.70480284 0.70480284 0.70480284 > [17,] -0.36674283 -0.36674283 -0.36674283 -0.36674283 -0.36674283 > [18,] -0.81826607 -0.81826607 -0.81826607 -0.81826607 -0.81826607 > [19,] 0.53145184 0.53145184 0.53145184 0.53145184 0.53145184 > [20,] 0.24568385 0.24568385 0.24568385 0.24568385 0.24568385 > [21,] -0.10610402 -0.10610402 -0.10610402 -0.10610402 -0.10610402 > [22,] -0.78650748 -0.78650748 -0.78650748 -0.78650748 -0.78650748 > [23,] 0.04269423 0.04269423 0.04269423 
0.04269423 0.04269423
> [24,] 0.14704698 0.14704698 0.14704698 0.14704698 0.14704698
> [25,] 0.28340166 0.28340166 0.28340166 0.28340166 0.28340166
[R] tagging results of "apply"
In trying to get a better understanding of vectorization I wrote the following code: My objective is to take two sets of time series and calculate the correlations for each combination of time series. mat1 <- matrix(sample(1:500, 25), ncol = 5) mat2 <- matrix(sample(501:1000, 25), ncol = 5) Scenario 1: apply(mat1, 1, function(x) cor(mat1, mat2[1,])) Scenario 2: apply(mat1, 1, function(x) cor(mat1, mat2)) Using scenario 1, (output below) I can see that correlations are calculated for just the first row of mat2 against each individual row of mat1. Using scenario 2, (output below) I can see that correlations are calculated for each row of mat2 against each individual row of mat1. Q1: The output of scenario2 consists of 25 rows of data. Are the first five rows mat1 against mat2[1,], the next five rows mat1 against mat2[2,], ... last five rows mat1 against mat2[5,]? Q2: I assign the output of scenario 2 to a new matrix matC <- apply(mat1, 1, function(x) cor(mat1, mat2)) However, I need a way to identify each row in matC as a pairing of rows from mat1 and mat2. Is there a parameter I can add to apply to do this? 
Scenario 1: > apply(mat1, 1, function(x) cor(mat1, mat2[1,])) [,1] [,2] [,3] [,4] [,5] [1,] -0.4626122 -0.4626122 -0.4626122 -0.4626122 -0.4626122 [2,] -0.9031543 -0.9031543 -0.9031543 -0.9031543 -0.9031543 [3,] 0.0735273 0.0735273 0.0735273 0.0735273 0.0735273 [4,] 0.7401259 0.7401259 0.7401259 0.7401259 0.7401259 [5,] -0.4548582 -0.4548582 -0.4548582 -0.4548582 -0.4548582 Scenario 2: > apply(mat1, 1, function(x) cor(mat1, mat2)) [,1][,2][,3][,4][,5] [1,] 0.19394126 0.19394126 0.19394126 0.19394126 0.19394126 [2,] 0.26402400 0.26402400 0.26402400 0.26402400 0.26402400 [3,] 0.12923842 0.12923842 0.12923842 0.12923842 0.12923842 [4,] -0.74549676 -0.74549676 -0.74549676 -0.74549676 -0.74549676 [5,] 0.64074122 0.64074122 0.64074122 0.64074122 0.64074122 [6,] 0.26931986 0.26931986 0.26931986 0.26931986 0.26931986 [7,] 0.08527921 0.08527921 0.08527921 0.08527921 0.08527921 [8,] -0.28034079 -0.28034079 -0.28034079 -0.28034079 -0.28034079 [9,] -0.15251915 -0.15251915 -0.15251915 -0.15251915 -0.15251915 [10,] 0.19542415 0.19542415 0.19542415 0.19542415 0.19542415 [11,] 0.75107032 0.75107032 0.75107032 0.75107032 0.75107032 [12,] 0.53042767 0.53042767 0.53042767 0.53042767 0.53042767 [13,] -0.51163612 -0.51163612 -0.51163612 -0.51163612 -0.51163612 [14,] -0.44396048 -0.44396048 -0.44396048 -0.44396048 -0.44396048 [15,] 0.57018745 0.57018745 0.57018745 0.57018745 0.57018745 [16,] 0.70480284 0.70480284 0.70480284 0.70480284 0.70480284 [17,] -0.36674283 -0.36674283 -0.36674283 -0.36674283 -0.36674283 [18,] -0.81826607 -0.81826607 -0.81826607 -0.81826607 -0.81826607 [19,] 0.53145184 0.53145184 0.53145184 0.53145184 0.53145184 [20,] 0.24568385 0.24568385 0.24568385 0.24568385 0.24568385 [21,] -0.10610402 -0.10610402 -0.10610402 -0.10610402 -0.10610402 [22,] -0.78650748 -0.78650748 -0.78650748 -0.78650748 -0.78650748 [23,] 0.04269423 0.04269423 0.04269423 0.04269423 0.04269423 [24,] 0.14704698 0.14704698 0.14704698 0.14704698 0.14704698 [25,] 0.28340166 0.28340166 0.28340166 
0.28340166 0.28340166
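Regarding Q2 (identifying each value as a pairing of rows), one option that needs no extra parameter to apply() is to give both matrices row names, compute all pairs with cor() on the transposes, and then unfold the result into a long table with one labelled row per pair. This is only a sketch: the row names and the column names mat1, mat2 and cor in the long table are choices made for this example.

```r
set.seed(7)  # seed added for repeatability
mat1 <- matrix(sample(1:500, 25), ncol = 5,
               dimnames = list(paste("mat1row", 1:5, sep = ""), NULL))
mat2 <- matrix(sample(501:1000, 25), ncol = 5,
               dimnames = list(paste("mat2row", 1:5, sep = ""), NULL))

# 5 x 5 matrix: rows are labelled by mat1 rows, columns by mat2 rows.
cc <- cor(t(mat1), t(mat2))

# Long form: one row per (mat1 row, mat2 row) pair, 25 rows in total.
long <- as.data.frame(as.table(cc), responseName = "cor")
names(long)[1:2] <- c("mat1", "mat2")
head(long)
```

The original Scenario 2 output repeats the same column five times because the anonymous function ignores x and calls cor(mat1, mat2) on the full matrices each time.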
[R] R behaviour related to user input (readline()) and run selection
When I run the section of code below I get the following error:

Error in if (co == "A" || co[1] == "O") { :
  missing value where TRUE/FALSE needed

When I run the code in two parts, where I first get the user's input and then afterwards run the if/else section, there is no problem.

Is there a way to stop the "run selection" process until the user has input a value?

-

calc_option <- function(){
  msg <- cat("Please select an option:\n"," 'O'ne or 'A'll': ")
  co <- readline(msg)
  switch(co,
         O = "O",
         o = "O",
         A = "A",
         a = "A"
  )
}

co <- calc_option()

if (co == "A" || co[1] == "O") {
  print(paste("calc_option = ", co))
} else {
  print("calc_option is not acceptable")
}

Thanks,

- Bruce
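A likely explanation, offered with some caution: when a selection is run, the lines are fed to the console one after another, so readline() appears to receive the next line of the selection as the user's answer. That text matches none of the switch() cases, switch() then returns NULL, and the comparison in if () has nothing valid to test. In addition, cat() returns NULL, so msg is not really a prompt string. The sketch below separates parsing from prompting and repeats the question until the answer is valid. The function names parse_option, ask_option and report_option are made up for this example.

```r
# Normalise a raw answer to "O" or "A"; anything else becomes NA.
parse_option <- function(x) {
  x <- toupper(trimws(x))
  if (length(x) == 1 && x %in% c("O", "A")) x else NA_character_
}

# Keep asking until the answer is valid. Call this on its own line,
# not as part of a larger selection, so the user can actually type.
ask_option <- function() {
  if (!interactive()) stop("ask_option() needs an interactive session")
  repeat {
    co <- parse_option(readline("Please select an option, 'O'ne or 'A'll: "))
    if (!is.na(co)) return(co)
    message("Please type O or A.")
  }
}

# Guard against NA before comparing, so if () always sees TRUE or FALSE.
report_option <- function(co) {
  if (!is.na(co) && (co == "A" || co == "O")) {
    paste("calc_option =", co)
  } else {
    "calc_option is not acceptable"
  }
}
```

Typical use would be co <- ask_option() run by itself, followed by print(report_option(co)) once the answer has been entered.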