from:"Bernzweig, Bruce (Consultant)"

Re: [R] the large dataset problem

2007-07-30 Thread Bernzweig, Bruce (Consultant)

Hi Eric,

I'm facing a similar problem.

Looking over the list of packages I came across:

R.huge: Methods for accessing huge amounts of data 
http://cran.r-project.org/src/contrib/Descriptions/R.huge.html

I haven't installed it yet so I don't know how well it works.  I
probably won't have time until next week at the earliest to look at it.

Would be interested in hearing your feedback if you do try it.

- Bruce

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Eric Doviak
Sent: Saturday, July 28, 2007 2:08 PM
To: r-help@stat.math.ethz.ch
Subject: [R] the large dataset problem

Dear useRs,

I recently began a job at a very large and heavily bureaucratic
organization. We're setting up a research office and statistical
analysis will form the backbone of our work. We'll be working with large
datasets such as the SIPP as well as our own administrative data.

Due to the bureaucracy, it will take some time to get the licenses for
proprietary software like Stata. Right now, R is the only statistical
software package on my computer. 

This, of course, is a huge limitation because R loads data directly into
RAM making it difficult (if not impossible) to work with large datasets.
My computer only has 1000 MB of RAM, of which Microsucks Winblows
devours 400 MB. To make my memory issues even worse, my computer has a
virus scanner that runs everyday and I do not have the administrative
rights to turn the damn thing off. 

I need to find some way to overcome these constraints and work with
large datasets. Does anyone have any suggestions?

I've read that I should "carefully vectorize my code." What does that
mean ??? !!!
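
A minimal sketch of what "vectorizing" usually means in R, assuming a made-up numeric vector called income (not from the thread): instead of looping over the elements one at a time, the operation is applied to the whole vector in a single call, which is typically much faster and needs no index bookkeeping.

```r
# Hypothetical data: 100,000 incomes (values are arbitrary, for illustration)
set.seed(1)
income <- runif(1e5, min = 0, max = 100000)

# Loop version: one element at a time
log_income_loop <- numeric(length(income))
for (i in seq_along(income)) {
  log_income_loop[i] <- log(income[i] + 1)
}

# Vectorized version: the whole vector in one call
log_income_vec <- log(income + 1)

# Both approaches give the same numbers
same_result <- isTRUE(all.equal(log_income_loop, log_income_vec))
```

The two versions compute the same thing; the vectorized one simply hands the loop to R's compiled internals.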

The "Introduction to R" manual suggests modifying input files with Perl.
Any tips on how to get started? Would Perl Data Language (PDL) be a good
choice?  http://pdl.perl.org/index_en.html

I wrote a script which loads large datasets a few lines at a time,
writes the dozen or so variables of interest to a CSV file, removes the
loaded data and then (via a "for" loop) loads the next few lines. I
managed to get it to work with one of the SIPP core files, but it's
SLOW. Worse, if I discover later that I omitted a relevant variable,
then I'll have to run the whole script all over again.

Any suggestions?

Thanks,
- Eric
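
One possible way to keep only the variables of interest without a hand-written loop is to let read.csv skip the unwanted columns while reading. The sketch below is an assumption-laden illustration: the file, the column names (id, age, wage, junk1, junk2) and the choice of columns to keep are all hypothetical, since the real SIPP layout is not shown in the thread. It relies on the documented colClasses value "NULL", which tells read.csv to drop a column.

```r
# Build a small stand-in file; column names are hypothetical
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id    = 1:6,
                     age   = c(30, 41, 25, 58, 33, 47),
                     wage  = c(10, 12, 9, 20, 15, 11),
                     junk1 = letters[1:6],
                     junk2 = LETTERS[1:6]),
          tmp, row.names = FALSE)

# Read a single row just to learn the column names
header <- names(read.csv(tmp, nrows = 1))

# "NULL" (as a string) makes read.csv skip a column entirely;
# NA lets read.csv guess the type of the columns that are kept
keep    <- c("id", "wage")
classes <- ifelse(header %in% keep, NA, "NULL")

small <- read.csv(tmp, colClasses = classes)
unlink(tmp)
```

Adding a forgotten variable later then means changing the keep vector and re-reading, rather than rewriting the extraction script.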

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



**
Please be aware that, notwithstanding the fact that the pers...{{dropped}}


Re: [R] apply & incompatible dimensions error

2007-07-24 Thread Bernzweig, Bruce (Consultant)

Thanks Benilton,

I know what I want to do, just not sure how to do it using R.  The help
documentation is not very clear.

What I am trying to do is calculate correlations on a row against row
basis:  mat1 row1 x mat2 row1, mat1 row1 x mat2 row2, ... mat1 row1 x
mat2 row-n, ... mat1 row-n x mat2 row-n

- Bruce
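
A minimal sketch of the row-against-row calculation described above, assuming matrices built like the 10 x 5 examples in this thread (the seed is an addition so the sketch is repeatable). Because cor() correlates columns, transposing both matrices turns every row into a column, and a single call then covers all row pairs.

```r
set.seed(42)
mat1 <- matrix(sample(1:500, 50), ncol = 5,
               dimnames = list(paste("row", 1:10, sep = ""),
                               paste("col", 1:5, sep = "")))
mat2 <- matrix(sample(501:1000, 50), ncol = 5,
               dimnames = list(paste("row", 1:10, sep = ""),
                               paste("col", 1:5, sep = "")))

# all_pairs[i, j] is the correlation of mat1 row i with mat2 row j
all_pairs <- cor(t(mat1), t(mat2))

# Same-index pairs only (row1 x row1, row2 x row2, ...)
same_index <- diag(all_pairs)
```

The full 10 x 10 result holds every combination; diag() is only needed when just the matching rows are of interest.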

-Original Message-
From: Benilton Carvalho [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 24, 2007 11:31 AM
To: Bernzweig, Bruce (Consultant)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] apply & incompatible dimensions error

are you positive that your function is doing what you expect it to do?

it looks like you want something like:

sapply(1:10, function(i) cor(mat1[i,], mat2[i,]))

b

On Jul 24, 2007, at 11:05 AM, Bernzweig, Bruce (Consultant) wrote:

> Hi,
>
> I've created the following two matrices (mat1 and mat2) and a function
> (f) to calculate the correlations between the two on a row by row  
> basis.
>
>   mat1 <- matrix(sample(1:500,50), ncol = 5,
>   dimnames=list(paste("row", 1:10, sep=""),
>   paste("col", 1:5, sep="")))
>
>   mat2 <- matrix(sample(501:1000,50), ncol = 5,
>   dimnames=list(paste("row", 1:10, sep=""),
>   paste("col", 1:5, sep="")))
>
>   f <- function(x,y) cor(x,y)
>
> When the matrices are squares (# rows = # columns) I have no problems.
>
> However, when they are not (as in the example above with 5 columns and
> 10 rows), I get the following error:
>
>> apply(mat1, 1, f, y=mat2)
> Error in cor(x, y, na.method, method == "kendall") :
> incompatible dimensions
>
> Any help would be appreciated.  Thanks!
>
> - Bruce

[R] Calculating subsets of row pairs using something faster than a for loop.

2007-07-24 Thread Bernzweig, Bruce (Consultant)

Hi all,

Situation:

 - I have two matrices each w/ 4 rows and 20 columns.

   mat1 <- matrix(sample(1:500,80), ncol = 20,
                  dimnames=list(paste("mat1row", 1:4, sep=""),
                                paste("mat1col", 1:20, sep="")))

   mat2 <- matrix(sample(501:1000,80), ncol = 20,
                  dimnames=list(paste("mat2row", 1:4, sep=""),
                                paste("mat2col", 1:20, sep="")))

 - Each column represents a value in a time series.

Q: What do I want:

   Calculate moving average correlations for each row x row pair:

   For each row x row pair I want 10 values representing moving average
   correlations for 10 sets of time-values:

   cor(mat1[1,1:10], mat2[1,1:10])
   cor(mat1[1,2:11], mat2[1,2:11])
   ...
   cor(mat1[1,11:20], mat2[1,11:20])
   cor(mat1[1,1:10], mat2[2,1:10])
   ...
   cor(mat1[4,11:20], mat2[4,11:20])

   Result would be a 16 (rows) x 10 (col) matrix matMA

   ma1, ma2, ..., ma10 for (mat1 row1) x (mat2 row1)
   ma1, ma2, ..., ma10 for (mat1 row1) x (mat2 row2)
   ...
   ma1, ma2, ..., ma10 for (mat1 row4) x (mat2 row3)
   ma1, ma2, ..., ma10 for (mat1 row4) x (mat2 row4)

   I would like to be able to do this without using a for loop
   due to the slowness of that method.

   Is it possible to iterate through subsets w/o using a for loop?

 

Thanks,

 

- Bruce
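
One way this could be sketched, under a few assumptions: the window width is 10, the helper name window_cor is made up for illustration, and a seed is added for repeatability. Windows 1:10 through 11:20 give 11 starting positions, so this sketch produces 11 columns rather than the 10 mentioned in the post. The apply-family call still iterates over windows; the saving comes from letting one cor() call handle all 16 row pairs per window.

```r
set.seed(7)
mat1 <- matrix(sample(1:500, 80), ncol = 20,
               dimnames = list(paste("mat1row", 1:4, sep = ""),
                               paste("mat1col", 1:20, sep = "")))
mat2 <- matrix(sample(501:1000, 80), ncol = 20,
               dimnames = list(paste("mat2row", 1:4, sep = ""),
                               paste("mat2col", 1:20, sep = "")))

width  <- 10                           # window length (assumed)
starts <- 1:(ncol(mat1) - width + 1)   # 1, 2, ..., 11

# For one window, correlate every row of mat1 with every row of mat2.
# The 4 x 4 result is transposed before flattening so the order is
# (row1,row1), (row1,row2), ..., (row4,row4).
window_cor <- function(s) {
  idx <- s:(s + width - 1)
  as.vector(t(cor(t(mat1[, idx]), t(mat2[, idx]))))
}

# One column per window; vapply checks each result has length 16
matMA <- vapply(starts, window_cor, numeric(nrow(mat1) * nrow(mat2)))

# Label rows with the pair they represent and columns with the window
pairs <- expand.grid(mat2 = rownames(mat2), mat1 = rownames(mat1))
dimnames(matMA) <- list(paste(pairs$mat1, pairs$mat2, sep = " x "),
                        paste("ma", starts, sep = ""))
```

A single entry can then be looked up by name, for example matMA["mat1row1 x mat2row2", "ma3"] for the window covering columns 3 to 12.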

 


Re: [R] cor inside/outside a function has different output

2007-07-24 Thread Bernzweig, Bruce (Consultant)

Sorry.  I looked up t after writing the previous email and realized that
was what I was looking for!



-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 24, 2007 11:48 AM
To: Bernzweig, Bruce (Consultant)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] cor inside/outside a function has different output

I think this is really answered already in my previous post, but just in
case, try this:

> res1 <- t(apply(mat1, 1, cor, t(mat2)))
> res2 <- cor(t(mat1), t(mat2))
> all.equal(res1, res2, check.attributes = FALSE)
[1] TRUE


On 7/24/07, Bernzweig, Bruce (Consultant) <[EMAIL PROTECTED]> wrote:
> I'm calculating correlations between two matrices
>
> mat1 <- matrix(sample(1:500,25), ncol = 5,
>                dimnames=list(paste("mat1row", 1:5, sep=""),
>                              paste("mat1col", 1:5, sep="")))
>
> mat2 <- matrix(sample(501:1000,25), ncol = 5,
>                dimnames=list(paste("mat2row", 1:5, sep=""),
>                              paste("mat2col", 1:5, sep="")))
>
> using what would seem to be two similar methods:
>
>  Method 1:
>
>   > f <- function(x,y) cor(x,y)
>   > apply(mat1, 1, f, y=mat2)
>
>  Method 2:
>
>   > cor(mat1, mat2)
>
> However, the results (see below) are different:
>
> > apply(mat1, 1, f, y=mat2)
>         mat1row1   mat1row2    mat1row3    mat1row4    mat1row5
> [1,] -0.27601028 -0.1352143  0.03538690 -0.03084075 -0.60171704
> [2,] -0.01595532 -0.3881197 -0.43663982  0.49081806  0.33291995
> [3,]  0.35969624 -0.0582948  0.57462169  0.09926796 -0.02948423
> [4,] -0.41435920 -0.7164638 -0.21213496 -0.55183934 -0.25341790
> [5,]  0.33802803  0.5371508  0.05219095  0.83533575  0.17850291
>
> > cor(mat1, mat2)
>             mat2col1    mat2col2   mat2col3   mat2col4   mat2col5
> mat1col1 -0.84077496 -0.01538414 -0.6078933 -0.2263840 -0.1421335
> mat1col2  0.23074421  0.54606286 -0.2354733  0.5214255 -0.2129077
> mat1col3 -0.8528  0.19550100 -0.5920509 -0.8694040  0.6853990
> mat1col4  0.08050976 -0.55449840  0.6225666  0.6187971 -0.8971584
> mat1col5 -0.10199564 -0.43854767 -0.5803077 -0.5100285  0.2848351
>
> Also, for method 2, the calculations are done on a column x column
> basis.  Is there any way to do this on a row by row basis?  Looking at
> the help page for cor, I don't see any parameters that could be used to
> do this.
>
> Thanks,
>
> - Bruce

Re: [R] apply & incompatible dimensions error

2007-07-24 Thread Bernzweig, Bruce (Consultant)

Thanks for the explanation.

As for the rows/columns thing, the data I receive is given to me in that
way.  I currently read it in using read.csv.  Is there a function I
should look at that can take that and transpose it or should I just
process the data first outside of R?
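
A minimal sketch of doing the transposition inside R, assuming a hypothetical file in which each row is one series and the first column holds its name (the file contents and the names seriesA, seriesB, t1..t3 are invented for illustration). read.csv returns a data frame, so it is converted with as.matrix() before t() is applied.

```r
# Stand-in for the incoming file: one series per row (hypothetical layout)
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,t1,t2,t3",
             "seriesA,1,2,3",
             "seriesB,4,6,5"), tmp)

raw <- read.csv(tmp, row.names = 1)   # first column becomes the row names
unlink(tmp)

# as.matrix() gives a numeric matrix; t() then puts each series in a column
series <- t(as.matrix(raw))
```

After this step each series is a column, which is the layout cor() and most other R functions expect.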

Thanks,

- Bruce

-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 24, 2007 11:43 AM
To: Bernzweig, Bruce (Consultant)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] apply & incompatible dimensions error

Then try this:

cor(t(mat1), t(mat2))

Also note

1. the above implies that mat1 and mat2 must have the same
number of columns since if x and y are vectors cor(x,y) only makes
sense if they have the same length.

2. the usual convention is that variables are stored as columns
and that rows correspond to cases so typically you would have
(in terms of your mat1 and mat2):

Mat1 <- t(mat1)
Mat2 <- t(mat2)

and then use Mat1 and Mat2, e.g. cor(Mat1, Mat2)



On 7/24/07, Bernzweig, Bruce (Consultant) <[EMAIL PROTECTED]> wrote:
> Thanks Gabor!
>
> You state that my apply is taking rows of mat1 with columns of mat2.
>
> Is this because I have the y=mat2 parameter?
>
> > apply(mat1, 1, f, y=mat2)
>
> Actually, what I would like is to run the correlations on a row against
> row basis:  mat1 row1 x mat2 row1, etc.
>
> Thanks again,
>
> - Bruce
>
>
> -Original Message-
> From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 24, 2007 11:31 AM
> To: Bernzweig, Bruce (Consultant)
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] apply & incompatible dimensions error
>
> Your apply is trying to take the correlations of the rows of mat1 with
> the columns of mat2 which, of course, does not work if they have different
> numbers of columns. I think you mean to take the correlations of the
> columns of mat1 with the columns of mat2.  For example, to take the
> correlations of the 5 columns of mat1 with the first 4 columns of mat2 try:
>
> > cor(mat1, mat2[,1:4])
>             col1       col2       col3       col4
> col1 -0.34624254 -0.2669519 -0.2705077  0.2183249
> col2 -0.26553255 -0.2687643 -0.0865895  0.1819025
> col3  0.19474613 -0.2334986  0.1746522  0.2326915
> col4  0.09328338  0.5117784  0.2413143 -0.3374916
> col5  0.27519716  0.1605331 -0.4057137  0.3282105
>
>
> On 7/24/07, Bernzweig, Bruce (Consultant) <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I've created the following two matrices (mat1 and mat2) and a function
> > (f) to calculate the correlations between the two on a row by row basis.
> >
> >   mat1 <- matrix(sample(1:500,50), ncol = 5,
> >                  dimnames=list(paste("row", 1:10, sep=""),
> >                                paste("col", 1:5, sep="")))
> >
> >   mat2 <- matrix(sample(501:1000,50), ncol = 5,
> >                  dimnames=list(paste("row", 1:10, sep=""),
> >                                paste("col", 1:5, sep="")))
> >
> >   f <- function(x,y) cor(x,y)
> >
> > When the matrices are squares (# rows = # columns) I have no problems.
> >
> > However, when they are not (as in the example above with 5 columns and
> > 10 rows), I get the following error:
> >
> > > apply(mat1, 1, f, y=mat2)
> > Error in cor(x, y, na.method, method == "kendall") :
> >   incompatible dimensions
> >
> > Any help would be appreciated.  Thanks!
> >
> > - Bruce
>
>
>
> **
> Please be aware that, notwithstanding the fact that the person sending
> this communication has an address in Bear Stearns' e-mail system, this
> person is not an employee, agent or representative of Bear Stearns.
> Accordingly, this person has no power or authority to represent, make
> any recommendation, solicitation, offer or statements or disclose
> information on behalf of or in any way bind Bear Stearns or any of its
> affiliates.
> **
>




Re: [R] apply & incompatible dimensions error

2007-07-24 Thread Bernzweig, Bruce (Consultant)

Thanks Gabor!

You state that my apply is taking rows of mat1 with columns of mat2.

Is this because I have the y=mat2 parameter?

> apply(mat1, 1, f, y=mat2)

Actually, what I would like is to run the correlations on a row against
row basis:  mat1 row1 x mat2 row1, etc.

Thanks again,

- Bruce

-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 24, 2007 11:31 AM
To: Bernzweig, Bruce (Consultant)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] apply & incompatible dimensions error

Your apply is trying to take the correlations of the rows of mat1 with
the columns of mat2 which, of course, does not work if they have different
numbers of columns. I think you mean to take the correlations of the
columns of mat1 with the columns of mat2.  For example, to take the
correlations of the 5 columns of mat1 with the first 4 columns of mat2 try:

> cor(mat1, mat2[,1:4])
            col1       col2       col3       col4
col1 -0.34624254 -0.2669519 -0.2705077  0.2183249
col2 -0.26553255 -0.2687643 -0.0865895  0.1819025
col3  0.19474613 -0.2334986  0.1746522  0.2326915
col4  0.09328338  0.5117784  0.2413143 -0.3374916
col5  0.27519716  0.1605331 -0.4057137  0.3282105
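
The dimension mismatch can be reproduced in isolation. The sketch below assumes matrices shaped like the ones in the original question (10 rows, 5 columns; the seed is an addition). apply(mat1, 1, ...) hands each row of mat1 to the function as a plain vector of length 5, which cor() treats as 5 observations, while mat2 supplies 10 observations per column.

```r
set.seed(3)
mat1 <- matrix(sample(1:500, 50), ncol = 5)     # 10 rows, 5 columns
mat2 <- matrix(sample(501:1000, 50), ncol = 5)  # 10 rows, 5 columns

x <- mat1[1, ]   # what apply(mat1, 1, ...) passes on: a vector of length 5

# 5 observations in x against 10 rows in mat2: cor() refuses
msg <- tryCatch(cor(x, mat2), error = function(e) conditionMessage(e))

# With 5 rows in the second argument the lengths happen to agree, so no
# error is raised, but the result pairs a row of mat1 with the *columns*
# of the second matrix
ok <- cor(x, mat2[1:5, ])
```

This is why the square case ran without complaint: the lengths matched by coincidence, not because rows were being compared with rows.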

On 7/24/07, Bernzweig, Bruce (Consultant) <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I've created the following two matrices (mat1 and mat2) and a function
> (f) to calculate the correlations between the two on a row by row basis.
>
>mat1 <- matrix(sample(1:500,50), ncol = 5,
>dimnames=list(paste("row", 1:10, sep=""),
>paste("col", 1:5, sep="")))
>
>mat2 <- matrix(sample(501:1000,50), ncol = 5,
>dimnames=list(paste("row", 1:10, sep=""),
>paste("col", 1:5, sep="")))
>
>f <- function(x,y) cor(x,y)
>
> When the matrices are squares (# rows = # columns) I have no problems.
>
> However, when they are not (as in the example above with 5 columns and
> 10 rows), I get the following error:
>
> > apply(mat1, 1, f, y=mat2)
> Error in cor(x, y, na.method, method == "kendall") :
>incompatible dimensions
>
> Any help would be appreciated.  Thanks!
>
> - Bruce

[R] cor inside/outside a function has different output

2007-07-24 Thread Bernzweig, Bruce (Consultant)

I'm calculating correlations between two matrices

mat1 <- matrix(sample(1:500,25), ncol = 5,
               dimnames=list(paste("mat1row", 1:5, sep=""),
                             paste("mat1col", 1:5, sep="")))

mat2 <- matrix(sample(501:1000,25), ncol = 5,
               dimnames=list(paste("mat2row", 1:5, sep=""),
                             paste("mat2col", 1:5, sep="")))

using what would seem to be two similar methods:

  Method 1:

   > f <- function(x,y) cor(x,y)
   > apply(mat1, 1, f, y=mat2)

  Method 2:

   > cor(mat1, mat2)

However, the results (see below) are different:

> apply(mat1, 1, f, y=mat2)
        mat1row1   mat1row2    mat1row3    mat1row4    mat1row5
[1,] -0.27601028 -0.1352143  0.03538690 -0.03084075 -0.60171704
[2,] -0.01595532 -0.3881197 -0.43663982  0.49081806  0.33291995
[3,]  0.35969624 -0.0582948  0.57462169  0.09926796 -0.02948423
[4,] -0.41435920 -0.7164638 -0.21213496 -0.55183934 -0.25341790
[5,]  0.33802803  0.5371508  0.05219095  0.83533575  0.17850291

> cor(mat1, mat2)
            mat2col1    mat2col2   mat2col3   mat2col4   mat2col5
mat1col1 -0.84077496 -0.01538414 -0.6078933 -0.2263840 -0.1421335
mat1col2  0.23074421  0.54606286 -0.2354733  0.5214255 -0.2129077
mat1col3 -0.8528  0.19550100 -0.5920509 -0.8694040  0.6853990
mat1col4  0.08050976 -0.55449840  0.6225666  0.6187971 -0.8971584
mat1col5 -0.10199564 -0.43854767 -0.5803077 -0.5100285  0.2848351

Also, for method 2, the calculations are done on a column x column
basis.  Is there any way to do this on a row by row basis?  Looking at
the help page for cor, I don't see any parameters that could be used to
do this.

Thanks,

- Bruce




[R] apply & incompatible dimensions error

2007-07-24 Thread Bernzweig, Bruce (Consultant)

Hi,

I've created the following two matrices (mat1 and mat2) and a function
(f) to calculate the correlations between the two on a row by row basis.

mat1 <- matrix(sample(1:500,50), ncol = 5, 
dimnames=list(paste("row", 1:10, sep=""), 
paste("col", 1:5, sep="")))

mat2 <- matrix(sample(501:1000,50), ncol = 5, 
dimnames=list(paste("row", 1:10, sep=""), 
paste("col", 1:5, sep="")))

f <- function(x,y) cor(x,y)

When the matrices are squares (# rows = # columns) I have no problems.

However, when they are not (as in the example above with 5 columns and
10 rows), I get the following error:

> apply(mat1, 1, f, y=mat2)
Error in cor(x, y, na.method, method == "kendall") : 
incompatible dimensions

Any help would be appreciated.  Thanks!

- Bruce




Re: [R] tagging results of "apply"

2007-07-23 Thread Bernzweig, Bruce (Consultant)

Thanks for the clarification and help!

-Original Message-
From: Stephen Tucker [mailto:[EMAIL PROTECTED] 
Sent: Sunday, July 22, 2007 6:08 AM
To: Bernzweig, Bruce (Consultant); r-help
Subject: Re: [R] tagging results of "apply"

Actually if you want to tag both column and row, this might also help:

## Give dimension labels to both matrices
mat1 <- matrix(sample(1:500, 25), ncol = 5,
   dimnames=list(paste("mat1row",1:5,sep=""),
 paste("mat1col",1:5,sep="")))
mat2 <- matrix(sample(501:1000, 25), ncol = 5,
   dimnames=list(paste("mat2row",1:5,sep=""),
 paste("mat2col",1:5,sep="")))

> cor(mat1[1,],mat2)
        mat2col1   mat2col2   mat2col3  mat2col4     mat2col5
[1,] -0.06313535 -0.4679927 -0.5147084 -0.797748 -0.001457972

The column labels are there but are lost when returned from apply(), as it
says in ?apply:

"In all cases the result is coerced by as.vector to one of the basic vector
types before the dimensions are set"

> as.vector(cor(mat1[1,],mat2))
[1] -0.063135353 -0.467992672 -0.514708392 -0.797748010 -0.001457972
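
A lighter-weight option than the data frame route, offered here as a sketch rather than as part of the original reply: wrap the cor() call in drop(), which turns the 1 x 5 matrix into a named vector, and apply() then reuses those names as row labels. The matrices follow the labelled examples in this message, with a seed added for repeatability.

```r
set.seed(11)
mat1 <- matrix(sample(1:500, 25), ncol = 5,
               dimnames = list(paste("mat1row", 1:5, sep = ""),
                               paste("mat1col", 1:5, sep = "")))
mat2 <- matrix(sample(501:1000, 25), ncol = 5,
               dimnames = list(paste("mat2row", 1:5, sep = ""),
                               paste("mat2col", 1:5, sep = "")))

# cor(x, mat2) returns a 1 x 5 matrix; drop() makes it a named vector,
# and apply() uses those names as the row names of its result
res <- apply(mat1, 1, function(x, y) drop(cor(x, y)), y = mat2)

# Alternative without apply(): transpose mat1 so its rows become columns,
# then transpose the result to match the layout apply() produces
res2 <- t(cor(t(mat1), mat2))
```

Both results carry the mat2 column names on the rows and the mat1 row names on the columns.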

You lose the dimension labels in this case, so one option is to guard
against this in the following way:

> as.vector(as.data.frame(cor(mat1[1,],mat2)))
     mat2col1   mat2col2   mat2col3  mat2col4     mat2col5
1 -0.06313535 -0.4679927 -0.5147084 -0.797748 -0.001457972

Unfortunately, if you use 'as.data.frame()' in 'function(x)', apply will
return a list - but you can bind the rows of the output:

> f <- function(x,y) as.data.frame(cor(x,y))
> do.call(rbind, apply(mat1,1,f,y=mat2))
            mat2col1   mat2col2    mat2col3   mat2col4     mat2col5
mat1row1 -0.06313535 -0.4679927 -0.51470839 -0.7977480 -0.001457972
mat1row2 -0.28750363  0.1681777  0.14671484  0.8139768  0.039982028
mat1row3 -0.62017387 -0.6932731 -0.72263865 -0.7929604  0.427366680
mat1row4  0.06441894  0.1707946 -0.11444747 -0.8213577  0.526239013
mat1row5 -0.09849051  0.7024540 -0.01997228  0.3712480  0.439037838

The result is a data frame, not a matrix, and note that the columns/rows
are transposed in relation to the output of
  apply(mat1,1,f,y=mat2)

An alternative is to convert each row of mat1 into a list element [by
transposing it with t() and then feeding it to as.data.frame()] and then
use sapply():

> sapply(as.data.frame(t(mat1)),f,y=mat2)
         mat1row1     mat1row2   mat1row3   mat1row4   mat1row5
mat2col1 -0.06313535  -0.2875036 -0.6201739 0.06441894 -0.0984905
mat2col2 -0.4679927   0.1681777  -0.6932731 0.1707946  0.702454
mat2col3 -0.5147084   0.1467148  -0.7226387 -0.1144475 -0.01997228
mat2col4 -0.797748    0.8139768  -0.7929604 -0.8213577 0.371248
mat2col5 -0.001457972 0.03998203 0.4273667  0.526239   0.4390378



--- Stephen Tucker <[EMAIL PROTECTED]> wrote:

> Dear Bruce,
> In your functions, you need to use your bound variable, 'x' [not mat1] in
> your anonymous function [function(x)] as the argument to cor().
> 
> For instance, you wrote:
> apply(mat1, 1, function(x) cor(mat1, mat2[1,]))
> apply(mat1, 1, function(x) cor(mat1, mat2))
> 
> They should be
> apply(mat1, 1, function(x) cor(x, mat2[1,]))
> apply(mat1, 1, function(x) cor(x, mat2))
> 
> or
> f <- function(x,y) cor(x, y)
> apply(mat1, 1, f, y=mat2[1,])
> apply(mat1, 1, f, y=mat2)
> 
> Then from the ?apply documentation - under section, 'Value' - the following
> statement will help you predict its behavior in this case:
> "If each call to FUN returns a vector of length n, then apply returns an
> array of dimension c(n, dim(X)[MARGIN]) if n > 1."
> 
> [each column of your output is the output from cor(mat1[i,],mat2) in
> Scenario 2]. As for tagging, you can try adding dimension labels [to the
> object which is passed as the 'X' argument to apply()]:
> 
> mat1 <- matrix(sample(1:500, 25), ncol = 5,
>dimnames=list(paste("row",1:5,sep=""),
>  paste("col",1:5,sep="")))
> mat2 <- matrix(sample(501:1000, 25), ncol = 5)
> 
> > apply(mat1, 1, function(x,y) cor(x, y), y=mat2)
>            row1       row2       row3        row4        row5
> [1,]  0.39412464 -0.6241649  0.7423724  0.48391875  0.27085386
> [2,] -0.22912466 -0.4123714  0.2857004 -0.52447327  0.06971423
> [3,] -0.51027247  0.3256587 -0.6195050 -0.48309737  0.01699978
> [4,]  0.26353316 -0.1873564  0.2121154  0.88784766 -0.02257890
> [5,] -0.03771225 -0.4250040  0.3795558 -0.03372794 -0.05874675
> 
> Hope this helps,
> 
> Stephen
> 
> --- "Bernzweig, Bruce (Consultant)" <[EMAIL PROTECTED]> wrote:
> 
> > In trying to get a better understanding of vectorizat

Re: [R] tagging results of "apply"

2007-07-23 Thread Bernzweig, Bruce (Consultant)

Thanks!  I'll take a look at this.

-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
Sent: Sunday, July 22, 2007 7:24 AM
To: Bernzweig, Bruce (Consultant)
Cc: r-help
Subject: Re: [R] tagging results of "apply"

You don't need apply at all here.  cor can already do that and it
automatically labels the rows and columns too.  Using the builtin
dataset anscombe whose first 4 columns are labelled x1,x2,x3,x4
and whose next 4 columns are labelled y1,y2,y3,y4 we have:

> cor(anscombe[1:4], anscombe[5:8])
           y1         y2         y3         y4
x1  0.8164205  0.8162365  0.8162867 -0.3140467
x2  0.8164205  0.8162365  0.8162867 -0.3140467
x3  0.8164205  0.8162365  0.8162867 -0.3140467
x4 -0.5290927 -0.7184365 -0.3446610  0.8165214

cor works the same with matrices too.


On 7/20/07, Bernzweig, Bruce (Consultant) <[EMAIL PROTECTED]> wrote:
> In trying to get a better understanding of vectorization I wrote the
> following code:
>
> My objective is to take two sets of time series and calculate the
> correlations for each combination of time series.
>
> mat1 <- matrix(sample(1:500, 25), ncol = 5)
> mat2 <- matrix(sample(501:1000, 25), ncol = 5)
>
> Scenario 1:
> apply(mat1, 1, function(x) cor(mat1, mat2[1,]))
>
> Scenario 2:
> apply(mat1, 1, function(x) cor(mat1, mat2))
>
> Using scenario 1, (output below) I can see that correlations are
> calculated for just the first row of mat2 against each individual row of
> mat1.
>
> Using scenario 2, (output below) I can see that correlations are
> calculated for each row of mat2 against each individual row of mat1.
>
> Q1: The output of scenario 2 consists of 25 rows of data.  Are the first
> five rows mat1 against mat2[1,], the next five rows mat1 against
> mat2[2,], ... last five rows mat1 against mat2[5,]?
>
> Q2: I assign the output of scenario 2 to a new matrix
>
>    matC <- apply(mat1, 1, function(x) cor(mat1, mat2))
>
>    However, I need a way to identify each row in matC as a pairing of
> rows from mat1 and mat2.  Is there a parameter I can add to apply to do
> this?
>
> Scenario 1:
> > apply(mat1, 1, function(x) cor(mat1, mat2[1,]))
>   [,1]   [,2]   [,3]   [,4]   [,5]
> [1,] -0.4626122 -0.4626122 -0.4626122 -0.4626122 -0.4626122
> [2,] -0.9031543 -0.9031543 -0.9031543 -0.9031543 -0.9031543
> [3,]  0.0735273  0.0735273  0.0735273  0.0735273  0.0735273
> [4,]  0.7401259  0.7401259  0.7401259  0.7401259  0.7401259
> [5,] -0.4548582 -0.4548582 -0.4548582 -0.4548582 -0.4548582
>
> Scenario 2:
> > apply(mat1, 1, function(x) cor(mat1, mat2))
>              [,1]        [,2]        [,3]        [,4]        [,5]
>  [1,]  0.19394126  0.19394126  0.19394126  0.19394126  0.19394126
>  [2,]  0.26402400  0.26402400  0.26402400  0.26402400  0.26402400
>  [3,]  0.12923842  0.12923842  0.12923842  0.12923842  0.12923842
>  [4,] -0.74549676 -0.74549676 -0.74549676 -0.74549676 -0.74549676
>  [5,]  0.64074122  0.64074122  0.64074122  0.64074122  0.64074122
>  [6,]  0.26931986  0.26931986  0.26931986  0.26931986  0.26931986
>  [7,]  0.08527921  0.08527921  0.08527921  0.08527921  0.08527921
>  [8,] -0.28034079 -0.28034079 -0.28034079 -0.28034079 -0.28034079
>  [9,] -0.15251915 -0.15251915 -0.15251915 -0.15251915 -0.15251915
> [10,]  0.19542415  0.19542415  0.19542415  0.19542415  0.19542415
> [11,]  0.75107032  0.75107032  0.75107032  0.75107032  0.75107032
> [12,]  0.53042767  0.53042767  0.53042767  0.53042767  0.53042767
> [13,] -0.51163612 -0.51163612 -0.51163612 -0.51163612 -0.51163612
> [14,] -0.44396048 -0.44396048 -0.44396048 -0.44396048 -0.44396048
> [15,]  0.57018745  0.57018745  0.57018745  0.57018745  0.57018745
> [16,]  0.70480284  0.70480284  0.70480284  0.70480284  0.70480284
> [17,] -0.36674283 -0.36674283 -0.36674283 -0.36674283 -0.36674283
> [18,] -0.81826607 -0.81826607 -0.81826607 -0.81826607 -0.81826607
> [19,]  0.53145184  0.53145184  0.53145184  0.53145184  0.53145184
> [20,]  0.24568385  0.24568385  0.24568385  0.24568385  0.24568385
> [21,] -0.10610402 -0.10610402 -0.10610402 -0.10610402 -0.10610402
> [22,] -0.78650748 -0.78650748 -0.78650748 -0.78650748 -0.78650748
> [23,]  0.04269423  0.04269423  0.04269423  0.04269423  0.04269423
> [24,]  0.14704698  0.14704698  0.14704698  0.14704698  0.14704698
> [25,]  0.28340166  0.28340166  0.28340166  0.28340166  0.28340166
>
>
>

[R] tagging results of "apply"

2007-07-20 Thread Bernzweig, Bruce (Consultant)

In trying to get a better understanding of vectorization I wrote the
following code:

My objective is to take two sets of time series and calculate the
correlations for each combination of time series.

mat1 <- matrix(sample(1:500, 25), ncol = 5)
mat2 <- matrix(sample(501:1000, 25), ncol = 5)

Scenario 1:
apply(mat1, 1, function(x) cor(mat1, mat2[1,]))

Scenario 2:
apply(mat1, 1, function(x) cor(mat1, mat2))

Using scenario 1, (output below) I can see that correlations are
calculated for just the first row of mat2 against each individual row of
mat1.

Using scenario 2, (output below) I can see that correlations are
calculated for each row of mat2 against each individual row of mat1.  

Q1: The output of scenario2 consists of 25 rows of data.  Are the first
five rows mat1 against mat2[1,], the next five rows mat1 against
mat2[2,], ... last five rows mat1 against mat2[5,]?

Q2: I assign the output of scenario 2 to a new matrix

matC <- apply(mat1, 1, function(x) cor(mat1, mat2))

However, I need a way to identify each row in matC as a pairing of
rows from mat1 and mat2.  Is there a parameter I can add to apply to do
this?

Scenario 1:
> apply(mat1, 1, function(x) cor(mat1, mat2[1,]))
   [,1]   [,2]   [,3]   [,4]   [,5]
[1,] -0.4626122 -0.4626122 -0.4626122 -0.4626122 -0.4626122
[2,] -0.9031543 -0.9031543 -0.9031543 -0.9031543 -0.9031543
[3,]  0.0735273  0.0735273  0.0735273  0.0735273  0.0735273
[4,]  0.7401259  0.7401259  0.7401259  0.7401259  0.7401259
[5,] -0.4548582 -0.4548582 -0.4548582 -0.4548582 -0.4548582

Scenario 2:
> apply(mat1, 1, function(x) cor(mat1, mat2))
             [,1]        [,2]        [,3]        [,4]        [,5]
 [1,]  0.19394126  0.19394126  0.19394126  0.19394126  0.19394126
 [2,]  0.26402400  0.26402400  0.26402400  0.26402400  0.26402400
 [3,]  0.12923842  0.12923842  0.12923842  0.12923842  0.12923842
 [4,] -0.74549676 -0.74549676 -0.74549676 -0.74549676 -0.74549676
 [5,]  0.64074122  0.64074122  0.64074122  0.64074122  0.64074122
 [6,]  0.26931986  0.26931986  0.26931986  0.26931986  0.26931986
 [7,]  0.08527921  0.08527921  0.08527921  0.08527921  0.08527921
 [8,] -0.28034079 -0.28034079 -0.28034079 -0.28034079 -0.28034079
 [9,] -0.15251915 -0.15251915 -0.15251915 -0.15251915 -0.15251915
[10,]  0.19542415  0.19542415  0.19542415  0.19542415  0.19542415
[11,]  0.75107032  0.75107032  0.75107032  0.75107032  0.75107032
[12,]  0.53042767  0.53042767  0.53042767  0.53042767  0.53042767
[13,] -0.51163612 -0.51163612 -0.51163612 -0.51163612 -0.51163612
[14,] -0.44396048 -0.44396048 -0.44396048 -0.44396048 -0.44396048
[15,]  0.57018745  0.57018745  0.57018745  0.57018745  0.57018745
[16,]  0.70480284  0.70480284  0.70480284  0.70480284  0.70480284
[17,] -0.36674283 -0.36674283 -0.36674283 -0.36674283 -0.36674283
[18,] -0.81826607 -0.81826607 -0.81826607 -0.81826607 -0.81826607
[19,]  0.53145184  0.53145184  0.53145184  0.53145184  0.53145184
[20,]  0.24568385  0.24568385  0.24568385  0.24568385  0.24568385
[21,] -0.10610402 -0.10610402 -0.10610402 -0.10610402 -0.10610402
[22,] -0.78650748 -0.78650748 -0.78650748 -0.78650748 -0.78650748
[23,]  0.04269423  0.04269423  0.04269423  0.04269423  0.04269423
[24,]  0.14704698  0.14704698  0.14704698  0.14704698  0.14704698
[25,]  0.28340166  0.28340166  0.28340166  0.28340166  0.28340166




[R] R behaviour related to user input (readline()) and run selection

2007-06-25 Thread Bernzweig, Bruce (Consultant)

When I run the below section of code I get the following error:

   Error in if (co == "A" || co[1] == "O") { :
     missing value where TRUE/FALSE needed

When I run the code in two parts where I first get the user's input
then afterwards run the if/else section, there is no problem.

Is there a way to stop the "run selection" process until the user
has input a value?

   calc_option <- function(){
      msg <- cat("Please select an option:\n"," 'O'ne or 'A'll': ")
      co <- readline(msg)

      switch(co,
         O = "O", o = "O",
         A = "A", a = "A"
      )
   }

   co <- calc_option()

   if (co == "A" || co[1] == "O") {
      print(paste("calc_option = ", co))
   } else {
      print("calc_option is not acceptable")
   }


Thanks,



- Bruce
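
A likely cause is that "run selection" sends the selected lines to the console as if they had been typed, so the line following the readline() call is consumed as the user's answer; switch() then finds no match and returns NULL, which the if() test cannot handle. One possible arrangement, sketched below, keeps the prompt and the validation inside a single function and repeats the question until the answer is acceptable. The helper name normalize_option is made up for this sketch; calc_option keeps the name used in the post.

```r
# Map raw input to "O" or "A"; anything else becomes NA
normalize_option <- function(x) {
  x <- toupper(trimws(x))
  if (length(x) == 1 && x %in% c("O", "A")) x else NA_character_
}

# Keep asking until the answer is valid. Because the prompt and the check
# live in one function, no later line of the script can be read as input.
calc_option <- function() {
  repeat {
    co <- normalize_option(
      readline("Please select an option, 'O'ne or 'A'll: "))
    if (!is.na(co)) return(co)
    cat("calc_option is not acceptable\n")
  }
}

# readline() returns "" immediately in non-interactive sessions,
# so only prompt when a user is actually present
if (interactive()) {
  co <- calc_option()
  print(paste("calc_option =", co))
}
```

Running the whole block as one selection is then safe, since the only statement that waits for input is the final call, and everything that depends on the answer sits inside the same braces.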



