Re: [R] Discrete Laplace distribution
Dear Nicolette,

You can always use the brute-force solution, which works for every discrete distribution with a finite number of states. Let p0, p1, ..., pK be the probabilities of 0, 1, ..., K (such that they sum to 1), and let

  P <- c(p0, p1, ..., pK)
  P1 <- c(cumsum(P), 1)

Now draw x <- runif(1) (uniform on (0,1)); then the index k <- which(P1 >= x)[1] (corresponding to the value k - 1) has the desired distribution.

Regards, Moshe.

--- On Fri, 12/3/10, Raquel Nicolette nicole...@ua.pt wrote:

From: Raquel Nicolette nicole...@ua.pt
Subject: [R] Discrete Laplace distribution
To: r-help@r-project.org
Received: Friday, 12 March, 2010, 2:47 AM

Hello,

http://tolstoy.newcastle.edu.au/R/help/04/07/0312.html#0313qlink1

Could anybody tell me how to generate a discrete Laplacian distribution? I need to sample a discretised Laplacian density like this:

  J(g - g') ~ exp(-lambda * |g' - g|),  g in {0, ..., gmax}

Thanks,
Nicolette

_______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
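Moshe's inverse-CDF recipe above can be wrapped up for the discretised Laplace itself. The sketch below uses illustrative values for lambda, the centre g0 and gmax; none of these come from the original post.

```r
# Inverse-CDF sampler for a discretised Laplace on {0, ..., gmax}:
# J(g) proportional to exp(-lambda * |g - g0|), normalised to sum to 1.
rdlaplace <- function(n, lambda, g0, gmax) {
  g  <- 0:gmax
  p  <- exp(-lambda * abs(g - g0))
  p  <- p / sum(p)               # normalise the weights
  P1 <- cumsum(p)                # cumulative probabilities; last entry is 1
  # for each uniform draw, take the first state whose cumulative prob >= x
  sapply(runif(n), function(x) g[which(P1 >= x)[1]])
}

set.seed(1)
draws <- rdlaplace(1000, lambda = 0.5, g0 = 5, gmax = 10)
```

The empirical frequencies of draws should peak at g0 and fall off geometrically on both sides.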
Re: [R] Optimise huge data.frame construction
Hi Daniele,

One possibility would be to make two runs. In the first run you do not build the matrix but just calculate (in a loop) the number of rows you need. Then you allocate a matrix of that size (only once) and fill it in the second run.

Regards, Moshe.

--- On Wed, 24/2/10, Daniele Amberti daniele.ambe...@ors.it wrote:

From: Daniele Amberti daniele.ambe...@ors.it
Subject: [R] Optimise huge data.frame construction
To: r-help@r-project.org
Received: Wednesday, 24 February, 2010, 8:34 PM

I have data for different items (ID) in a database. For each ID I have to get:
- the timestamp of the observation (timestamp);
- a numerical value (val) that will be my response variable in some kind of model;
- a variable number of variables from a known set (if the value for a specific variable is not present in the DB it is 0).

To get the above-mentioned values I have to cycle over IDs, make some calculations and store the results, constructing a huge data.frame for subsequent estimations. The number of rows for each ID is random (typically 14 to 200). My current approach is to construct a matrix like this:

  out <- c('A', 'B', 'C', 'D')
  out <- matrix(-1, 5000, 3 + length(out),
                dimnames = list(1:5000, c('ID', 'timestamp', 'val', out)))

I access the out matrix by numerical index to substitute values (out[1:n, 1] <- k). When the matrix is full I add 5000 rows and go on. Afterwards I clean out the rows with ID still set to -1 and then replace all remaining -1 values with 0.

For my application an ID typically has between 14 and 200 observations (mean around 50), but I have 15000 IDs ... After profiling I realized that accessing the out matrix this way is too slow. Do you have any idea how to speed up this kind of process? I think something could be done by creating a data.frame for each ID and binding them all at the end. Is that a good idea? How can I implement it? A list of data.frames? And then? Below is some code that can be useful if someone would like to experiment ...
  alist <- vector('list', 3)
  alist[[1]] <- data.frame(ID = 1, timestamp = 1:14, val = rnorm(14), A = 1, B = 2, C = 3)
  alist[[2]] <- data.frame(ID = 2, timestamp = 2:15, val = rnorm(14), B = 2, C = 3, D = 4)
  alist[[3]] <- data.frame(ID = 3, timestamp = 3:30, val = rnorm(28), C = 1, D = 2)

Thanks in advance for your valuable help.

Daniele

ORS Srl
Via Agostino Morando 1/3
12060 Roddi (Cn) - Italy
Tel. +39 0173 620211
Fax. +39 0173 620299 / +39 0173 433111
Web Site www.ors.it
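The list-of-data.frames idea Daniele asks about can be sketched as below. Since the column sets differ between IDs, each ID's missing columns are filled with 0 (the convention described in the post) before a single rbind at the end; this is an illustration, not a drop-in replacement for the real extraction loop.

```r
# Build one small data.frame per ID (as in the example above), then fill
# the columns an ID lacks with 0 and bind everything with a single rbind.
alist <- vector("list", 3)
alist[[1]] <- data.frame(ID = 1, timestamp = 1:14, val = rnorm(14), A = 1, B = 2, C = 3)
alist[[2]] <- data.frame(ID = 2, timestamp = 2:15, val = rnorm(14), B = 2, C = 3, D = 4)
alist[[3]] <- data.frame(ID = 3, timestamp = 3:30, val = rnorm(28), C = 1, D = 2)

all_cols <- Reduce(union, lapply(alist, names))
filled <- lapply(alist, function(d) {
  d[setdiff(all_cols, names(d))] <- 0   # absent variables are 0 by convention
  d[all_cols]                           # common column order for rbind
})
out <- do.call(rbind, filled)           # one rbind instead of one per iteration
```

A single do.call(rbind, ...) avoids the quadratic cost of growing the result one ID at a time.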
Re: [R] reading surfer files
Check read.table (?read.table).

--- On Wed, 24/2/10, RagingJim nowa0...@flinders.edu.au wrote:

From: RagingJim nowa0...@flinders.edu.au
Subject: [R] reading surfer files
To: r-help@r-project.org
Received: Wednesday, 24 February, 2010, 3:23 PM

To the R experts,

I am currently playing with a program which was designed so that its outputs are read in Surfer. I do not have that program, but the files can be put into Excel and graphed, and I figured I could do the same thing with R. If I open a file with Excel and split the text into columns separated by spaces, I can then graph it; once I have done all that in Excel and saved it, I can open it in R and do a 3D surface plot. But I would like to skip the Excel step. Each line looks like this:

  0.2100209E+03 0.2109997E+03 0.2111866E+03 0.2095219E+03 0.2079958E+03 0.2088303E+03 0.2095171E+03 0.2104975E+03
  0.2124125E+03 0.2132827E+03 0.2125371E+03 0.2119912E+03 0.2119215E+03 0.2127108E+03 0.2147589E+03 0.2167924E+03
  0.2157783E+03 0.2144353E+03 0.2127579E+03 0.2085133E+03 0.2065096E+03 0.2037935E+03 0.1973606E+03 0.1945031E+03
  0.2076126E+03 0.2131146E+03 0.2014675E+03 0.1999074E+03 0.1994730E+03 0.2003694E+03 0.2003093E+03 0.2017771E+03
  0.2070309E+03 0.1945123E+03 0.1892468E+03 0.1857383E+03 0.1817865E+03 0.1791424E+03 0.1785881E+03 0.1791883E+03
  0.1803020E+03 0.1797575E+03 0.1807816E+03 0.1835612E+03 0.1866963E+03 0.1882194E+03 0.1919594E+03 0.1940257E+03
  0.1977790E+03 0.1970629E+03 0.1983556E+03 0.1990551E+03 0.1999159E+03 0.2016027E+03 0.2029541E+03 0.2049124E+03
  0.2077511E+03 0.2081407E+03 0.2074211E+03 0.2082496E+03 0.2088355E+03 0.2089197E+03 0.2106171E+03 0.2125163E+03

The first thing I have to do is remove the first 4 lines (which are just file information), and then separate each value into its own column. How do I go about doing that?

Cheers lads.
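read.table's skip argument handles the four information lines directly, so the Excel step can be dropped entirely. The file name and header lines below are invented to make the example self-contained.

```r
# Simulate a small Surfer-style file: 4 header lines, then rows of
# whitespace-separated numbers, and read it back while skipping the header.
f <- tempfile(fileext = ".grd")
writeLines(c("DSAA", "3 2", "0 1", "0 1",
             "0.2100209E+03 0.2109997E+03 0.2111866E+03",
             "0.2095219E+03 0.2079958E+03 0.2088303E+03"), f)

m <- as.matrix(read.table(f, skip = 4))   # drop the 4 information lines
dim(m)                                    # 2 rows, 3 columns of numeric values
```

The resulting numeric matrix can be passed straight to persp() or similar for the 3D surface plot.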
Re: [R] Goodness of fit test for count data
You can compute the conditional probability that your variable equals k given that it is non-zero. For example, if X has a Poisson distribution with parameter lambda then

  P(X = k | X != 0) = P(X = k) / (1 - P(X = 0)) = (exp(-lambda) * lambda^k / k!) / (1 - exp(-lambda))

Now you can find the lambda for which the sum of squares of your errors is minimal, and then apply a Chi-squared test using these expected frequencies. Similarly for the negative binomial distribution.

--- On Tue, 23/2/10, pinusan anh...@msu.edu wrote:

From: pinusan anh...@msu.edu
Subject: [R] Goodness of fit test for count data
To: r-help@r-project.org
Received: Tuesday, 23 February, 2010, 6:11 AM

Dear all,

I am trying to test goodness of fit. I assume that the data follow a Poisson or negative binomial distribution. I can test the goodness of fit for non-truncated data; however, I could not find any good function or package for truncated data. For example, here is a frequency table of the number of emergency-room visits in the past year for one hundred and one observations:

  N  freq
  1  30
  2  35
  3  26
  4   8
  5   0
  6   2
  7   0

I expect the frequency table to follow a Poisson or negative binomial distribution. However, the distribution is different from the usual Poisson or negative binomial because one value, zero, is excluded; I expect it to be a zero-truncated distribution. In SAS, I used the NLMIXED procedure to calculate the expected probabilities for y = 1 ... y = n under the assumption that the data follow a Poisson or negative binomial distribution, and then ran a Chi-square test. If you need the SAS code, I will send it by e-mail. I want to run this test in R. Could you suggest how I can perform it?

Have a nice day.
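Moshe's recipe (a least-squares fit of lambda on the zero-truncated Poisson frequencies, then a chi-squared statistic against the fitted counts) can be sketched on the posted table. The degrees of freedom and the small expected cells would need care in a real analysis; this only shows the mechanics.

```r
# Fit a zero-truncated Poisson to the posted frequency table by least
# squares, then form the chi-squared statistic against the fitted counts.
k    <- 1:7
freq <- c(30, 35, 26, 8, 0, 2, 0)
n    <- sum(freq)                      # 101 observations

# P(X = k | X != 0) for Poisson(lambda)
ptrunc <- function(lambda) dpois(k, lambda) / (1 - dpois(0, lambda))

sse <- function(lambda) sum((freq - n * ptrunc(lambda))^2)
lambda_hat <- optimize(sse, c(0.01, 10))$minimum

expected <- n * ptrunc(lambda_hat)     # fitted frequencies
chisq    <- sum((freq - expected)^2 / expected)
```

The same skeleton works for the zero-truncated negative binomial by swapping dpois for dnbinom and optimizing over two parameters.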
Re: [R] Quadprog help
Hi Sergio,

Having a singular Dmat is certainly a problem. I can see two possibilities:
1) try to eliminate X1,...,X9, so that you are left with P1,...,P6 only;
2) if you cannot do this, add eps*X1^2 + ... + eps*X9^2 to your objective so that Dmat becomes positive definite (eps being a small positive number). You can then try to make eps smaller and smaller (but still not too small, in order not to run into numerical problems) and check whether your solution converges.

--- On Fri, 19/2/10, Araujo-Enciso, Sergio Rene sergio-rene.araujo-enc...@agr.uni-goettingen.de wrote:

From: Araujo-Enciso, Sergio Rene sergio-rene.araujo-enc...@agr.uni-goettingen.de
Subject: [R] Quadprog help
To: r-help@r-project.org
Received: Friday, 19 February, 2010, 7:40 PM

I am having some problems using quadprog in R. I want to minimize the objective function

  200*P1 - 1/2*10*P1^2 + 100*P2 - 1/2*5*P2^2 + 160*P3 - 1/2*8*P3^2 +
   50*P4 - 1/2*10*P4^2 +  50*P5 - 1/2*20*P5^2 +  50*P6 - 1/2*10*P6^2

subject to a set of constraints involving not only the variables P1, P2, P3, P4, P5, P6 but also the variables X1, X2, X3, X4, X5, X6, X7, X8, X9. As the X variables do not affect the objective function, I assume that I have to enter them as zeros in the vector dvec and the matrix Dmat.
My program is:

  Dmat <- matrix(0, 15, 15)
  diag(Dmat) <- c(10, 5, 8, 10, 20, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0)
  dvec <- c(-200, -100, -160, -50, -50, -50, 0, 0, 0, 0, 0, 0, 0, 0, 0)
  Amat <- matrix(c( -1,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0, -1,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0,  0, -1,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                    -1,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0, -1,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0,  0, -1,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                    -1,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0, -1,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                     0,  0, -1,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                   -10,  0,  0,  0,  0,  0, -1,  0,  0, -1,  0,  0, -1,  0,  0,
                     0, -5,  0,  0,  0,  0,  0, -1,  0,  0, -1,  0,  0, -1,  0,
                     0,  0, -8,  0,  0,  0,  0,  0, -1,  0,  0, -1,  0,  0, -1,
                     0,  0,  0,-10,  0,  0,  1,  1,  1,  0,  0,  0,  0,  0,  0,
                     0,  0,  0,  0,-20,  0,  0,  0,  0,  1,  1,  1,  0,  0,  0,
                     0,  0,  0,  0,  0,-10,  0,  0,  0,  0,  0,  0,  1,  1,  1),
                  15, 15)
  bvec <- c(0, -2, -2, -2, 0, -1, -2, -1, 0, -200, -100, -160, -50, -50, -50)
  solve.QP(Dmat, dvec, Amat, bvec = bvec)

Nonetheless I get the message:

  Error in solve.QP(Dmat, dvec, Amat, bvec = bvec) :
    matrix D in quadratic function is not positive definite!

I think it has to do with the fact that in the Dmat matrix I end up with several columns of zeros. Does anyone have an idea how to solve such a problem?

Bests, Sergio René
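Moshe's option (2), the small ridge on the zero diagonal entries, can be checked without quadprog itself: after adding eps, the diagonal Dmat from the post is positive definite. eps = 1e-6 is an arbitrary small value, to be shrunk while watching the solution for convergence as he describes.

```r
# Option (2): make the singular Dmat positive definite by adding a small
# ridge eps to the diagonal entries that are zero (the X1,...,X9 block).
eps  <- 1e-6
Dmat <- matrix(0, 15, 15)
diag(Dmat) <- c(10, 5, 8, 10, 20, 10, rep(0, 9))
diag(Dmat)[diag(Dmat) == 0] <- eps    # ridge only where the diagonal is 0

min(eigen(Dmat)$values) > 0           # TRUE: Dmat is now acceptable to solve.QP
```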
Re: [R] Integral of function of dnorm
Yes, this can be easily computed analytically (even though my result is a bit different).

--- On Fri, 12/2/10, dav...@rhotrading.com wrote:

From: dav...@rhotrading.com
Subject: Re: [R] Integral of function of dnorm
To: Greg Snow greg.s...@imail.org, Trafim Vanishek rdapam...@gmail.com, Peter Dalgaard p.dalga...@biostat.ku.dk
Cc: r-help@r-project.org
Received: Friday, 12 February, 2010, 7:35 AM

And it can be done analytically: the value is -(1 + log(2*pi)) / 2.

-- David L. Reiner

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Greg Snow
Sent: Thursday, February 11, 2010 11:58 AM
To: Trafim Vanishek; Peter Dalgaard
Cc: r-help@r-project.org
Subject: Re: [R] Integral of function of dnorm

Try:

  tmpfun <- function(x) dnorm(x, mean = 8, sd = 1) * log(dnorm(x, mean = 8, sd = 1))
  integrate(tmpfun, -Inf, Inf)

Also you may want to look at the log argument to dnorm rather than taking the log of the function.

Hope this helps,

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Trafim Vanishek
Sent: Thursday, February 11, 2010 9:49 AM
To: Peter Dalgaard
Cc: r-help@r-project.org
Subject: Re: [R] Integral of function of dnorm

This is exactly what I mean. I need to find

  integrate(dnorm(mean = 8, sd = 1) * log(dnorm(mean = 8, sd = 1)), -Inf, Inf)

which doesn't work like that, because it says:

  Error in dnorm(mean = 8, sd = 1) : element 1 is empty;
  the part of the args list of '.Internal' being evaluated was: (x, mean, sd, log)

So how can I define x? Thanks a lot.

Dear all,

How is it possible in R to calculate the following integral:

  Integral(-Inf, Inf)[log(dnorm(mean = 3, sd = 1))]

How can I define that the density dnorm is taken on (-Inf, Inf)? Thanks a lot!
Er, if you mean the integral with respect to the x argument in dnorm, then the answer is -Inf, because log(dnorm(x, ...)) goes quadratically to -Inf in both directions. If you meant otherwise, please tell us what you meant...
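The answers in this thread can be reconciled numerically: Greg's integrate call (using dnorm's log argument, as he suggests) reproduces David's analytic value -(1 + log(2*pi))/2 for sd = 1, whatever the mean; Peter's -Inf refers to the integral of log(dnorm) alone, without the density weight.

```r
# Numerical check: the integral of dnorm * log(dnorm) over the real line
# equals -(1 + log(2*pi))/2 (about -1.4189) for sd = 1, any mean.
tmpfun <- function(x) dnorm(x, mean = 8, sd = 1) * dnorm(x, mean = 8, sd = 1, log = TRUE)

num <- integrate(tmpfun, -Inf, Inf)$value
ana <- -(1 + log(2 * pi)) / 2
```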
Re: [R] difftimes; histogram; memory problems
Hi Jonathan,

If minDate = min(Condition1) - max(Condition2) and maxDate = max(Condition1) - min(Condition2), then all your differences lie between minDate and maxDate, and hopefully this is not a very big range (unless you are going many thousands of years into the past or the future). So, basically, for every number of days in this range you should count the number of times it appears. To speed up the calculations you can do this with just one loop (plus one vectorized operation); I could not manage it without a single loop if we want to limit the memory use. Let me know if you need the actual code.

Regards, Moshe.

--- On Tue, 16/2/10, Jonathan jonsle...@gmail.com wrote:

From: Jonathan jonsle...@gmail.com
Subject: Re: [R] difftimes; histogram; memory problems
To: r-help r-help@r-project.org
Received: Tuesday, 16 February, 2010, 1:17 PM

Let me fix a couple of typos in that email:

Hi All:

Let's say I have two data frames (Condition1 and Condition2), on the order of 12,000 and 16,000 rows respectively, each with 1 column. The entries contain dates. I'd like to calculate, for each possible pair of dates (that is, Condition1[1:12000] and Condition2[1:16000]), the number of days difference between the dates in the pair. The result should be a 12,000 by 16,000 matrix, which I'll call M. The purpose of building such a matrix M is to create a histogram of all the values contained within it.

Ex):

  Condition1 <- data.frame('dates' = rep(c('2001-02-10', '1998-03-14'), 6000))
  Condition2 <- data.frame('dates' = rep(c('2003-07-06', '2007-03-11'), 8000))

First, my instinct is to try and vectorize the operation. I tried this by expanding each vector into a matrix of repeated vectors (I'd then just subtract the two resultant matrices to get matrix M).
I got the following error:

  expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)),
                               byrow = TRUE, ncol = nrow(Condition1))
  Error: cannot allocate vector of size 732.4 Mb
  expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)),
                               byrow = FALSE, nrow = nrow(Condition2))
  Error: cannot allocate vector of size 732.4 Mb

Since it seems these matrices are too large, I'm wondering whether there's a better way to call a hist command without actually building the said matrix. I'd greatly appreciate any ideas!

Best, Jonathan

On Mon, Feb 15, 2010 at 8:19 PM, Jonathan jonsle...@gmail.com wrote:

Hi All:

Let's say I have two data frames (Condition1 and Condition2); each being on the order of 12,000 and 16,000 rows; 1 column. The entries contain dates. I'd like to calculate, for each possible pair of dates (that is: Condition1[1:10,000] and Condition2[1:10,000]), the number of days difference between the dates in the pair. The result should be a matrix 12,000 by 16,000. Really, what I need is a histogram of all the values in this matrix.

Ex):

  Condition1 <- data.frame('dates' = rep(c('2001-02-10', '1998-03-14'), 6000))
  Condition2 <- data.frame('dates' = rep(c('2003-07-06', '2007-03-11'), 8000))

First, my instinct is to try and vectorize the operation. I tried this by expanding each vector into a matrix of repeated vectors (I'd then just subtract the two). I got the following error:

  expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)),
                               byrow = TRUE, ncol = nrow(Condition1))
  Error: cannot allocate vector of size 732.4 Mb
  expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)),
                               byrow = FALSE, nrow = nrow(Condition2))
  Error: cannot allocate vector of size 732.4 Mb

Since it seems these matrices are too large, I'm wondering whether there's a better way to call a hist command without actually building the said matrix. I'd greatly appreciate any ideas!
Best, Jonathan
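Moshe's counting idea can be sketched directly: one loop over the first vector with a vectorized tabulate() inside, accumulating counts over the range of possible day differences instead of materialising the 12,000 x 16,000 matrix. The stand-in vectors below are much smaller than the real data.

```r
# Histogram of all pairwise day differences without the big matrix:
# one loop over d1, tabulate()-ing its differences against all of d2.
d1 <- as.Date(rep(c("2001-02-10", "1998-03-14"), 10))   # stand-in for Condition1
d2 <- as.Date(rep(c("2003-07-06", "2007-03-11"), 15))   # stand-in for Condition2

rng    <- range(as.integer(d2)) - rev(range(as.integer(d1)))  # min and max possible diff
offset <- 1 - rng[1]                                          # maps the minimum diff to bin 1
counts <- integer(rng[2] - rng[1] + 1)

for (i in seq_along(d1)) {
  diffs  <- as.integer(d2) - as.integer(d1[i])          # vectorized inner step
  counts <- counts + tabulate(diffs + offset, nbins = length(counts))
}
# plot(seq(rng[1], rng[2]), counts, type = "h") then draws the histogram
```

Memory use is one counts vector spanning the difference range, regardless of how long the two date vectors are.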
Re: [R] Question about rank() function
Hi,

I believe that the reason is that even though the first 4 elements of your fmodel look equal (when rounded to 4 decimal places) they are actually not. To check this, try

  fmodel[1:4] - fmodel[1]

--- On Thu, 11/2/10, Something Something mailinglist...@gmail.com wrote:

From: Something Something mailinglist...@gmail.com
Subject: [R] Question about rank() function
To: r-help@r-project.org
Received: Thursday, 11 February, 2010, 6:23 PM

Hello,

I am trying to get the 'rank' function to work for me, but am not sure what I am doing wrong. Please help. I ran the following commands:

  data = read.table("test1.csv", head = T, as.is = T, na.string = ".", row.nam = NULL)
  X1 = as.factor(data[[3]])
  X2 = as.factor(data[[4]])
  X3 = as.factor(data[[5]])
  Y = data[[2]]
  model = lm(Y ~ X1*X2*X3, na.action = na.exclude)
  fmodel = fitted(model)
  fmodel

(The first line of output is shown below.)

         1        2        3        4        5        6        7        8        9
  180.3763 180.3763 180.3763 180.3763 180.4546 180.3763 177.9245 177.9245 181.3859
        10       11       12       13       14       15       16       17
  180.3763       NA 180.4546 180.3763 180.4546 180.3763 180.3763 180.4546

Then I run:

  fmodel.rank = rank(fmodel)
  fmodel.rank

(The first line of output is shown below.)

      1     2     3     4     5     6     7     8     9    10    11    12    13
  375.0 222.0  68.5  68.5 402.5 222.0  33.5  33.5 465.5 222.0 500.0 402.5 222.0
     14    15    16    17    18    19    20    21    22    23    24    25    26
  402.5 222.0 222.0 378.5 222.0 222.0 222.0 222.0 222.0 402.5 222.0  33.5 222.0

As you can see, the first 4 values of 'fmodel' are 180.3763, so after running rank(fmodel) I expected the ranks of the first 4 to be the same, but they are not. What am I doing wrong? Please let me know. Thanks.
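The diagnosis can be reproduced without the original test1.csv: values that print identically at 4 decimal places need not be equal, and rounding before ranking restores the ties.

```r
# Values that display the same at 4 decimal places but differ slightly:
x <- c(180.3763, 180.3763 + 1e-10, 180.3763 + 2e-10)

round(x, 4)        # prints 180.3763 three times
rank(x)            # 1 2 3: rank() still sees the tiny differences
rank(round(x, 4))  # 2 2 2: rounding first gives the expected tied ranks
```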
Re: [R] Resampling a grid to coarsen its resolution
One possibility I can see is to replace the "-" values by NA and use mean with na.rm=TRUE.

--- On Wed, 10/2/10, Steve Murray smurray...@hotmail.com wrote:

From: Steve Murray smurray...@hotmail.com
Subject: [R] Resampling a grid to coarsen its resolution
To: r-help@r-project.org
Received: Wednesday, 10 February, 2010, 3:20 AM

Dear all,

I have a grid (data frame) dataset at 0.5 x 0.5 degree spatial resolution (720 columns x 360 rows; regular spacing) and wish to coarsen this to a resolution of 2.5 x 2.5 degrees. A simple calculation which takes the mean of a block of points to form the regridded values would do the trick. Values of "-" should be excluded from the calculation (unless all points within a block are "-", in which case "-" should be returned as the 'new' cell value).

How would I go about achieving this in R? Any help or guidelines would be very much appreciated.

Many thanks,

Steve
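A block-mean sketch of the regridding, assuming the missing-value code is -9999 (the post's actual marker should be substituted): a 720 x 360 grid at 0.5 degrees becomes 144 x 72 at 2.5 degrees by averaging 5 x 5 blocks, with the NA handling Moshe suggests.

```r
# Coarsen a 0.5-degree grid to 2.5 degrees by averaging 5 x 5 blocks,
# treating the missing-value code as NA (an all-missing block yields NaN).
set.seed(42)
grid <- matrix(rnorm(720 * 360), nrow = 360, ncol = 720)
grid[sample(length(grid), 1000)] <- -9999     # pretend some cells are missing

grid[grid == -9999] <- NA                     # missing code -> NA
block  <- 5
coarse <- matrix(NA_real_, nrow(grid) / block, ncol(grid) / block)
for (i in seq_len(nrow(coarse))) {
  for (j in seq_len(ncol(coarse))) {
    rows <- ((i - 1) * block + 1):(i * block)
    cols <- ((j - 1) * block + 1):(j * block)
    coarse[i, j] <- mean(grid[rows, cols], na.rm = TRUE)
  }
}
```

The NaN cells, if any, can be mapped back to the original missing code at the end.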
Re: [R] Polynomial equation
Hi Chris,

You can use lm with poly (see ?lm, ?poly). If x and y are your arrays of points and you wish to fit a polynomial of degree 4, say, enter

  model <- lm(y ~ poly(x, 4, raw = TRUE))

and then summary(model). The raw=TRUE causes poly to use 1, x, x^2, x^3, ... instead of orthogonal polynomials (which are better numerically but may not be what you need).

Regards, Moshe.

--- On Fri, 8/1/10, chrisli1223 chri...@austwaterenv.com.au wrote:

From: chrisli1223 chri...@austwaterenv.com.au
Subject: [R] Polynomial equation
To: r-help@r-project.org
Received: Friday, 8 January, 2010, 12:32 PM

Hi all,

I have got a dataset. In Excel, I can fit a polynomial trend line beautifully. However, the equation that Excel calculates appears to be incorrect, so I am thinking about using R. My questions are:
(1) How do I fit a polynomial trendline to a graph?
(2) How do I calculate and display the equation for that trendline?

Many thanks for your help. Greatly appreciated.

Chris
Re: [R] Polynomial equation
Hi Chris,

Curve fitting has nothing to do with statistics (even though it is used in statistics). To get an idea of it, try the following:

  x <- ((-100):100)/100
  cofs <- rnorm(4)                        # create coefficients
  y <- cofs[1] + cofs[2]*x + cofs[3]*x^2 + cofs[4]*x^3
  y1 <- y + rnorm(201, 0, 0.1)            # add noise
  mm <- lm(y1 ~ poly(x, 3, raw = TRUE))   # fit a polynomial of degree 3
  y2 <- predict(mm, as.data.frame(x))     # compute the polynomial for every point of x
  plot(x, y, type = "l"); lines(x, y1, col = "red"); lines(x, y2, col = "blue")
  cofs
  mm$coefficients

For the exponential fit there are two options. If you are trying to fit y = exp(a*x + b), one possibility is to fit log(y) = a*x + b by

  mm <- lm(log(y) ~ x)

and the other (more correct) one is to use any of the nonlinear least-squares packages. I believe you had better read a little bit about curve fitting before doing all this.

Regards, Moshe.

--- On Fri, 8/1/10, chrisli1223 chri...@austwaterenv.com.au wrote:

From: chrisli1223 chri...@austwaterenv.com.au
Subject: Re: [R] Polynomial equation
To: r-help@r-project.org
Received: Friday, 8 January, 2010, 2:14 PM

Thank you very much for your help. Greatly appreciated! However, due to my limited stats background, I am unable to work out the equation of the trendline from the summary table. Besides, how do I fit the trendline on the graph? I intend to put the first column of data on the x axis and the second column on the y axis. Are they the x and y in your example?

Many thanks,
Chris
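Putting the thread together for Chris's two questions (fit the trendline, draw it, read off the equation), on simulated stand-in data since the real dataset is not in the post:

```r
# Fit a degree-3 polynomial trendline, overlay it on the scatter plot,
# and recover the equation's coefficients from the fitted model.
set.seed(7)
x <- seq(-1, 1, length.out = 50)
y <- 2 - x + 3 * x^2 + rnorm(50, sd = 0.1)    # stand-in data

fit <- lm(y ~ poly(x, 3, raw = TRUE))
co  <- coef(fit)        # intercept, then the x, x^2, x^3 coefficients

plot(x, y)
lines(x, predict(fit), col = "blue")          # the fitted trendline
```

The coefficients in co are the equation of the trendline: y = co[1] + co[2]*x + co[3]*x^2 + co[4]*x^3.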
[R] Confidence intervals - a statistical question, nothing to do with R
Dear list,

I have r towns, T1,...,Tr, where town i has population Ni. From each town I randomly sampled Mi individuals and found that Ki of them have a certain property. So Pi = Ki/Mi is an unbiased estimate of the proportion of people in town i having that property, and the weighted average of the Pi is an unbiased estimate of the proportion of the entire population (all r towns) having this property. I can compute confidence intervals for the proportion of people having that property in each town (in my case Mi << Ni, and so the binomial distribution is a good approximation for Ki).

My question is: how can I compute a confidence interval for the proportion of people in the entire population (r towns) having that property? Either an analytical or a numerical (simulation?) method will be all right.

Thank you in advance,
Moshe.
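One numerical route, a parametric bootstrap for the population-weighted proportion, can be sketched as below; all the town sizes, sample sizes and counts are invented for illustration, since the post gives no data.

```r
# Parametric bootstrap CI for the weighted proportion sum(Ni*Pi)/sum(Ni).
set.seed(1)
N <- c(50000, 20000, 80000)      # town populations Ni (made up)
M <- c(500, 300, 600)            # sample sizes Mi    (made up)
K <- c(60, 45, 48)               # observed counts Ki (made up)

p_hat <- K / M
est   <- sum(N * p_hat) / sum(N)            # weighted point estimate

boot <- replicate(10000, {
  Kb <- rbinom(length(M), M, p_hat)         # redraw each town's count
  sum(N * (Kb / M)) / sum(N)
})
ci <- quantile(boot, c(0.025, 0.975))       # 95% percentile interval
```

A normal-approximation interval est +/- 1.96 * sqrt(sum((N/sum(N))^2 * p_hat * (1 - p_hat) / M)) is the analytical counterpart.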
Re: [R] Kolmogorov smirnov test
Hi Roslina,

I believe that you can ignore the warning. Alternatively, you may add a very small random noise to the tied values, i.e. something like

  idx <- which(duplicated(xobs))
  xobs[idx] <- xobs[idx] + 1.0e-6 * sd(xobs) * rnorm(length(idx))

Regards, Moshe.

--- On Tue, 13/10/09, Roslina Zakaria zrosl...@yahoo.com wrote:

From: Roslina Zakaria zrosl...@yahoo.com
Subject: [R] Kolmogorov smirnov test
To: r-help@r-project.org
Received: Tuesday, 13 October, 2009, 9:58 AM

Hi r-users,

I would like to use the Kolmogorov-Smirnov test, but in my observed data (xobs) there are ties and I get a warning message. My question is: can I do something about it?

  > ks.test(xobs, xsyn)

          Two-sample Kolmogorov-Smirnov test

  data:  xobs and xsyn
  D = 0.0502, p-value = 0.924
  alternative hypothesis: two-sided

  Warning message:
  In ks.test(xobs, xsyn) : cannot compute correct p-values with ties

Thank you for all your help.
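A self-contained version of the jitter idea, on simulated data (rounding creates the ties; xobs and xsyn here are stand-ins, not Roslina's data):

```r
# Break ties in xobs with noise that is tiny relative to its spread,
# then rerun ks.test without the ties warning.
set.seed(2)
xobs <- round(rgamma(100, shape = 2, rate = 0.5), 1)   # rounding creates ties
xsyn <- rgamma(100, shape = 2, rate = 0.5)

idx <- which(duplicated(xobs))
xobs[idx] <- xobs[idx] + 1e-6 * sd(xobs) * rnorm(length(idx))

res <- ks.test(xobs, xsyn)                             # ties are gone now
```

Because the noise is six orders of magnitude below the data's spread, the D statistic is essentially unchanged.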
Re: [R] keeping all rows with the same values, and not only unique ones
test[which(test[, "total"] %in% needed), ] --- On Fri, 25/9/09, Dimitri Liakhovitski ld7...@gmail.com wrote: From: Dimitri Liakhovitski ld7...@gmail.com Subject: [R] keeping all rows with the same values, and not only unique ones To: R-Help List r-h...@stat.math.ethz.ch Received: Friday, 25 September, 2009, 8:52 AM Dear R-ers, I have a data frame test: test <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(2,3,4,5,6,7,8,9), total=c(7,7,8,8,9,9,10,10)) test I have a vector needed: needed <- c(7,9) needed I need the result to look like this: 1 2 7 2 3 7 5 6 9 6 7 9 When I do the following: result <- test[test["total"] == needed, ] result I only get unique rows that have 7 or 9 in total: 1 2 7 6 7 9 How could I keep ALL rows that have 7 or 9 in total? Thanks a million! -- Dimitri Liakhovitski Ninah.com dimitri.liakhovit...@ninah.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
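The `%in%` answer above, run on the poster's own example data (`==` recycles `needed` against the column, which is why it drops rows; `%in%` tests membership instead):

```r
# Keep ALL rows whose total is in the set `needed`.
test <- data.frame(x = c(1,2,3,4,5,6,7,8),
                   y = c(2,3,4,5,6,7,8,9),
                   total = c(7,7,8,8,9,9,10,10))
needed <- c(7, 9)
result <- test[test$total %in% needed, ]
result   # rows with total 7 or 9: four rows, not two
```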
Re: [R] Basic population dynamics
Assuming that at the end all of them are dead, you can do the following: sum(deaths) - cumsum(deaths) Regards, Moshe. --- On Wed, 2/9/09, Frostygoat frostyg...@gmail.com wrote: From: Frostygoat frostyg...@gmail.com Subject: [R] Basic population dynamics To: r-help@r-project.org Received: Wednesday, 2 September, 2009, 4:48 AM Hello, For insect mortality data I'm trying to get an R script that will take the data from the raw form and convert it to Lx (% survival) for a number of treatments. The raw data has the number of days lived for each individual for the respective treatment. Thus, for example, when R selects the data for a single treatment I end up with the following vectors: day=seq(from=0,to=6) deaths=c(0,0,2,0,0,1,6) where deaths is the number of deaths on a given day. Now I need to create a new vector with the number alive for each day and this is where I'm stuck... I've tried various for and while loops but haven't had success. The vector should be: Alive=c(9,9,7,7,7,6,0) I realize it is a very basic problem that is easily accomplished in one's head or on a spreadsheet, but given the size of the data set I wish to have R do it for me. I would welcome any suggestions please. Best regards. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
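The one-liner above, checked against the poster's example (note it is a subtraction, not an assignment):

```r
# Number still alive at the end of each day: total deaths minus
# cumulative deaths so far.
deaths <- c(0, 0, 2, 0, 0, 1, 6)
alive <- sum(deaths) - cumsum(deaths)
alive  # 9 9 7 7 7 6 0 -- matches the desired Alive vector
```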
Re: [R] Help on efficiency/vectorization
You can do for (i in 1:ncol(x)) { names <- rownames(x)[which(x[,i]==1)]; eval(parse(text=paste("V", i, ".ind <- names", sep=""))) } --- On Thu, 27/8/09, Steven Kang stochastick...@gmail.com wrote: From: Steven Kang stochastick...@gmail.com Subject: [R] Help on efficiency/vectorization To: r-help@r-project.org Received: Thursday, 27 August, 2009, 4:13 PM Dear R users, I am trying to extract the rownames of a data set for which each column meets a certain criterion (condition: elements of the column equal to 1). I have the correct result; however, I am looking for a more efficient (vectorized) way to implement this, as it can get quite messy if there are hundreds of columns. An arbitrary data set and code are shown below for your reference: x <- as.data.frame(matrix(round(runif(50),0), nrow=5)) rownames(x) <- letters[1:dim(x)[1]] x V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 a 0 1 1 1 0 0 0 0 1 0 b 1 1 1 1 0 1 0 0 1 1 c 0 1 1 0 0 0 0 0 0 1 d 1 0 0 1 1 1 1 1 0 0 e 1 0 0 0 0 1 1 0 1 0 V1.ind <- rownames(x)[x[,"V1"]==1] V2.ind <- rownames(x)[x[,"V2"]==1] V3.ind <- rownames(x)[x[,"V3"]==1] V4.ind <- rownames(x)[x[,"V4"]==1] : : V10.ind <- rownames(x)[x[,"V10"]==1] V1.ind [1] b d e V2.ind [1] a b c V3.ind [1] a b c : : V10.ind [1] b c Your expertise in resolving this issue would be highly appreciated. Steve __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
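An alternative to the `eval(parse())` loop above, swapped in plainly: keep the results in a named list rather than creating `V1.ind`, `V2.ind`, ... in the global environment. This is the more idiomatic pattern when there are hundreds of columns.

```r
# One list entry per column, named V1..V10, each holding the rownames
# where that column equals 1.
set.seed(1)
x <- as.data.frame(matrix(round(runif(50), 0), nrow = 5))
rownames(x) <- letters[1:nrow(x)]
ind <- lapply(x, function(col) rownames(x)[col == 1])
ind$V1   # same content as V1.ind in the loop version
```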
Re: [R] Submit a R job to a server
Hi Deb, Based on your last note (and after briefly looking at Rserve) I believe that you should install R with all the packages you need on the server and then use it like you are using any workstation, i.e. log in to it and do whatever you need. Regards, Moshe. --- On Thu, 27/8/09, Debabrata Midya debabrata.mi...@commerce.nsw.gov.au wrote: From: Debabrata Midya debabrata.mi...@commerce.nsw.gov.au Subject: Re: [R] Submit a R job to a server To: Cedrick W. Johnson cedr...@cedrickjohnson.com, m_olshan...@yahoo.com Cc: r-help@r-project.org Received: Thursday, 27 August, 2009, 1:48 PM Cedrick / Moshe, Thank you very much for such a quick response. My objective is to do the faster calculations by submitting a R job from my desktop to this server. Oracle 8i Enterprise Edition is currently running on this server. My objective is not only limited to access various oracle tables from this server but also I like to utilise this server for faster calculations and I like to use R for this. So far, I did not install anything related to R into this server. I have R installed on my desktop (Windows XP). I use RODBC / ODBC products to access data from this server and I use R / S-PLUS (installed on my desktop) to do the analyses. Is there any way to submit R job from my desktop to this server? How can I use this server to do my job faster? Once again, thank you very much for the time you have given. I am looking forward for your reply. Regards, Deb Cedrick W. Johnson cedr...@cedrickjohnson.com 27/08/2009 12:41 pm Good Morning Deb- It's unclear (to me at least) what you are trying to do.. What is the server running? Is it running RServe for which you have a userid and pwd or is it just a plain server running some OS? 
*IF* this is the case (Rserve): on the windows machine you will need to: install.packages("Rserve") library(Rserve) ?Rserve --and optionally-- ?RSeval I *think* RSeval/Rserve *may* be what you're looking for, but I am going off just an assumption by what you meant regarding server. Rserve can be used as a client and server on both platforms (I personally have had more success running the server portion under linux, with linux and java clients in which the clients are a mixture of the package's java access *and* R client access to the Rserve instance). More info on Rserve at: http://www.rforge.net/Rserve HTH, cedrick Debabrata Midya wrote: Dear R users, Thanks in advance. I am Deb, Statistician at NSW Department of Commerce, Sydney. I am using R 2.9.1 on Windows XP. May I request you to provide me information on the following: 1. I have access to a server (I have userid and pwd) 2. What are the packages I need to submit a job from Windows XP to this server? Should I need to install any package on this server before submitting a job? Once again, thank you very much for the time you have given. I am looking forward to your reply. Regards, Deb ** This email message, including any attached files, is confidential and intended solely for the use of the individual or entity to whom it is addressed. The NSW Department of Commerce prohibits the right to publish, copy, distribute or disclose any information contained in this email, or its attachments, by any party other than the intended recipient. If you have received this email in error please notify the sender and delete it from your system. No employee or agent is authorised to conclude any binding agreement on behalf of the NSW Department of Commerce by email. The views or opinions presented in this email are solely those of the author and do not necessarily represent those of the Department, except where the sender expressly, and with authority, states them to be the views of NSW Department of Commerce.
The NSW Department of Commerce accepts no liability for any loss or damage arising from the use of this email and recommends that the recipient check this email and any attached files for the presence of viruses. ** __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] expanding 1:12 months to Jan:Dec
One possible (but not very elegant) solution is: aa <- paste(1:12, ":10:2009", sep="") dd <- as.Date(aa, format="%m:%d:%Y") mon <- format(dd, "%b") mon [1] Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec --- On Thu, 20/8/09, Liviu Andronic landronim...@gmail.com wrote: From: Liviu Andronic landronim...@gmail.com Subject: [R] expanding 1:12 months to Jan:Dec To: r-help@r-project.org Help r-help@r-project.org Received: Thursday, 20 August, 2009, 5:14 PM Dear R users I would like to do some spreadsheet-style expansion of dates. For example, I would need to obtain a vector of months. I approached it in an obviously wrong way: paste(01:12) [1] 1 2 3 4 5 6 7 8 9 10 11 12 as.Date(paste(01:12), "%m") [1] NA NA NA NA NA NA NA NA NA NA NA NA to subsequently format(.., "%b"). Other than writing the months manually, could anyone suggest an easier way to obtain such a list? Liviu -- Do you know how to read? http://www.alienetworks.com/srtest.cfm Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
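For completeness, base R also ships the month names as built-in constants, which avoids the date round-trip entirely:

```r
# Built-in month name constants in base R.
month.abb    # "Jan" "Feb" ... "Dec"
month.name   # "January" ... "December"
```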
Re: [R] Principle components analysis on a large dataset
Hi Misha, Since PCA is a linear procedure and you have only 6000 observations, you do not need 68000 variables. Using any 6000 of your variables so that the resulting 6000x6000 matrix is non-singular will do. You can choose these 6000 variables (columns) randomly, hoping that the resulting matrix is non-singular (and checking for this). Alternatively, you can try something like choosing one nice column, then choosing the second one which is the most orthogonal to the first one (kind of Gram-Schmidt), then choosing the third one which is the most orthogonal to the first two, etc. (I am not sure how much roundoff may be a problem - try doing this using higher precision if you can). Note that you do not need to load the entire 6000x68000 matrix into memory (you can load several thousand columns, process them and discard them). Anyway, you will end up with a 6000x6000 matrix, i.e. 36,000,000 entries, which can fit into memory, and you can perform the usual PCA on this matrix. Good luck! Moshe. P.S. I am curious to see what other people think. --- On Fri, 21/8/09, misha680 mk144...@bcm.edu wrote: From: misha680 mk144...@bcm.edu Subject: [R] Principle components analysis on a large dataset To: r-help@r-project.org Received: Friday, 21 August, 2009, 10:45 AM Dear Sirs: Please pardon me, I am very new to R. I have been using MATLAB. I was wondering if R would allow me to do principal components analysis on a very large dataset. Specifically, our dataset has 68800 variables and around 6000 observations. Matlab gives out of memory errors. I have also tried doing princomp in pieces, but this does not seem to quite work for our approach. Anything that might help is much appreciated. If anyone has had experience doing this in R, much appreciated. Thank you Misha -- View this message in context: http://www.nabble.com/Principle-components-analysis-on-a-large-dataset-tp25072510p25072510.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
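A related standard trick (not exactly the column-selection idea above, but in the same spirit of never holding the full 6000x68800 matrix in memory) is to accumulate the n x n cross-product X X' block by block; its eigenvectors then span the principal directions. This is a small runnable sketch with simulated blocks standing in for columns read from disk.

```r
# Hedged sketch: build G = X %*% t(X) (n x n) by summing tcrossprod()
# over column blocks, so the full n x p matrix is never in memory.
set.seed(1)
n <- 100                           # observations (6000 in the post)
block.sizes <- c(40, 40, 20)       # p = 100 columns total, read in chunks
G <- matrix(0, n, n)
for (b in block.sizes) {
  B <- matrix(rnorm(n * b), n, b)  # stand-in for reading a block from disk
  G <- G + tcrossprod(B)           # running sum of B %*% t(B)
}
e <- eigen(G, symmetric = TRUE)    # eigenvectors span the principal directions
```

(Center the columns first if you want classical PCA rather than an uncentered decomposition.)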
Re: [R] extra .
My guess is that 6. comes from 6.0 - something which comes from programming languages where 6 represents 6 as an integer while 6. (or 6.0) represents 6 as a floating point number. --- On Fri, 21/8/09, kfcnhl zhengchenj...@hotmail.com wrote: From: kfcnhl zhengchenj...@hotmail.com Subject: [R] extra . To: r-help@r-project.org Received: Friday, 21 August, 2009, 12:34 PM sigma0 <- sqrt((6. * var(maxima))/pi) What does the '.' do here? -- View this message in context: http://www.nabble.com/extra-.-tp25073255p25073255.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
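In R itself the trailing dot makes no difference: both literals are doubles (use the `L` suffix for an integer literal):

```r
# 6 and 6. are the same double in R; 6L is the integer literal.
identical(6., 6)   # TRUE
is.double(6)       # TRUE -- plain numeric literals are doubles
is.integer(6L)     # TRUE
```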
Re: [R] feature weighting in randomForest
Hi Tim, As far as I know you cannot weight predictors (and I believe that you really should not). You may weight classes (and, in a sense, cases), but this is an entirely different issue. --- On Wed, 5/8/09, Häring, Tim (LWF) tim.haer...@lwf.bayern.de wrote: From: Häring, Tim (LWF) tim.haer...@lwf.bayern.de Subject: [R] feature weighting in randomForest To: r-help@r-project.org Received: Wednesday, 5 August, 2009, 5:59 PM Hello! I'm using randomForest for classification problems. My dataset has 21,000 observations and 96 predictors. I know that some predictors of my dataset have more influence in classifying my data than others. Therefore I would like to know if there is a way to weight my predictors. I know that for constructing each tree in a forest the most influential predictor is used for partitioning the data. But maybe it would have an effect if I weighted my predictors. Thanks in advance TIM --- Tim Haering Bavarian State Institute of Forest Research Department of Forest Ecology Am Hochanger 11 D-85354 Freising http://www.lwf.bayern.de __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] System is computationally singular and scale of covariates
Hi, What do you mean by outer product? If you have two vectors, say x and y, of length n and you define the matrix A by A(i,j) = x(i)*y(j), then your matrix has rank one and it is VERY singular (in exact arithmetic). Is this what you mean by outer product? --- On Sun, 16/8/09, Stephan Lindner lindn...@umich.edu wrote: From: Stephan Lindner lindn...@umich.edu Subject: [R] System is computationally singular and scale of covariates To: r-h...@stat.math.ethz.ch Received: Sunday, 16 August, 2009, 9:15 AM Dear all, I'm running a self-written numerical optimization routine (hazard model) which includes computing the inverse of the outer product of the score. I have been getting the above error message (System is computationally singular), and after some tweaking, I realized that these variables have some high numbers and the problem could be circumvented by scaling them down (i.e. dividing them by 100 or taking logs). Since this is obviously not the best procedure, and since I have to estimate more complex models down the road, I would like to understand better the reason which causes this problem. It is not a multicollinearity issue (I get the error even when using one single variable), and I think my code is clean (better be paranoid though). My sense is that the outer product just becomes large, and these are hard to invert. Maybe there are restrictions concerning R in the size of the numbers? If that is the case, I think I would fare better scaling down the outer product rather than the variable itself, but I first wanted to ask the community to get an understanding of what could be the problem. Thanks a lot, Stephan Lindner -- --- Stephan Lindner University of Michigan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
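Moshe's rank-one point is easy to verify numerically: the outer product of two vectors is singular whenever it is 2 x 2 or larger.

```r
# A[i, j] = x[i] * y[j] has rank one, so det(A) is 0 for n >= 2.
x <- c(1, 2, 3)
y <- c(4, 5, 6)
A <- outer(x, y)
qr(A)$rank   # 1
det(A)       # 0 (up to rounding)
```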
Re: [R] problem about t test
You could do the following: y <- apply(dat, 1, function(a) t.test(a[1:10], a[11:30])$p.value) This will produce a vector of 2 p-values (one per row). --- On Fri, 14/8/09, Gina Liao yi...@hotmail.com wrote: From: Gina Liao yi...@hotmail.com Subject: [R] problem about t test To: r-h...@stat.math.ethz.ch Received: Friday, 14 August, 2009, 12:53 PM Hello, I have a data frame str(dat) 'data.frame': 2 obs. of 30 variables It contains two pieces of information - two types of cancers: stage A (A1 to A10) and stage B (B1 to B20) ## totally 30 patients - 2 sets of gene expression I'd like to find the lists for the top 20 differentially expressed genes using a t-test (by P-value). Here is my code; unfortunately it doesn't work... I need help, please. I just learned R two weeks ago, and hope you can give a hint! A <- dat[,1:10] B <- dat[,11:30] bb <- function(t.test) { P.value.to.return <- t.test(A,B)$p.value return(P.value.to.return) } aa <- t(apply(dat,1,bb)) Thanks!! Best Regards, vie __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
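The `apply()` answer above, run on simulated data (the matrix here is invented; the poster's real `dat` has gene-expression values):

```r
# One t-test per row: columns 1-10 are stage A, columns 11-30 stage B.
set.seed(1)
dat <- matrix(rnorm(2 * 30), nrow = 2,
              dimnames = list(c("gene1", "gene2"), NULL))
pvals <- apply(dat, 1, function(a) t.test(a[1:10], a[11:30])$p.value)
pvals  # named vector, one p-value per row
```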
Re: [R] Solutions of equation systems
Is your system of equations linear? --- On Fri, 14/8/09, Moreno Mancosu nom...@tiscali.it wrote: From: Moreno Mancosu nom...@tiscali.it Subject: [R] Solutions of equation systems To: r-help@r-project.org Received: Friday, 14 August, 2009, 2:29 AM Hello all! Maybe it's a newbie question (in fact I _am_ a newbie), but I hope you'll find the solution. I have an equation system which has k equations and n variables (k < n). I would like to obtain a description of the solutions (which can be the equation of lines, or a plane, or anything else) with the fewest degrees of freedom, obviously using R. In other words, I would like to obtain a result which is very similar to the results of Maxima or similar software. I tried to search on the internet and I found the systemfit package, but I think it is not related to my problem. Any idea? Thanks in advance, Moreno __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting the number of non-NA values per day
Try tempFun <- function(x) sum(!is.na(x)) nonZeros <- aggregate(pollution["pol"], format(pollution["date"], "%Y-%j"), FUN = tempFun) --- On Wed, 12/8/09, Tim Chatterton tim.chatter...@uwe.ac.uk wrote: From: Tim Chatterton tim.chatter...@uwe.ac.uk Subject: [R] Counting the number of non-NA values per day To: r-help@r-project.org Received: Wednesday, 12 August, 2009, 3:26 AM I have a long dataframe (pollution) that contains a column of hourly date information (date) and a column of pollution measurements (pol). I have been happily calculating daily means and daily maximums using the aggregate function: DMEANpollution = aggregate(pollution["pol"], format(pollution["date"], "%Y-%j"), mean, na.rm = TRUE) DMAXpollution = aggregate(pollution["pol"], format(pollution["date"], "%Y-%j"), max, na.rm = TRUE) However, I also need to count the number of valid measurements for each day to check that the mean and max are likely to be valid (for example I need at least 18 hourly measurements to calculate a valid daily mean). Try as I might I have not found a simple way of doing this. Can anybody help please? Many thanks, Tim. -- __ Dr Tim Chatterton Senior Research Fellow Air Quality Management Resource Centre Faculty of Environment and Technology University of the West of England Frenchay Campus Bristol BS16 1QY Tel: 0117 328 2929 Fax: 0117 328 3360 Email: tim.chatter...@uwe.ac.uk __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
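The counting idea above on a toy data frame (three days of hourly data with some NAs sprinkled in; the data are invented for illustration):

```r
# Count valid (non-NA) hourly measurements per day with aggregate().
set.seed(1)
pollution <- data.frame(
  date = seq(as.POSIXct("2009-01-01 00:00"), by = "hour", length.out = 72),
  pol  = ifelse(runif(72) < 0.2, NA, rnorm(72)))
nValid <- aggregate(pollution["pol"],
                    list(day = format(pollution$date, "%Y-%j")),
                    FUN = function(x) sum(!is.na(x)))
nValid  # one row per day; keep days where pol >= 18
```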
Re: [R] Re : How to Import Excel file into R 2.9.0 version
Alternatively, download the xlsReadWrite package from http://treetron.googlepages.com/ install it and proceed as in older versions of R. --- On Tue, 11/8/09, Inchallah Yarab inchallahya...@yahoo.fr wrote: From: Inchallah Yarab inchallahya...@yahoo.fr Subject: [R] Re : How to Import Excel file into R 2.9.0 version To: r-help@r-project.org Received: Tuesday, 11 August, 2009, 10:32 PM I advise you to save the file as csv and then use read.csv2("c:/file.csv", sep=",") hope that helps!!! inchallahyarab 2009/8/11 rajclinasia r...@clinasia.com Hi Everyone, I have a problem with reading an Excel file into R version 2.9.0. In older versions it is working with the xlsReadWrite package, but in the 2.9.0 version there is no such package, so help me out in this aspect. Thanks in Advance. -- View this message in context: http://www.nabble.com/How-to-Import-Excel-file-into-R-2.9.0-version-tp24914638p24914638.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Matrix Integral
Hi, Is your matrix K symmetric? If yes, there is an analytical solution. --- On Sat, 1/8/09, nhawrylyshyn nichlas.hawrylys...@gmail.com wrote: From: nhawrylyshyn nichlas.hawrylys...@gmail.com Subject: [R] Matrix Integral To: r-help@r-project.org Received: Saturday, 1 August, 2009, 12:15 AM Hi, Any help on this would be appreciated: I need to integrate, where K is a 4x4 matrix and SIGMA is a 4x4 matrix, from say a to b, i.e. 0 to 5: integral MatrixExp(-K * s) %*% SIGMA %*% t(SIGMA) %*% MatrixExp(t(-K) * s) ds (t is transpose, %*% is matrix multiplication, MatrixExp is the matrix exponential). I've used integrate before on univariate functions like f(x) = x^2, which is fine, but when doing this on a matrix I run into problems. All I intuitively need to do is do this element by element. Thanks, NH. -- View this message in context: http://www.nabble.com/Matrix-Integral-tp24757170p24757170.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
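The "element by element" route the poster mentions can be done with `stats::integrate()` applied to each entry of a matrix-valued function. This is a hedged sketch with a toy 2x2 integrand, not the poster's MatrixExp kernel; the same pattern applies once `f(s)` is defined with the real matrices.

```r
# Integrate a matrix-valued function element-by-element over [0, 5].
# f() is a toy integrand standing in for the poster's expression.
f <- function(s) matrix(c(exp(-s), 0, s, 1), 2, 2)
elt <- function(i, j)
  integrate(function(s) sapply(s, function(u) f(u)[i, j]), 0, 5)$value
M <- outer(1:2, 1:2, Vectorize(elt))
M  # e.g. M[2, 2] = 5, the integral of the constant 1 over [0, 5]
```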
Re: [R] Sampling of non-overlapping intervals of variable length
Another possibility, if the total length of your intervals is small in comparison to the big interval, is to choose the starting points of all your intervals randomly and to dismiss the entire set if some of the intervals overlap. Most probably you will not have too many such cases (assuming, as stated above, that the total length of all the intervals is a small proportion of the length of the big interval). --- On Mon, 20/7/09, Hadassa Brunschwig hadassa.brunsch...@mail.huji.ac.il wrote: From: Hadassa Brunschwig hadassa.brunsch...@mail.huji.ac.il Subject: Re: [R] Sampling of non-overlapping intervals of variable length To: Charles C. Berry cbe...@tajo.ucsd.edu Cc: r-help@r-project.org Received: Monday, 20 July, 2009, 3:08 PM Thanks Chuck. Oops, did not think of the problem in that way. That did exactly what I needed. I have another complication to this problem: I do not only have one vector of 1:1e^6 but several vectors of different lengths, say 5. Initially, my intervals are distributed over those 5 vectors and the ranges of those 5 vectors in a specific way (and you might have guessed by now that I would like to do something like a permutation test). Because I have this additional level, I guess I could do something like: 1) Sample the 5 vectors with probabilities proportional to the frequencies of the initial intervals on these vectors. 2) For each sampled vector: apply Chuck's solution. ? Thanks a lot. Hadassa On Sun, Jul 19, 2009 at 11:13 PM, Charles C. Berry cbe...@tajo.ucsd.edu wrote: On Sun, 19 Jul 2009, Hadassa Brunschwig wrote: Hi I am not sure what you mean by sampling an index of a group of intervals. I will try to give an example: Let's assume I have a vector 1:100. Let's say I have 10 intervals of different but known lengths, say, c(4,6,11,2,8,14,7,2,18,32). For simulation purposes I have to sample those 10 intervals 1000 times. The requirement is, however, that they should be of those lengths and should not be overlapping.
In short, I would like to obtain a 10x1000 matrix with sampled intervals. Something like this: lens <- c(4,6,11,2,8,14,7,2,18,32) perm.lens <- sample(lens) sort(sample(1e06-sum(lens)+length(lens),length(lens)))+cumsum(c(0,head(perm.lens,-1))) [1] 15424 261927 430276 445976 451069 546578 656123 890494 939714 969643 The vector above gives the starting points for the intervals whose lengths are perm.lens. I'd bet every introductory combinatorics book has a problem or example in which the expression for the number of ways in which K ordered objects can be assigned to I groups consisting of n_i adjacent objects each is constructed. The construction is along the lines of the calculation above. HTH, Chuck Thanks Hadassa On Sun, Jul 19, 2009 at 9:48 PM, David Winsemius dwinsem...@comcast.net wrote: On Jul 19, 2009, at 1:05 PM, Hadassa Brunschwig wrote: Hi, I hope I am not repeating a question which has been posed already. I am trying to do the following in the most efficient way: I would like to sample from a finite (large) set of integers n non-overlapping intervals, where each interval i has a different, set length L_i (which is the number of integers in the interval). I had the idea to sample recursively on a vector with the already chosen intervals discarded but that seems to be too complicated. It might be ridiculously easy if you sampled on an index of a group of intervals. Why not pose the question in the form of example data.frames or other classes of R objects? Specification of the desired output would be essential. I think further specification of the sampling strategy would also help because I am unable to understand what sort of probability model you are hoping to apply. Any suggestions on that? Thanks a lot.
Hadassa -- Hadassa Brunschwig PhD Student Department of Statistics David Winsemius, MD Heritage Laboratories West Hartford, CT -- Hadassa Brunschwig PhD Student Department of Statistics The Hebrew University of Jerusalem http://www.stat.huji.ac.il __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 -- Hadassa Brunschwig PhD Student Department of Statistics The Hebrew University of Jerusalem http://www.stat.huji.ac.il __
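Moshe's rejection idea from earlier in the thread can be sketched as follows: draw all starting points at once and redraw whenever any two intervals would overlap. This is a minimal illustration using the interval lengths from the thread, over 1:1e6.

```r
# Rejection sampling: redraw start points until no intervals overlap.
set.seed(1)
lens <- c(4, 6, 11, 2, 8, 14, 7, 2, 18, 32)
total <- 1e6
repeat {
  start <- sort(sample(total - max(lens), length(lens)))
  end <- start + lens - 1
  if (all(head(end, -1) < tail(start, -1))) break  # no overlaps
}
cbind(start, end)  # one non-overlapping placement of the 10 intervals
```
Because the intervals cover only 104 of 1e6 positions, rejections are rare and the loop almost always succeeds on the first draw.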
Re: [R] searching for elements
?outer --- On Thu, 16/7/09, Chyden Finance fina...@chyden.net wrote: From: Chyden Finance fina...@chyden.net Subject: [R] searching for elements To: r-help@r-project.org Received: Thursday, 16 July, 2009, 3:00 AM Hello! For the past three years, I have been using R extensively in my PhD program in Finance for statistical work. Normally, I can figure this kind of thing out on my own, but I am completely stumped as to why the following code does not work. I have two variables: sigs and p0_recent. dim(sigs) = 296 3 dim(p0_recent) = 504 7 I want to compare each element in the first column of sigs with the elements in the first column of p0_recent. In other words, does sigs[i,1] == p0_recent[j,1], where i = 1:dim(sigs)[1] and j = 1:dim(p0_recent)[1]. I've been trying: for (j in 1:dim(p0_recent)[1]) { + for (i in 1:dim(sigs)[1]) { + if (sigs[i,1] == p0_recent[j,1]) { + print(sigs[i,1]) + }}} But, I get: Error in Ops.factor(sigs[i, 1], p0_recent[j, 1]) : level sets of factors are different Is there a better way than for loops to compare each element in one column to each element in another column of different length? If not, can anyone suggest a solution? CWE __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
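The `?outer` pointer above, spelled out on toy data (the ticker strings are invented; convert factor columns with `as.character()` first to avoid the "level sets of factors are different" error):

```r
# Compare every element of a against every element of b in one call.
a <- c("AAPL", "MSFT", "IBM")
b <- c("IBM", "GOOG", "AAPL", "XOM")
hits <- outer(a, b, "==")      # 3 x 4 logical matrix
which(hits, arr.ind = TRUE)    # row = index into a, col = index into b
a[rowSums(hits) > 0]           # "AAPL" "IBM" -- elements of a found in b
```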
Re: [R] Nested for loops
Make it for (i in 1:9) This is not the general solution, but in your case when i=10 you do not want to do anything. --- On Tue, 14/7/09, Michael Knudsen micknud...@gmail.com wrote: From: Michael Knudsen micknud...@gmail.com Subject: [R] Nested for loops To: r-help@r-project.org Received: Tuesday, 14 July, 2009, 3:38 PM Hi, I have spent some time locating a quite subtle (at least in my opinion) bug in my code. I want two nested for loops traversing the above-diagonal part of a square matrix. In pseudo code it would be something like

for i = 1 to 10 {
  for j = i+1 to 10 {
    // do something
  }
}

However, trying to do the same in R, my first try was

for (i in 1:10) {
  for (j in (i+1):10) {
    # do something
  }
}

but there's a problem here. For i=10, the last for loop is over 11:10. Usually programming languages would regard what corresponds to 11:10 as empty, but A:B with A bigger than B is in R interpreted as the numbers from B to A in reverse order. Is there a clever way to make nested loops like the one above in R? -- Michael Knudsen micknud...@gmail.com http://lifeofknudsen.blogspot.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
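Since 1:9 is admittedly not the general solution: a sketch of a general fix is to build the inner indices with seq_len(), which returns an empty vector (rather than a reversed sequence) when its argument is 0:

```r
n <- 10
visited <- 0
for (i in seq_len(n)) {
  # seq_len(n - i) + i is i+1, ..., n, and is empty when i == n,
  # so the inner body is simply skipped instead of looping over n:(n+1)
  for (j in seq_len(n - i) + i) {
    visited <- visited + 1   # stand-in for "do something" with cell (i, j)
  }
}
visited  # one visit per above-diagonal cell: choose(n, 2) = 45
```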
Re: [R] ifultools on ppc debian
Hi Stephen, The error message clearly says what is wrong. Big Endian and Little Endian are two ways of storing data (most often double precision numbers) in memory. A double precision number occupies two blocks of 4 bytes each. On Big Endian machines (most machines which are not Intel) if the memory address of a double precision number is X then its sign, exponent and most significant part of the mantissa start at address X and the less significant part of the mantissa starts at X+4, while on Little Endian machines (x86 and alike) the less significant part of the mantissa starts at X and the sign, exponent and most significant part of the mantissa start at X+4. In order to get better performance some programs directly manipulate the bit pattern of the number, and to do this correctly they must know whether they are run on a Big Endian or Little Endian machine. So your program is apparently looking at /usr/include/bits/endian.h to check this. You should edit this file and comment out (or #undef) the Endian which is NOT what you got. Good luck! Moshe. --- On Wed, 15/7/09, stephen sefick ssef...@gmail.com wrote: From: stephen sefick ssef...@gmail.com Subject: [R] ifultools on ppc debian To: r-help@r-project.org Received: Wednesday, 15 July, 2009, 12:04 PM I have tried to compile this from source. I don't know what Endianness is, but it is probably not debian power pc. Am I out of luck with this package? Stephen Sefick * Installing *source* package ‘ifultools’ ... 
** libs gcc -std=gnu99 -I/usr/local/lib/R/include -I../inst/include/ -DMUTIL_STATIC -DDEF_TF -DINTERRUPT_ENABLE -DNDEBUG -DUSE_RINTERNALS -I/usr/local/include -fpic -g -O2 -c RS_fra_dim.c -o RS_fra_dim.o In file included from /usr/include/endian.h:37, from /usr/include/sys/types.h:217, from /usr/include/stdlib.h:320, from /usr/local/lib/R/include/R.h:28, from ../inst/include/ut_type.h:18, from ../inst/include/fra_dim.h:9, from RS_fra_dim.c:20: /usr/include/bits/endian.h:27:4: error: #error Both BIG_ENDIAN and LITTLE_ENDIAN defined! make: *** [RS_fra_dim.o] Error 1 ERROR: compilation failed for package ‘ifultools’ * Removing ‘/home/ssefick/ifultools.Rcheck/ifultools’ -- Stephen Sefick Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping data in dataframe
Try ?aggregate --- On Wed, 15/7/09, Timo Schneider timo.schnei...@s2004.tu-chemnitz.de wrote: From: Timo Schneider timo.schnei...@s2004.tu-chemnitz.de Subject: [R] Grouping data in dataframe To: r-help@r-project.org r-help@r-project.org Received: Wednesday, 15 July, 2009, 1:56 PM Hello, I have a dataframe (obtained from read.table()) which looks like

  ExpA ExpB ExpC Size
1   12   23   33    1
2   12   24   29    1
3   10   22   34    1
4   25   50   60    2
5   24   53   62    2
6   21   49   61    2

now I want to take all rows that have the same value in the Size column and apply a function to the columns of these rows (for example median()). The result should be a new dataframe with the medians of the groups, like this:

  ExpA ExpB ExpC Size
1   12   23   34    1
2   24   50   61    2

I tried to play with the functions by() and tapply() but I didn't get the results I wanted so far, so any help on this would be great! The reason why I am having this problem: (I explain this just to make sure I don't do something against the nature of R.) I am doing 3 similar experiments, A, B, C, and I change a parameter in the experiment (size). Every experiment is done multiple times and I need the median or average over all experiments that are the same. Should I preprocess my data files so that they are completely different? Or is it fine the way it is and I just overlooked the simple solution to the problem described above? Regards, Timo __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
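For the example data in the question, aggregate does this in one call (a sketch; note the exact median of ExpC for Size 1 is 33, computed from 33, 29, 34, slightly different from the sketched output in the question):

```r
df <- data.frame(ExpA = c(12, 12, 10, 25, 24, 21),
                 ExpB = c(23, 24, 22, 50, 53, 49),
                 ExpC = c(33, 29, 34, 60, 62, 61),
                 Size = c(1, 1, 1, 2, 2, 2))

# group the three Exp columns by Size and take the per-group median
aggregate(df[c("ExpA", "ExpB", "ExpC")], by = list(Size = df$Size), FUN = median)
#   Size ExpA ExpB ExpC
# 1    1   12   23   33
# 2    2   24   50   61
```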
Re: [R] averaging two matrices whilst ignoring missing values
One (awkward) way to do this is:

x <- matrix(c(c(test), c(test2)), ncol = 2)
y <- rowMeans(x, na.rm = TRUE)
testave <- matrix(y, nrow = nrow(test))

--- On Tue, 14/7/09, Tish Robertson tishrobert...@hotmail.com wrote: From: Tish Robertson tishrobert...@hotmail.com Subject: [R] averaging two matrices whilst ignoring missing values To: r-help@r-project.org Received: Tuesday, 14 July, 2009, 11:46 AM Hi folks, I'm trying to do something that seems like it should be easy, but it apparently isn't. I have two large matrices, both containing a number of missing values at different cells. I would like to average these matrices, but the NAs are preventing me. I get a "non-numeric argument to binary operator" error. That's the first problem.

test <- read.csv("test.csv", header = FALSE)
test2 <- read.csv("test2.csv", header = FALSE)
test_ <- as.matrix(test, na.rm = T)
test2_ <- as.matrix(test2, na.rm = T)
testave <- (test_ + test2_)/2  # ??

So off the bat I'm doing something wrong. How would I replace the missing values in one matrix with the corresponding non-missing values in another? It's acceptable to me if I only have one value representing the average for a particular coordinate. Any help would be appreciated! [[alternative HTML version deleted]] -Inline Attachment Follows- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to keep row name if there is only one row selected from a data frame
Try A[1,,drop=FALSE] - see ?"[" --- On Mon, 13/7/09, Weiwei Shi helprh...@gmail.com wrote: From: Weiwei Shi helprh...@gmail.com Subject: [R] how to keep row name if there is only one row selected from a data frame To: r-h...@stat.math.ethz.ch r-h...@stat.math.ethz.ch Received: Monday, 13 July, 2009, 1:55 PM Hi, there: Assume I have a dataframe A with rownames a to e:

A
  [,1] [,2]
a    1    6
b    2    7
c    3    8
d    4    9
e    5   10

when I use A[1,], I lose the rowname, like below. How could I keep it? Is there an easy way, rather than having to fix it up myself after using A[1,]?

A[1,]
[1] 1 6

Thanks, W. -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extracting a column name in loop?
If df is your dataframe then names(df) contains the column names and so names(df)[i] is the name of i-th column. --- On Thu, 9/7/09, mister_bluesman mister_blues...@hotmail.com wrote: From: mister_bluesman mister_blues...@hotmail.com Subject: [R] Extracting a column name in loop? To: r-help@r-project.org Received: Thursday, 9 July, 2009, 1:41 AM Hi, I am writing a script that will address columns using syntax like: data_set[,1] to extract the data from the first column of my data set, for example. This code will be placed in a loop (where the column reference will be placed by a variable). What I also need to do is extract the column NAME for a given column being processed in the loop. The dataframe has been set so that R knows that the top line refers to column headers. Can anyone help me understand how to do this? Thanks. -- View this message in context: http://www.nabble.com/Extracting-a-column-name-in-loop--tp24393160p24393160.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
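A small sketch of that inside a loop (data_set here is a hypothetical stand-in for the poster's data frame):

```r
# hypothetical data set standing in for data_set
data_set <- data.frame(height = c(1.7, 1.8, 1.6), weight = c(65, 80, 55))

for (i in seq_along(data_set)) {
  col_name <- names(data_set)[i]   # the NAME of column i
  col_data <- data_set[, i]        # the data of column i
  cat(col_name, ": mean =", mean(col_data), "\n")
}
```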
Re: [R] print() to file?
One possibility is to use sink (see ?sink). --- On Thu, 9/7/09, Steve Jaffe sja...@riskspan.com wrote: From: Steve Jaffe sja...@riskspan.com Subject: [R] print() to file? To: r-help@r-project.org Received: Thursday, 9 July, 2009, 5:03 AM I'd like to write some objects (eg arrays) to a log file. cat() flattens them out. I'd like them formatted as in 'print' but print only writes to stdout. Is there a simple way to achieve this result? Thanks -- View this message in context: http://www.nabble.com/print%28%29-to-file--tp24397445p24397445.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
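A sketch of both options: sink() diverts console output to a file, while capture.output() writes the print-formatted text in a single call.

```r
m <- matrix(1:6, nrow = 2)
f <- tempfile(fileext = ".txt")

# divert everything that would go to the console into the file ...
sink(f)
print(m)
sink()   # ... and remember to restore the console afterwards

# alternative: capture.output() does the print-and-write in one call
capture.output(print(m), file = f, append = TRUE)

readLines(f)  # the matrix, print-formatted, twice
```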
Re: [R] Substituting numerical values using `apply'
Let M be your matrix. Do the following:

B <- t(matrix(colnames(M), nrow = ncol(M), ncol = nrow(M)))
B[!(M %in% c(1, 2))] <- NA

(This sets every entry that is not a 1 or a 2 to NA; in your example the 5s become NA as well.) --- On Thu, 9/7/09, Olivella olive...@wustl.edu wrote: From: Olivella olive...@wustl.edu Subject: [R] Substituting numerical values using `apply' To: r-help@r-project.org Received: Thursday, 9 July, 2009, 6:25 AM Hello, I wish to perform a substitution of certain numerical values in a data matrix with the corresponding column name. For instance, if I have a data matrix

V1 V2 V3
 2  0  1
 0  1  2
 1  5  0
 5  0  0

I want to substitute the `1' and the `2' for the corresponding column name, and make everything else `NA', like this

V1 NA V3
NA V2 V3
V1 NA NA
NA NA NA

I have done this using an explicit `for' loop, but it takes a really long time to finish. Is there any way I can do this using `apply' or some form of implicit looping? Thank you for your help, SO -- View this message in context: http://www.nabble.com/Substituting-numerical-values-using-%60apply%27-tp24398687p24398687.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
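A self-contained sketch on the matrix from the question (values other than 1 and 2, including the 5s, are set to NA, matching the desired output):

```r
M <- matrix(c(2, 0, 1,
              0, 1, 2,
              1, 5, 0,
              5, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("V1", "V2", "V3")))

# character matrix with the column name repeated down each column
B <- t(matrix(colnames(M), nrow = ncol(M), ncol = nrow(M)))

# blank out every cell of M that is not a 1 or a 2
B[!(M %in% c(1, 2))] <- NA
B
#      [,1] [,2] [,3]
# [1,] "V1" NA   "V3"
# [2,] NA   "V2" "V3"
# [3,] "V1" NA   NA
# [4,] NA   NA   NA
```

Both B and the logical mask are laid out column-major over the same 4x3 shape, so the vectorised indexing lines up cell by cell.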
Re: [R] naming of columns in R dataframe consisting of mixed data (alphanumeric and numeric)
Hi Mary, Your data.frame has just one column (not 2)! You can check this by dim(tresults2). What appears to you to be the first column (the names) are in fact rownames. If you really want to have two columns do something like

tresults2 <- cbind(colnames(tresults), data.frame(t(tresults), row.names = NULL))

Now tresults2 contains 2 columns and you can proceed with

names(tresults2) <- c("Statistic", "Value")

--- On Fri, 10/7/09, Mary A. Marion mms...@comcast.net wrote: From: Mary A. Marion mms...@comcast.net Subject: [R] naming of columns in R dataframe consisting of mixed data (alphanumeric and numeric) To: r-help@r-project.org Received: Friday, 10 July, 2009, 3:50 AM Hello, I have an r function that creates the following dataframe tresults2. Notice that column 1 does not have a column heading.

Tresults2:
                 [,1]
estparam         18.0
nullval          20.0
...
ciWidth       2.04622
HalfInterval  1.02311

pertinent code:

results <- cbind(estparam, nullval, t, pv_left, pv_right, pv_two_t, estse, df, cc, tbox, llim, ulim, ciWidth, HalfInterval)
tresults <- round(results, 5)
tresults2 <- data.frame(t(tresults))
names(tresults2) <- c("Statistic", "Value")
tresults2

=== After transposing, my dataframe tresults2 consists of 2 columns. Col1 = alphanumeric data (this really is a variable name) and col2 = numeric data (this is the value of the variable). How do I name columns when columns are different (alphanumeric and numeric) to avoid the following error: Error in names(tresults2) <- c("Statistic", "Value") : 'names' attribute [2] must be the same length as the vector [1]. Am I to use c( , ) or list( , ) with a dataframe? Thank you for your help. Sincerely, Mary A. Marion [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Counting the number of cycles in a temperature test
Hi Antje, Are your measurements taken every minute (i.e. 30 minutes correspond to 30 consecutive values)? How fast is your transition? If you had 30 minutes of upper temperature, then 1000 minutes of room temperature and then 30 minutes of lower temperature - would you count this as a cycle? Can a cycle be 30 minutes of lower temperature followed by 30 minutes of upper temperature? --- On Mon, 6/7/09, Steller, Antje (GQL-LM) antje.stel...@volkswagen.de wrote: From: Steller, Antje (GQL-LM) antje.stel...@volkswagen.de Subject: [R] Counting the number of cycles in a temperature test To: r-help@r-project.org Received: Monday, 6 July, 2009, 9:14 PM Hello dear R-users, today I have a question that I completely do not know how to solve (R-newbie!). In a temperature chamber I have measured temperature over time. The result is shown in the attached eps-file (if attachments are shown): There are two temperature levels, 150°C and -40°C. A complete cycle includes about 30 minutes upper temperature, quick change to the lower level, 30 minutes hold at lower level. Then the temperature rises again to the upper level and a new cycle begins. About 500 cycles have been completed. How can I count the number of cycles that have been completed? The data does not only include perfect temperature cycles. Once in a while the machine would stand still and for a day or so the temperature would remain at room temperature. So I cannot simply divide the measured time by the duration of a cycle... Thanks a lot for your support, if necessary I could also provide the dataset. Antje Temperaturzyklen.eps -Inline Attachment Follows- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Uncorrelated random vectors
As mentioned by somebody before, there is no problem for the normal case - use the mvrnorm function from the MASS package with any mu and make Sigma any diagonal matrix (with strictly positive diagonal). Note that even though all the correlations are 0, the SAMPLE correlations won't be 0. If you want to create a set of vectors whose SAMPLE correlations are 0 you will have to use a variant of Gram-Schmidt. I do not know whether a variant of mvrnorm exists for the logistic distribution (my guess is that it does not). --- On Tue, 7/7/09, Stein, Luba (AIM SE) luba.st...@allianz.com wrote: From: Stein, Luba (AIM SE) luba.st...@allianz.com Subject: [R] Uncorrelated random vectors To: r-help@r-project.org r-help@r-project.org Received: Tuesday, 7 July, 2009, 11:45 PM Hello, is it possible to create two uncorrelated random vectors for a given distribution. In fact, I would like to have something like the function rnorm or rlogis with the extra property that they are uncorrelated. Thanks for your help, Luba [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
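A sketch of the normal case with MASS (a recommended package shipped with R). With empirical = TRUE, mvrnorm even forces the SAMPLE covariance to equal Sigma exactly, which addresses the sample-correlation caveat without doing Gram-Schmidt by hand:

```r
library(MASS)

set.seed(1)
# two columns, variances 1 and 4, population (and here sample) correlation 0
Z <- mvrnorm(n = 100, mu = c(0, 0), Sigma = diag(c(1, 4)), empirical = TRUE)
cor(Z[, 1], Z[, 2])  # zero up to floating-point rounding
```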
Re: [R] a really simple question on polynomial multiplication
One way is to use convolution (?convolve): If A(x) = a_p*x^p + ... + a_1*x + a_0 and B(x) = b_q*x^q + ... + b_1*x + b_0 and if C(x) = A(x)*B(x) = c_(p+q)*x^(p+q) + ... + c_0 then c = convolve(a, rev(b), type = "open") where c is the vector (c_(p+q),...,c_0), a is (a_p,...,a_0) and b is (b_q,...,b_0). In your case:

a <- c(1, -3)
b <- c(1, 3)
c <- convolve(a, rev(b), type = "open")
c
[1] 1.0e+00 -6.49654e-16 -9.0e+00

Note that the coefficient of x is not exactly 0 due to the use of fft to compute convolutions. --- On Thu, 16/10/08, Erin Hodgess [EMAIL PROTECTED] wrote: From: Erin Hodgess [EMAIL PROTECTED] Subject: [R] a really simple question on polynomial multiplication To: R Help r-help@r-project.org Received: Thursday, 16 October, 2008, 8:44 AM Dear R people: Is there a way to perform simple polynomial multiplication; that is, something like (x - 3) * (x + 3) = x^2 - 9, please? I looked in poly and polyroot and expression. There used to be a package that had this, maybe? thanks, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: [EMAIL PROTECTED] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] runs of heads when flipping a coin
First of all, we must define what is a run of length r: is it a tail, then EXACTLY r heads and a tail again or is it AT LEAST r heads. Let's assume that we are looking for a run of EXACTLY r heads (and we toss the coin n times). Let X[1],X[2],...,X[n-r+1] be random variables such that Xi = 1 if there is a run of r heads starting at place i and 0 otherwise. Then the number of runs of length r is just sum(X), so the expected number of runs of length r is sum(E(X)) = sum(P(X[i]=1)). Now, for i=2,3,...,n-r P(X[i] = 1) = (1-h)*h^r*(1-h) and also P(X[1] = 1) = h^r*(1-h) and P(X[n-r+1] = 1) = (1-h)*h^r, so that the expected number of runs of length r is (1-h)*h^r*(2 + (n-r-1)*(1-h)) As to the distribution of the number of such runs, this is a much more difficult question (as mentioned by some other people). --- On Fri, 10/10/08, Harvey [EMAIL PROTECTED] wrote: From: Harvey [EMAIL PROTECTED] Subject: [R] runs of heads when flipping a coin To: r-help@r-project.org Received: Friday, 10 October, 2008, 4:16 AM Can someone recommend a method to answer the following type of question: Suppose I have a coin with a probability hhh of coming up heads (and 1-hhh of coming up tails) I plan on flipping the coin nnn times (for example, nnn = 500) What is the expected probability or frequency of a run of rrr heads* during the nnn=500 coin flips? Moreover, I would probably (excuse the pun) want the answer for a range of rrr values, for example rrr = 0:50 Of course I am more interested in an analytical solution than a monte carlo simulation solution. Thanks in advance, Harvey * OR a run of rrr heads or more ... [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
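The closed form above is easy to check by simulation (a sketch; runs of EXACTLY r heads, with h and n as in the derivation):

```r
# expected number of runs of EXACTLY r heads in n tosses with P(head) = h,
# using the formula derived above: border positions need only one bounding
# tail, the n - r - 1 interior positions need two
expected_runs <- function(n, r, h) {
  (1 - h) * h^r * (2 + (n - r - 1) * (1 - h))
}

# count runs of exactly r consecutive 1s in a 0/1 vector of tosses
count_runs <- function(flips, r) {
  rl <- rle(flips)
  sum(rl$values == 1 & rl$lengths == r)
}

set.seed(1)
n <- 500; r <- 3; h <- 0.5
sims <- replicate(2000, count_runs(rbinom(n, 1, h), r))
c(theory = expected_runs(n, r, h), simulated = mean(sims))
# theory is 15.625; the simulated mean should land close to it
```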
Re: [R] ordering problem
Try

AA <- apply(A, 1, function(x) paste(x, collapse = ""))

and work with AA. --- On Tue, 30/9/08, Jose Luis Aznarte M. [EMAIL PROTECTED] wrote: From: Jose Luis Aznarte M. [EMAIL PROTECTED] Subject: [R] ordering problem To: [EMAIL PROTECTED] Received: Tuesday, 30 September, 2008, 8:43 PM Hi there! I need some assistance here with vector orderings. I have a set of q vectors of length p, grouped by rows in a matrix A, q·p, that I need to order lexicographically (http://en.wikipedia.org/wiki/Lexicographical_order). I also have another matrix B, p·r, and a vector c, that should be ordered according to the order of A. So far, I was doing

ordering <- apply(A, 2, order)[,1]
A <- A[ordering,]
B <- B[ordering,]
c <- c[ordering]

But now I realize that this way I'm ordering by considering only the first dimension of the vectors in A, i.e., not considering the case where there are ties amongst this first dimension. Does anyone have a clue about properly applying the lexicographical ordering? Thanks in advance! -- -- Jose Luis Aznarte M. http://decsai.ugr.es/~jlaznarte Department of Computer Science and Artificial Intelligence Universidad de Granada Tel. +34 - 958 - 24 04 67 GRANADA (Spain) Fax: +34 - 958 - 24 00 79 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
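One caveat with pasting numeric columns together without a separator: digits can merge across columns (e.g. the rows c(1, 23) and c(12, 3) both paste to "123"). A more robust sketch passes the columns of A to order directly, which breaks ties on the first column by the second, and so on:

```r
A <- matrix(c(2, 1, 1,
              1, 3, 2,
              1, 2, 9), nrow = 3, byrow = TRUE)

# lexicographic row order: ties in column 1 are resolved by column 2, etc.
ordering <- do.call(order, as.data.frame(A))
A[ordering, ]
#      [,1] [,2] [,3]
# [1,]    1    2    9
# [2,]    1    3    2
# [3,]    2    1    1
```

The same ordering vector then reorders B and c exactly as in the original code.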
Re: [R] design question on piping multiple data sets from 1 file into R
I think that you can use read.csv with nrows and skip arguments (see ?read.table). --- On Mon, 22/9/08, DS [EMAIL PROTECTED] wrote: From: DS [EMAIL PROTECTED] Subject: [R] design question on piping multiple data sets from 1 file into R To: r-help@r-project.org Received: Monday, 22 September, 2008, 8:06 AM Hi, I have some queries that I use to get time series information for 8 separate queries which deal with a different set of time series each. I take my queries, run them and save the output as a csv file, and then format the data into graphs in Excel. I wanted to know if there is an elegant and clean way to read in 1 csv file but to read the separate matrices on different rows into separate R data objects. If this is easy then I can read the 8 datasets in the csv file into 8 R objects and pipe them to time series objects for graphs. thanks Dhruv [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
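A sketch (assuming the starting row and length of each block in the file are known): skip jumps over the earlier block(s), and nrows stops before the next one.

```r
# hypothetical file: two 3-row blocks stacked in one csv, no headers
f <- tempfile(fileext = ".csv")
writeLines(c("1,2", "3,4", "5,6",
             "7,8", "9,10", "11,12"), f)

block1 <- read.csv(f, header = FALSE, nrows = 3)
block2 <- read.csv(f, header = FALSE, skip = 3, nrows = 3)
block2
#   V1 V2
# 1  7  8
# 2  9 10
# 3 11 12
```

With 8 blocks, a short loop over a table of (skip, nrows) pairs yields a list of 8 data frames.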
Re: [R] sort a data matrix by all the values and keep the names
One possibility is:

x <- data.frame(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2))
names <- t(matrix(rep(names(x), times = nrow(x)), nrow = ncol(x)))
m <- as.matrix(x)
ind <- order(m)
df <- data.frame(name = names[ind], value = m[ind])
df
  name value
1   x1     1
2   x3     2
3   x2     4
4   x2     6
5   x1     7
6   x3     8

--- On Tue, 23/9/08, zhihuali [EMAIL PROTECTED] wrote: From: zhihuali [EMAIL PROTECTED] Subject: [R] sort a data matrix by all the values and keep the names To: [EMAIL PROTECTED] Received: Tuesday, 23 September, 2008, 9:54 AM Dear all, If I have a data frame x <- data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2)):

x1 x2 x3
 1  4  8
 7  6  2

I want to sort the whole data and get this:

x1 1
x3 2
x2 4
x2 6
x1 7
x3 8

If I do sort(x), R reports: Error in order(list(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2)), decreasing = FALSE) : unimplemented type 'list' in 'orderVector1'. The only way I can sort all the data is by converting it to a matrix: sort(as.matrix(x)) [1] 1 2 4 6 7 8 But now I lost all the names attributes. Is it possible to sort a data frame and keep all the names? Thanks! Zhihua Li [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] perl expression question
Hi Mark,

stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
gsub(".*/([^/]+)$", "\\1", stock)
[1] "BLC.NYSE"

--- On Tue, 23/9/08, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: From: [EMAIL PROTECTED] [EMAIL PROTECTED] Subject: [R] perl expression question To: r-help@r-project.org Received: Tuesday, 23 September, 2008, 10:29 AM If I have the string below, does someone know a regular expression to just get the BLC.NYSE? I bought the O'Reilly book and read it when I can, and I study the solutions on the list, but I'm still not self-sufficient with these things. Thanks.

stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
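Since the pattern just strips everything up to the last slash, base R's basename() gives the same result without any regular expression:

```r
stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
basename(stock)  # last path component
# [1] "BLC.NYSE"
```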
Re: [R] help on sampling from the truncated normal/gamma distribution on the far end (probability is very low)
Hi Sonia, If I did not make a mistake, the conditional distribution of X given that X > 0 is very close to an exponential distribution with parameter lambda = 40, so you can sample from this distribution. --- On Mon, 15/9/08, Daniel Davis [EMAIL PROTECTED] wrote: From: Daniel Davis [EMAIL PROTECTED] Subject: [R] help on sampling from the truncated normal/gamma distribution on the far end (probability is very low) To: r-help@r-project.org Received: Monday, 15 September, 2008, 2:28 PM Hi, guys, I am trying to sample from a truncated normal/gamma distribution. But only the far end of the distribution (where the probability is very low) is left. e.g. mu = -4; sigma = 0.1; The distribution is Normal(mu, sigma^2) truncated to [0, +Inf]. How can I get a sample? I tried to use the inverse CDF method, but got Inf as answers. Please help me out. Also, please help me with the similar situation for the gamma dist'n. Thanks, Sonia [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help on sampling from the truncated normal/gamma distribution on the far end (probability is very low)
Well, I made a mistake - your lambda should be 400 and not 40!!! --- On Thu, 18/9/08, Moshe Olshansky [EMAIL PROTECTED] wrote: From: Moshe Olshansky [EMAIL PROTECTED] Subject: Re: [R] help on sampling from the truncated normal/gamma distribution on the far end (probability is very low) To: r-help@r-project.org, Daniel Davis [EMAIL PROTECTED] Received: Thursday, 18 September, 2008, 5:00 PM Hi Sonia, If I did not make a mistake, the conditional distribution of X given that X > 0 is very close to an exponential distribution with parameter lambda = 40, so you can sample from this distribution. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
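A sketch of the approximation with the corrected rate (assuming mu = -4, sigma = 0.1 as in the question): for small x > 0 the truncated density exp(-(x - mu)^2 / (2*sigma^2)) is proportional to exp(-x*(0 - mu)/sigma^2) up to a term of order x^2, i.e. approximately exponential with rate (0 - mu)/sigma^2 = 4/0.01 = 400.

```r
mu <- -4; sigma <- 0.1

# rate of the approximating exponential for the left-truncation point 0
rate <- (0 - mu) / sigma^2   # 400

set.seed(42)
x <- rexp(1e5, rate = rate)  # approximate draws from N(mu, sigma^2) | X > 0
mean(x)                      # should be close to 1/rate = 0.0025
```

This avoids the inverse-CDF route entirely, which fails here because pnorm(0, mu, sigma) underflows to 1 at 40 standard deviations.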
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
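The corrected rate can be turned into an exact sampler, not just an approximation: the exponential with rate lambda = -mu/sigma^2 = 400 is an envelope for the conditional density on [0, Inf), and the rejection probability simplifies algebraically to exp(-y^2/(2*sigma^2)). A sketch (the function name is mine, not from the thread):

```r
# Exact sampler for X ~ N(mu, sigma^2) conditioned on X > 0, using an
# exponential envelope with rate lambda = -mu/sigma^2 (400 here).
# Acceptance probability exp(-y^2/(2*sigma^2)) is essentially 1 for the
# tiny y values the proposal produces, so almost nothing is rejected.
rtrunc_norm_tail <- function(n, mu = -4, sigma = 0.1) {
  lambda <- -mu / sigma^2
  out <- numeric(0)
  while (length(out) < n) {
    y <- rexp(n, rate = lambda)                  # proposal on [0, Inf)
    keep <- runif(n) < exp(-y^2 / (2 * sigma^2)) # exact rejection step
    out <- c(out, y[keep])
  }
  out[1:n]
}
set.seed(1)
x <- rtrunc_norm_tail(10000)
```

Since lambda = 400, the draws concentrate near 1/400 = 0.0025, which is why the plain exponential approximation suggested in the reply works so well here.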
Re: [R] inserting values for null
Hi Ramya, Assuming that the problem is well defined (i.e. the values in column 1 of the data frames are unique and every value in D.F.sub.2[,1] appears also in D.F1[,1]) you can do the following: ind <- match(D.F.sub.2[,1], D.F1[,1]) D.F1[ind,] <- D.F.sub.2 --- On Thu, 18/9/08, Rajasekaramya [EMAIL PROTECTED] wrote: From: Rajasekaramya [EMAIL PROTECTED] Subject: [R] inserting values for null To: r-help@r-project.org Received: Thursday, 18 September, 2008, 2:13 AM I have a dataframe D.F1; dim(D.F1) is 14351 9. This dataframe has values, and for some 1000 rows it holds NULL values. I have found the missing values for about 500 and have those in another dataframe D.F.sub.2; dim(D.F.sub.2) is 500 9. As this dataframe is a subset of D.F1, column 1 in D.F.sub.2 is a subset of column 1 in D.F1. I have to insert the values into the other fields of D.F1 where column 1 in the main table and the sub table matches. In short, I have to insert a dataframe (D.F.sub.2) into the main D.F1 using column 1, which holds matching values. Kindly help me. Ramya -- View this message in context: http://www.nabble.com/inserting-values-for-null-tp19535742p19535742.html Sent from the R help mailing list archive at Nabble.com.
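A toy illustration of the match()-based row replacement suggested above (the data and column names are made up for the example):

```r
# Small stand-ins for D.F1 and D.F.sub.2; "id" plays the role of column 1.
D.F1 <- data.frame(id = c("a", "b", "c", "d"), val = c(1, NA, NA, 4))
D.F.sub.2 <- data.frame(id = c("b", "c"), val = c(20, 30))

ind <- match(D.F.sub.2[, 1], D.F1[, 1])  # rows of D.F1 to overwrite
D.F1[ind, ] <- D.F.sub.2                 # fill in the previously missing rows
```

After the assignment, rows "b" and "c" of D.F1 carry the values from D.F.sub.2 while the other rows are untouched.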
Re: [R] database table merging tips with R
One possibility is as follows: If r$userid is your array of (2000) ID's then s <- paste(r$userid, sep=",") s <- paste("select t.userid, x, y, z from largetable t where t.userid in (", s, ")", sep="") and finally d <- sqlQuery(connection, s) Regards, Moshe. --- On Fri, 12/9/08, Avram Aelony [EMAIL PROTECTED] wrote: From: Avram Aelony [EMAIL PROTECTED] Subject: [R] database table merging tips with R To: [EMAIL PROTECTED] Received: Friday, 12 September, 2008, 4:33 AM Dear R list, What is the best way to efficiently marry an R dataset with a very large (Oracle) database table? The goal is to only return Oracle table rows that match IDs present in the R dataset. I have an R data frame with 2000 user IDs analogous to: r = data.frame(userid=round(runif(2000)*10,0)) ...and I need to pull data from an Oracle table only for these 2000 IDs. The Oracle table is quite large. Additionally, the sql query may need to join to other tables to bring in ancillary fields. I currently connect to Oracle via odbc: library(RODBC) connection <- odbcConnect("...", uid="...", pwd="...") d = sqlQuery(connection, "select userid, x, y, z from largetable where timestamp > sysdate - 7") ...allowing me to pull data from the database table into the R object d and then use the R merge function. The problem however is that if d is too large it may fail due to memory limitations or be inefficient. I would like to push the merge portion to the database, and it would be very convenient if it were possible to request that the query look to the R object for the IDs to which it should restrict the output. Is there a way to do this? Something like the following fictional code: d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where r$userid = t.userid") Would sqldf (http://code.google.com/p/sqldf/) help me out here? If so, how? This would be convenient and help me avoid needing to create a temporary table to store the R data, join via sql, then return the data back to R. I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32 .
Thanks for your comments, ideas, recommendations. -Avram
Re: [R] database table merging tips with R
Just a small correction: start with s <- paste(r$userid, collapse=",") and not s <- paste(r$userid, sep=",") (with sep= a single vector argument is returned unchanged; collapse= is what joins the elements into one comma-separated string). --- On Fri, 12/9/08, Moshe Olshansky [EMAIL PROTECTED] wrote: [earlier messages quoted in full - snipped]
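Putting the corrected pieces together, the query string can be built like this (the IDs here are made up, and the connection lines are left commented since they need a real DSN; note also that Oracle caps IN lists at 1000 literals, so very long ID vectors need to be chunked):

```r
# Build an IN-list query from an R vector of IDs (illustrative values).
ids <- c(101, 102, 103)                 # stands in for r$userid
s <- paste(ids, collapse = ",")         # "101,102,103"
sql <- paste("select t.userid, x, y, z from largetable t",
             "where t.userid in (", s, ")")

# In practice (requires RODBC and a configured DSN):
# library(RODBC)
# connection <- odbcConnect("mydsn", uid = "...", pwd = "...")
# d <- sqlQuery(connection, sql)
```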
Re: [R] densities with overlapping area of 0.35
Let X be normally distributed with mean 0 and let f be its density. The density of X+a is f shifted right by a. Since the density is symmetric around its mean, it follows that the area of overlap of the two densities is exactly P(X > a) + P(X < -a). So if X ~ N(0,1), we want P(X > a) + P(X < -a) = 2*P(X < -a) = 0.35, so P(X < -a) = 0.175, which yields -a = qnorm(0.175) = -0.9345893, so a = 0.9345893. --- On Tue, 9/9/08, Lavan [EMAIL PROTECTED] wrote: From: Lavan [EMAIL PROTECTED] Subject: [R] densities with overlapping area of 0.35 To: r-help@r-project.org Received: Tuesday, 9 September, 2008, 12:11 PM Hi, I would like to generate two normal densities such that the overlapping area between them is 0.35. Is there any code/package available in R to do that? Regards, Lavan -- View this message in context: http://www.nabble.com/densities-with-overlapping-area-of-0.35-tp19384741p19384741.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] densities with overlapping area of 0.35
Just a correction: if we take X+2a then everything is OK (the curves intersect at a), so a = 0.9345893 is correct, but one must take X ~ N(0,1) and Y ~ N(2*a,1). --- On Tue, 9/9/08, Moshe Olshansky [EMAIL PROTECTED] wrote: [earlier messages quoted in full - snipped]
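The corrected answer can be verified numerically: the overlap of two density curves is the integral of their pointwise minimum, which for N(0,1) and N(2a,1) should come out to 0.35.

```r
# Numerical check of the overlap-area derivation above (a sketch).
a <- qnorm(0.825)                    # 0.9345893...; so that P(X < -a) = 0.175
f1 <- function(x) dnorm(x, mean = 0)
f2 <- function(x) dnorm(x, mean = 2 * a)
# Overlap area = integral of min(f1, f2); the curves cross at x = a.
overlap <- integrate(function(x) pmin(f1(x), f2(x)), -Inf, Inf)$value
```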
Re: [R] intercept of 3D line? (Orthogonal regression)
I do not see why you cannot use regression even in this case. To make things simple, suppose that the exact model is: y = a + b*x, i.e. y1 = a + b*x1 ... yn = a + b*xn. But you cannot observe y and x. Instead you observe ui = xi + ei (i=1,...,n) and vi = yi + di (i=1,...,n). Now you have vi = yi + di = a + b*xi + di = a + b*(ui - ei) + di = a + b*ui + (di - b*ei), and under regular assumptions about the ei's and di's we get a standard regression problem (note that b is unknown to you but is constant). --- On Tue, 2/9/08, William Simpson [EMAIL PROTECTED] wrote: From: William Simpson [EMAIL PROTECTED] Subject: [R] intercept of 3D line? (Orthogonal regression) To: r-help@r-project.org Received: Tuesday, 2 September, 2008, 4:53 AM I posted recently about fitting 3D data x, y, z where all have error attached. I want to predict z from x and y; something like z = b0 + b1*x + b2*y. But multiple regression is not suitable because all of x, y, and z have errors. I have plotted a 3D scatterplot of some data using rgl. I see that the data form a cigar-shaped cloud. I think multiple regression is only suitable when the points fall on a plane (forgetting about the error in x and y). I now know the right way to find the best fitting plane to x,y,z data using princomp. But a new problem is how to get the best fitting *line*. I actually know how to do that too using princomp. But there is a mathematical problem: there is no way to specify a line in 3D space in the form z=f(x,y), in other words with an intercept and slopes. Instead, one way to deal with the problem is to use a parametric version of the line: you use an arbitrary starting point x0, y0, z0 and the direction vector of your line (I know how to get the direction vector). BUT how do I get the intercept??? At this point my lines just go through the origin. Do I just use $center from the princomp output, modified in some way? Thanks for any help!
Cheers Bill
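The substitution in the reply above can be tried in a small simulation (all numbers are made up). One caveat worth noting: when the measurement error in u is small relative to the spread of x, ordinary least squares recovers a and b well; large errors in u attenuate the estimated slope toward zero, which is exactly when errors-in-variables methods matter.

```r
# Minimal simulation of the observed-with-error setup described above.
set.seed(42)
n <- 200
a <- 1; b <- 2                      # true intercept and slope
x <- runif(n, 0, 10)
y <- a + b * x                      # exact linear relation
u <- x + rnorm(n, sd = 0.1)         # observed predictor, with error e
v <- y + rnorm(n, sd = 0.1)         # observed response, with error d
fit <- lm(v ~ u)                    # v = a + b*u + (d - b*e)
coef(fit)
```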
Re: [R] Integrate a 1-variable function with 1 parameter (Jose L. Romero)
This can be done analytically: after a change of variable (replacing 2*t by t) and some scaling we need to compute f(x) = the integral from 0 to 20 of t^x*exp(-t)dt/factorial(x). f(0) = the integral from 0 to 20 of exp(-t)dt = 1 - exp(-20), and integration by parts yields (for x=1,2,3,...) f(x) = -exp(-20)*20^x/factorial(x) + f(x-1), so that f(x) = 1 - exp(-20)*sum(20^k/factorial(k)), where the sum is over k=0,1,...,x. If I did not make a mistake, your original quantity should be f(x)/20. --- On Thu, 28/8/08, jose romero [EMAIL PROTECTED] wrote: From: jose romero [EMAIL PROTECTED] Subject: [R] Integrate a 1-variable function with 1 parameter (Jose L. Romero) To: r-help@r-project.org Received: Thursday, 28 August, 2008, 12:23 AM Hey fellas: I would like to integrate the following function: integrand <- function (x,t) { exp(-2*t)*(2*t)^x/(10*factorial(x)) } with respect to the t variable, from 0 to 10. The variable x here works as a parameter: I would like to integrate the said function for each value of x in 0,1,...,44. I have tried Vectorize to no avail. Thanks in advance, jose romero
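The closed form can be checked against numerical integration; note also that f(x) equals pgamma(20, x+1), since the integral of t^x*exp(-t)/x! from 0 to 20 is the CDF of a Gamma(x+1, 1) variable at 20 (equivalently, 1 - ppois(x, 20)).

```r
# Verify the analytic answer f(x)/20 against direct integration (sketch).
f_analytic <- function(x) 1 - exp(-20) * sum(20^(0:x) / factorial(0:x))

integrand <- function(t, x) exp(-2 * t) * (2 * t)^x / (10 * factorial(x))
f_numeric <- function(x) integrate(integrand, 0, 10, x = x)$value

xs <- 0:44
max_err <- max(abs(sapply(xs, f_analytic) / 20 - sapply(xs, f_numeric)))
# Cross-check against the Gamma-CDF identity:
gamma_err <- max(abs(sapply(xs, f_analytic) - pgamma(20, xs + 1)))
```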
Re: [R] Problem with Integrate for NEF-HS distribution
If you look at your sech(pi*x/2) you can write it as sech(pi*x/2) = 2*exp(pi*x/2)/(1 + exp(pi*x)). For x < -15, exp(pi*x) < 10^-20, so on this interval you can replace sech(pi*x/2) by 2*exp(pi*x/2); the integral from -Inf to -15 (or even -10, depending on your accuracy requirements) can then be computed analytically, and you are left with the integral from -15 (or -10) to your upper bound, which should be fine for numerical integration. --- On Wed, 27/8/08, xia wang [EMAIL PROTECTED] wrote: From: xia wang [EMAIL PROTECTED] Subject: RE: [R] Problem with Integrate for NEF-HS distribution To: [EMAIL PROTECTED], [EMAIL PROTECTED] Received: Wednesday, 27 August, 2008, 12:26 AM Thanks. I revised the code but it doesn't make a difference. The cause of the error is just as stated in ?integrate: If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong. I have searched the R archive. It may not be feasible to solve the problem by rectangle approximation as some postings suggested, because the integration is in fact within MCMC sampling, as follows. The lower bound is always -Inf. The upper bound and the value of theta change for each sampling step in the MCMC. I tried to multiply the integrand by a large number (like 10^16) and change rel.tol. It does help for some range of theta but not all. Xia like.fun <- function(beta, theta) { integrand <- function(X, theta) {0.5*cos(theta)*exp(X*theta)*sech(pi*X/2)} upper <- x %*% beta prob <- matrix(NA, nrow(covariate), 1) for (i in 1:nrow(covariate)) { prob[i] <- integrate(integrand, lower=-Inf, upper=upper[i], theta=theta)$value } likelihood <- sum(log(dbinom(y, n, prob))) return(likelihood) } Date: Tue, 26 Aug 2008 00:49:02 -0700 From: [EMAIL PROTECTED] Subject: Re: [R] Problem with Integrate for NEF-HS distribution To: [EMAIL PROTECTED] Shouldn't you define integrand <- function(X, theta) ... and not integrand <- function(X) ...
--- On Tue, 26/8/08, xia wang [EMAIL PROTECTED] wrote: From: xia wang [EMAIL PROTECTED] Subject: [R] Problem with Integrate for NEF-HS distribution To: r-help@r-project.org Received: Tuesday, 26 August, 2008, 3:00 PM I need to calculate the cumulative probability for the Natural Exponential Family - Hyperbolic Secant distribution with a parameter theta between -pi/2 and pi/2. The integral should be between 0 and 1 as it is a probability. The function integrate works fine when the absolute value of theta is not too large, that is, when the NEF-HS distribution is not too skewed. However, once theta gets large in absolute value, such as -1 as shown in the example below, integrate keeps giving me an error about a non-finite function value when I put the lower bound as -Inf. I suspect that it is caused by the very heavy tail of the distribution. Is there any way that I can get around this and let integrate work for the skewed distribution? Or is there any other function for integration in the R package? Thanks a lot for your advice in advance! theta <- -1 sech <- function(X) 2/(exp(-X)+exp(X)) integrand <- function(X) {0.5*cos(theta)*exp(X*theta)*sech(pi*X/2)} integrate(integrand, -3, 1) 0.8134389 with absolute error 7e-09 integrate(integrand, -10, 1) 0.9810894 with absolute error 5.9e-06 integrate(integrand, -15, 1) 0.9840505 with absolute error 7e-09 integrate(integrand, -50, 1) 0.9842315 with absolute error 4.4e-05 integrate(integrand, -100, 1) 0.9842315 with absolute error 3.2e-05 integrate(integrand, -Inf, 1) Error in integrate(integrand, -Inf, 1) : non-finite function value Xia
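The suggested split can be coded directly: below -15 the integrand is (to within 10^-20) cos(theta)*exp((theta + pi/2)*x), whose integral from -Inf to -15 is cos(theta)*exp(-15*(theta + pi/2))/(theta + pi/2), and the rest is left to integrate. Using theta = -1 as in the example:

```r
# Analytic tail below -15 plus numerical integration on [-15, 1] (a sketch).
theta <- -1
sech <- function(x) 2 / (exp(-x) + exp(x))
integrand <- function(x) 0.5 * cos(theta) * exp(x * theta) * sech(pi * x / 2)

# For x < -15, sech(pi*x/2) ~ 2*exp(pi*x/2), so the tail integral is
# cos(theta) * exp((theta + pi/2) * (-15)) / (theta + pi/2).
c0 <- theta + pi / 2                       # > 0 for theta > -pi/2, so it converges
tail_part <- cos(theta) * exp(c0 * (-15)) / c0
main_part <- integrate(integrand, -15, 1)$value
total <- tail_part + main_part             # matches integrate(integrand, -100, 1)
```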
Re: [R] Finding a probability
Your commands are correct, and the interpretation is that the probability that a normal random variable with mean 1454.190 and standard deviation 162.6301 takes a value of 417 or less is 8.99413e-11. --- On Wed, 27/8/08, rr400 [EMAIL PROTECTED] wrote: From: rr400 [EMAIL PROTECTED] Subject: [R] Finding a probability To: r-help@r-project.org Received: Wednesday, 27 August, 2008, 11:50 AM Hi, I am trying to determine a certain probability and I am unsure if the commands I have entered into R are correct. I am trying to determine how unusual it would be to find something with the value 417 in a set of numbers with mean 1454.190 and standard deviation 162.6301. The command I entered was: pnorm(417, 1454.190, 162.6301) which returned: [1] 8.99413e-11 I am unsure how to interpret this number as it is, and hence whether I have entered the appropriate commands for what I am trying to find out. If the commands are correct, how can I simplify the answer to understand the probability? Thanks in advance, R. -- View this message in context: http://www.nabble.com/Finding-a-probability-tp19173491p19173491.html Sent from the R help mailing list archive at Nabble.com.
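The same number can be read as a z-score, which often makes such tail probabilities easier to grasp: 417 sits about 6.4 standard deviations below the mean.

```r
# The tail probability via an explicit z-score (a quick sanity check).
z <- (417 - 1454.190) / 162.6301   # about -6.38 standard deviations
p <- pnorm(z)                      # same as pnorm(417, 1454.190, 162.6301)
```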
Re: [R] Igraph library: How to calculate APSP (shortest path matrix) matrix for a subset list of nodes.
I was too optimistic - the complexity is O(E*log(V)) where V is the number of nodes, but since log(25000) < 20 this is still reasonable. --- On Mon, 25/8/08, Moshe Olshansky [EMAIL PROTECTED] wrote: [earlier messages quoted in full - snipped]
Re: [R] paste: multiple collapse arguments?
One possibility is: y <- rep(" ", 6) y[c(2,4)] <- "\n" y[6] <- "" res <- paste(paste(x, y, sep=""), collapse="") --- On Tue, 26/8/08, remko duursma [EMAIL PROTECTED] wrote: From: remko duursma [EMAIL PROTECTED] Subject: [R] paste: multiple collapse arguments? To: r-help@r-project.org Received: Tuesday, 26 August, 2008, 9:36 AM Dear R-helpers, I have a numeric vector, like: x <- c(1,2,3,4,5,6) I make this into a string for output to a text file, separated by "\n": paste(x, collapse="\n") Is there a way to alternate the collapse argument? So between the first two elements of x, I want to separate by " ", then by "\n", and so forth. The result should then look like: "1 2\n3 4\n5 6" (This way I get 2 elements of x on each line using writeLines, instead of one or all). I could do this in some ugly loop, but surely there is a better way? thanks, Remko
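The trick in full, runnable form: build a vector of per-element trailing separators, paste it element-wise onto x, then collapse everything into one string.

```r
# Alternating separators: a space within pairs, a newline between pairs.
x <- c(1, 2, 3, 4, 5, 6)
y <- rep(" ", 6)       # separator placed after each element
y[c(2, 4)] <- "\n"     # newline after the 2nd and 4th elements
y[6] <- ""             # nothing after the last element
res <- paste(paste(x, y, sep = ""), collapse = "")
```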
Re: [R] deconvolution: Using the output and a IRF to get the input
Hi Wolf, Without noise you could use the FFT: the FFT of a convolution is the product of the individual FFTs, so dividing gives you the FFT of your input signal, and using the inverse FFT you get the signal itself. When there is noise you must experiment. You may want to filter the response before doing the FFT. What do you know about the noise? Regards, Moshe. --- On Mon, 25/8/08, wolf zinke [EMAIL PROTECTED] wrote: From: wolf zinke [EMAIL PROTECTED] Subject: [R] deconvolution: Using the output and a IRF to get the input To: r-help@r-project.org Received: Monday, 25 August, 2008, 8:22 AM Hi, Maybe someone could give me some pointers for my problem. So far I have not found a good solution; maybe it is just ill posed? I have a signal that is the result of an input signal convolved with a given impulse response function (IRF), plus noise. I want to use this signal and the IRF to determine the underlying input signal. In my naivety I thought this just might be a deconvolution problem. But here I found only routines that use the input signal and the output signal to get the IRF. Is it possible to derive the input signal when the output and the IRF are given? If so, how could I do this with R? Thanks a lot for any hints, wolf
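A sketch of the noise-free case with made-up signals (circular convolution on a common length n): divide in the frequency domain, then invert. With noise this division amplifies frequencies where the IRF's spectrum is small, which is why some filtering or regularization is usually needed in practice.

```r
# FFT-based deconvolution, noise-free case (illustrative signals).
n <- 64
input <- c(rep(0, 20), rep(1, 10), rep(0, 34))        # "unknown" input signal
irf <- dexp(0:63, rate = 0.5); irf <- irf / sum(irf)  # impulse response

# Forward model: circular convolution via the FFT.
output <- Re(fft(fft(input) * fft(irf), inverse = TRUE)) / n
# Deconvolution: divide spectra, then inverse-transform.
recovered <- Re(fft(fft(output) / fft(irf), inverse = TRUE)) / n
```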
Re: [R] Igraph library: How to calculate APSP (shortest path matrix) matrix for a subset list of nodes.
As far as I know/remember, if your graph is connected and contains E edges then you can find the shortest distance from any particular vertex to all other vertices in O(E) operations. You can repeat this procedure starting from every node (out of the 500). If you have 100,000 edges this will require about 100,000*500 = 50,000,000 operations (times a small factor), which is very reasonable. --- On Mon, 25/8/08, dinesh kumar [EMAIL PROTECTED] wrote: From: dinesh kumar [EMAIL PROTECTED] Subject: [R] Igraph library: How to calculate APSP (shortest path matrix) matrix for a subset list of nodes. To: r-help@r-project.org Received: Monday, 25 August, 2008, 8:00 AM Dear R Users, I have a network of 25000 total nodes and a list of 500 nodes which is a subset of all nodes. Now I want to calculate the APSP (all pair shortest path) matrix only for these 500 nodes. I would appreciate any help. Thanks in advance Dinesh -- Dinesh Kumar Barupal Research Associate Metabolomics Fiehn Lab UCD Genome Center 451 East Health Science Drive GBSF Building University of California DAVIS 95616 http://fiehnlab.ucdavis.edu/staff/kumar
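In the igraph package this subset-to-subset computation is available directly via the v and to arguments, so there is no need to compute the full 25000 x 25000 matrix. A scaled-down sketch (the graph and vertex counts here are stand-ins; requires the igraph package):

```r
# Shortest-path matrix restricted to a subset of vertices with igraph.
library(igraph)
set.seed(1)
g <- sample_gnm(1000, 5000)          # stand-in for the 25000-node network
v_sub <- sample(vcount(g), 50)       # stand-in for the 500 nodes of interest
D <- distances(g, v = v_sub, to = v_sub)  # 50 x 50 shortest-path matrix
```

Entries are Inf for vertex pairs in different components.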
Re: [R] Help Regarding 'integrate'
The phenomenon is most likely caused by numerical errors. I do not know how 'integrate' works, but numerical integration over a very long interval does not look like a good idea to me. I would do the following: f1 <- function(x){ return(dchisq(x,9,77)*((13.5/x)^5)*exp(-13.5/x)) } f2 <- function(y){ return(dchisq(1/y,9,77)*((13.5*y)^5)*exp(-13.5*y)/y^2) } # substitute y = 1/x I1 <- integrate(f1, 0, 1, abs.tol=1.0e-20, rel.tol=.Machine$double.eps^0.25) I2 <- integrate(f2, 0, 1, abs.tol=1.0e-20, rel.tol=.Machine$double.eps^0.25) --- On Thu, 21/8/08, A [EMAIL PROTECTED] wrote: From: A [EMAIL PROTECTED] Subject: [R] Help Regarding 'integrate' To: r-help@r-project.org Received: Thursday, 21 August, 2008, 4:44 PM I have an R function defined as: f <- function(x){ return(dchisq(x,9,77)*((13.5/x)^5)*exp(-13.5/x)) } Numerically integrating this function, I observed a couple of things: A) Integrating the function f over the entire positive real line gives an error: integrate(f, 0, Inf) Error in integrate(f, 0, Inf) : the integral is probably divergent B) Increasing the interval of integration actually decreases the value of the integral: integrate(f, 0, 10^5) 9.813968e-06 with absolute error 1.9e-05 integrate(f, 0, 10^6) 4.62233e-319 with absolute error 4.6e-319 Since the function f is uniformly positive, B) cannot occur. Also, the theory tells me that the infinite integral actually exists and is finite, so A) cannot occur. That means there are certain problems with the usage of function 'integrate' which I do not understand. The help document tells me that 'integrate' uses quadrature approximation to evaluate integrals numerically. Since I do not come from the numerical methods community, I would not know the pros and cons of various methods of quadrature approximation. One naive way that I thought of evaluating the above integral was by first locating the maximum of the function (maybe by using 'optimize' in R) and then applying Simpson's rule to an interval around the maximum. However, I am sure that the people behind the R project and other users have much better ideas, and I am sure the 'correct' method has already been implemented in R. Therefore, I would appreciate it if someone can help me find it. Thanks
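The split suggested in the reply can be run end to end: the substitution y = 1/x maps the integral over (1, Inf) onto (0, 1), so both pieces are evaluated on a short, well-behaved interval and then added.

```r
# Runnable version of the suggested split (a sketch).
f1 <- function(x) dchisq(x, 9, 77) * ((13.5 / x)^5) * exp(-13.5 / x)
# y = 1/x maps (1, Inf) to (0, 1); the /y^2 factor is the Jacobian.
f2 <- function(y) dchisq(1 / y, 9, 77) * ((13.5 * y)^5) * exp(-13.5 * y) / y^2

I1 <- integrate(f1, 0, 1, abs.tol = 1e-20)$value  # integral over (0, 1)
I2 <- integrate(f2, 0, 1, abs.tol = 1e-20)$value  # integral over (1, Inf)
total <- I1 + I2
```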
Re: [R] Null and Alternate hypothesis for Significance test
Hi Nitin, I believe that you can not have null hypothesis to be that A and B come from different distributions. Asymptotically (as both sample sizes go to infinity) KS test has power 1, i.e. it will reject H0:A=B for any case where A and B have different distributions. To work with a finite sample you must be more specific, i.e. your null hypothesis must be not that A and B just have different distributions but must be more specific, i.e that their means are different by at least something or that certain distance between their distributions is bigger than something, etc. and such hypotheses can be tested (and rejected). --- On Fri, 22/8/08, Nitin Agrawal [EMAIL PROTECTED] wrote: From: Nitin Agrawal [EMAIL PROTECTED] Subject: [R] Null and Alternate hypothesis for Significance test To: r-help@r-project.org Received: Friday, 22 August, 2008, 6:58 AM Hi, I had a question about specifying the Null hypothesis in a significance test. Advance apologies if this has already been asked previously or is a naive question. I have two samples A and B, and I want to test whether A and B come from the same distribution. The default Null hypothesis would be H0: A=B But since I am trying to prove that A and B indeed come from the same distribution, I think this is not the right choice for the null hypothesis (it should be one that is set up to be rejected) How do I specify a null hypothesis H0: A not equal to B for say a KS test. An example to do this in R would be greatly appreciated. On a related note: what is a good way to measure the difference between observed and expected PDFs? Is the D statistic of the KS test a good choice? Thanks! Nitin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
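Moshe's point about asymptotic power can be seen directly in R. A sketch (not from the original thread): with a small but real shift between the two distributions, ks.test() typically fails to reject at n = 100, yet rejects overwhelmingly at n = 100000.

```r
set.seed(1)
shift <- 0.05   # a tiny, but real, difference in means
p_small <- ks.test(rnorm(100,  0), rnorm(100,  shift))$p.value
p_large <- ks.test(rnorm(1e5, 0), rnorm(1e5, shift))$p.value
# p_small is typically large (the difference is invisible at n = 100),
# while p_large is essentially zero: with enough data the KS test
# rejects H0: A = B for *any* genuine difference
c(p_small, p_large)
```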
Re: [R] How I can read the binary file with different type?
Hi Miao, I can write a function which takes an integer and produces a float number whose binary representation equals to that of the integer, but this would be an awkward solution. So if nobody suggests anything better I will write such a function for you, but let's wait for a better solution. --- On Fri, 22/8/08, Bingxiang Miao [EMAIL PROTECTED] wrote: From: Bingxiang Miao [EMAIL PROTECTED] Subject: [R] How I can read the binary file with different type? To: r-help@r-project.org Received: Friday, 22 August, 2008, 11:51 AM Hi all, I have a binary file which have 8*100 bytes. The structure of the file is as follows: every eigth bytes represent 2 data:the first four bytes is the little-endian for integer, the next four bytes is the little-endian for floating32. The structure of the following bytes is as the same as the first eight bytes'. As the function readBin only read the binary file with one structure like this: readBin(my data file,what=integer,size=4,n=200) or readBin(my data file,what=numeric,size=4,n=200).But only one type of the data can be read properly. Can anyone suggest me how should I read this file? Thanks in advance. Miao [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
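A base-R alternative that avoids a bit-twiddling function: read the whole file as raw bytes, then hand the two interleaved 4-byte streams to readBin() separately (readBin accepts a raw vector as its first argument). A sketch; the function name and record count are mine:

```r
# Each 8-byte record = 4-byte little-endian integer + 4-byte little-endian float
read_interleaved <- function(path, n_records = 100) {
  raw_all <- readBin(path, what = "raw", n = 8 * n_records)
  first_half <- rep(rep(c(TRUE, FALSE), each = 4), n_records)  # bytes 1-4 of each record
  list(
    ints   = readBin(raw_all[first_half],  what = "integer",
                     size = 4, n = n_records, endian = "little"),
    floats = readBin(raw_all[!first_half], what = "numeric",
                     size = 4, n = n_records, endian = "little")
  )
}
```

No loops and no temporary conversions are needed; each stream is decoded in one vectorised call.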
Re: [R] Random sequence of days?
How about d[sample(length(d), 10)] --- On Wed, 20/8/08, Lauri Nikkinen [EMAIL PROTECTED] wrote: From: Lauri Nikkinen [EMAIL PROTECTED] Subject: [R] Random sequence of days? To: [EMAIL PROTECTED] Received: Wednesday, 20 August, 2008, 4:04 PM Dear list, I tried to find a solution for this problem from the archives but couldn't find any. I would like to sample a sequence of ten days from vector d d <- seq(as.Date("2007-02-12"), as.Date("2008-08-18"), by="days") so that the days follow each other (sample(d, 10) is not the appropriate solution). Any ideas? Best regards, Lauri __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
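Since the question asks for ten days that follow each other, one way is to sample only the starting index; a sketch:

```r
d <- seq(as.Date("2007-02-12"), as.Date("2008-08-18"), by = "days")
start <- sample(length(d) - 9, 1)  # random start, leaving room for ten days
run <- d[start + 0:9]              # ten consecutive days
run
```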
Re: [R] Conversion - lowercase to Uppercase letters
Use toupper or tolower (see ?toupper, ?tolower) --- On Wed, 20/8/08, suman Duvvuru [EMAIL PROTECTED] wrote: From: suman Duvvuru [EMAIL PROTECTED] Subject: [R] Conversion - lowercase to Uppercase letters To: r-help@r-project.org Received: Wednesday, 20 August, 2008, 2:19 PM I would like to know how to convert a string with characters to all uppercase or all lowercase? If anyone could let me know if there exists a function in R for the conversion, that will be very helpful. Regards, Suman [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
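For completeness, the two functions in action (plus casefold(), which wraps both):

```r
toupper("r-help")                # "R-HELP"
tolower("R-Help")                # "r-help"
casefold("mixed", upper = TRUE)  # "MIXED"
```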
Re: [R] matrix row product and cumulative product
Hi Jeff, If I understand correctly, the overhead of a loop is that at each iteration the command must be interpreted, and this time is independent of the number of rows N. So if N is small this overhead may be very significant but when N is large this should be very small compared to the time needed to multiply N pairs of numbers. --- On Mon, 18/8/08, Jeff Laake [EMAIL PROTECTED] wrote: From: Jeff Laake [EMAIL PROTECTED] Subject: [R] matrix row product and cumulative product To: r-help@r-project.org Received: Monday, 18 August, 2008, 12:49 PM I spent a lot of time searching and came up empty handed on the following query. Is there an equivalent to rowSums that does product or cumulative product and avoids use of apply or looping? I found a rowProd in a package but it was a convenience function for apply. As part of a likelihood calculation called from optim, I’m computing products and cumulative products of rows of matrices with far more rows than columns. I started with apply and after some thought realized that a loop of columns might be faster and it was substantially faster (see below). Because the likelihood function is called many times I’d like to speed it up even more if possible. Below is an example showing the cumprod.matrix and prod.matrix looping functions that I wrote and some timing comparisons to the use of apply for different column and row dimensions. At this point I’m better off with looping but I’d like to hear of any further suggestions. 
Thanks –jeff prod.matrix=function(x) + { + y=x[,1] + for(i in 2:dim(x)[2]) + y=y*x[,i] + return(y) + } cumprod.matrix=function(x) + { + y=matrix(1,nrow=dim(x)[1],ncol=dim(x)[2]) + y[,1]=x[,1] + for (i in 2:dim(x)[2]) + y[,i]=y[,i-1]*x[,i] + return(y) + } N=1000 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) user system elapsed 1.07 0.09 1.15 system.time(t(apply(xmat,1,cumprod))) user system elapsed 29.27 0.21 29.50 system.time(prod.matrix(xmat)) user system elapsed 0.29 0.00 0.30 system.time(apply(xmat,1,prod)) user system elapsed 30.69 0.00 30.72 xmat=matrix(runif(N),ncol=100) system.time(cumprod.matrix(xmat)) user system elapsed 1.05 0.13 1.18 system.time(t(apply(xmat,1,cumprod))) user system elapsed 3.55 0.14 3.70 system.time(prod.matrix(xmat)) user system elapsed 0.38 0.01 0.39 system.time(apply(xmat,1,prod)) user system elapsed 2.87 0.00 2.89 xmat=matrix(runif(N),ncol=1000) system.time(cumprod.matrix(xmat)) user system elapsed 1.30 0.18 1.46 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.77 0.27 2.05 system.time(prod.matrix(xmat)) user system elapsed 0.46 0.00 0.47 system.time(apply(xmat,1,prod)) user system elapsed 0.7 0.0 0.7 xmat=matrix(runif(N),ncol=1) system.time(cumprod.matrix(xmat)) user system elapsed 1.28 0.00 1.29 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.19 0.08 1.26 system.time(prod.matrix(xmat)) user system elapsed 0.40 0.00 0.41 system.time(apply(xmat,1,prod)) user system elapsed 0.57 0.00 0.56 xmat=matrix(runif(N),ncol=10) system.time(cumprod.matrix(xmat)) user system elapsed 3.18 0.00 3.19 system.time(t(apply(xmat,1,cumprod))) user system elapsed 1.42 0.21 1.64 system.time(prod.matrix(xmat)) user system elapsed 1.05 0.00 1.05 system.time(apply(xmat,1,prod)) user system elapsed 0.82 0.00 0.81 R.Version() $platform [1] i386-pc-mingw32 . . . 
$version.string [1] R version 2.7.1 (2008-06-23) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
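For strictly positive matrices there is also a fully vectorised route through logarithms, which avoids both apply() and the column loop; a sketch, not from the thread. (If I recall correctly, the matrixStats package also offers compiled rowProds()/rowCumprods(), which may be worth checking.)

```r
set.seed(1)
xmat <- matrix(runif(5000), ncol = 10)  # strictly positive entries
# row products as exp of row sums of logs -- one vectorised pass
rp <- exp(rowSums(log(xmat)))
stopifnot(isTRUE(all.equal(rp, apply(xmat, 1, prod))))
```

The log trick fails for zero or negative entries, so Jeff's column loop remains the safer general-purpose choice.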
Re: [R] A doubt about lm and the meaning of its summary
Hi Alberto, Please disregard my previous note - I probably had a black-out!!! --- On Tue, 19/8/08, Alberto Monteiro [EMAIL PROTECTED] wrote: From: Alberto Monteiro [EMAIL PROTECTED] Subject: [R] A doubt about lm and the meaning of its summary To: r-help@r-project.org Received: Tuesday, 19 August, 2008, 4:31 AM I have a conceptual problem (sort of). Maybe there's a simple solution, maybe not. First, let me explain the test case that goes ok. Let x be a (fixed) n-dimensional vector. I simulate a lot of linear models, y = m x + c + error, then I do a lot of regressions. As expected, the estimated values of m (let's call them m.bar) are distributed according to a Student's t distribution. More precisely (test case, it takes a few minutes to run): # # start with fixed numbers # m - sample(c(-1, -0.1, 0.1, 1), 1) c - sample(c(-1, -0.1, 0, 0.1, 1), 1) sigma - sample(c(0.1, 0.2, 0.5, 1), 1) n - sample(c(4,5,10,1000), 1) x - 1:n # it's possible to use other x NN - sample(c(3000, 4000, 5000), 1) m.bar - m.std.error - 0 # these vectors will hold the simulation output # # Now let's repeat NN times: # simulate y # regress y ~ x # store m.bar and its error # for (i in 1:NN) { y - m * x + c + sigma * rnorm(n) reg - lm(y ~ x) m.bar[i] - summary(reg)$coefficients['x', 'Estimate'] m.std.error[i] - summary(reg)$coefficients['x', 'Std. Error'] } # # Finally, let's analyse it # The distribution of (m.bar - m) / m.std.error is # a Student's t with n - 2 degrees of freedom. # Is it? Let's test! # print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2))) Now, this goes ok, with ks.test returning a number uniformly distributed in the 0-1 interval. Next, I did a slightly different model. This model is a reversion model, where the simulated random variable lp goes along the equation: lp[t + 1] - (1 + m) * lp[t] + c + error I tried to use the same method as above, defining x = lp[1:n], y = lp[2:(n+1}] - lp[1:n], with equation y - m * x + c + error And now it breaks. 
Test case (it takes some minutes to run): # # start with fixed numbers # m - runif(1, -1, 0) # m must be something in the (-1,0) interval c - sample(c(-1, -0.1, 0, 0.1, 1), 1) sigma - sample(c(0.1, 0.2, 0.5, 1), 1) n - sample(c(4,5,10,1000), 1) NN - sample(c(3000, 4000, 5000), 1) m.bar - m.std.error - 0 # these vectors will hold the simulation output # # Now let's repeat NN times: # simulate y # regress y ~ x # store m.bar and its error # for (i in 1:NN) { lp - 0 lp[1] - 0 for (j in 1:n) { lp[j+1] = (1 + m) * lp[j] + c + sigma * rnorm(1) } x - lp[1:n] y - lp[-1] - x reg - lm(y ~ x) m.bar[i] - summary(reg)$coefficients['x', 'Estimate'] m.std.error[i] - summary(reg)$coefficients['x', 'Std. Error'] } # # Finally, let's analyse it # The distribution of (m.bar - m) / m.std.error should be # a Student's t with n - 2 degrees of freedom. # Is it? Let's test! # print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2))) ... and now it's not. What's wrong with this? I suspect that the model y ~ x does only give meaningful values when x is deterministic; in this case x is stochastic. Is there any correct way to estimate this model giving meaningful outputs? Alberto Monteiro __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Vectorization of duration of the game in the gambler ruin's problem
Hi Jose, If you are only interested in the expected duration, the problem can be solved analytically - no simulation is needed. Let P be the probability to reach total.capital (and then 1-P is the probability to lose all the money) when starting with initial.capital. This probability P is well known (I do not remember it now but I can derive the formula if you need - let me know). Let X(i) be the gain at game i and let D be the duration. Let S(n) = X(1)+...+X(n). Since EX(i) = p - (1-p) = 2p-1, S(n) - n*(2p-1) is a martingale, and since D is a stopping time we get that E(S(D) - (2p-1)*D) = 0, so that (2p-1)*E(D) = E(S(D)) = P*(total.capital-initial.capital) + (1-P)*(-initial.capital), and so E(D) can be computed provided that p != 1/2. If p = 1/2 then S(n) is a martingale and then by Wald's Lemma, E(S(D)^2) = E(D)*E(X^2) = E(D). Since E(S(D)^2) = P*(total.capital-initial.capital)^2 + (1-P)*(-initial.capital)^2, we can compute E(D). Regards, Moshe. --- On Fri, 15/8/08, jose romero [EMAIL PROTECTED] wrote: From: jose romero [EMAIL PROTECTED] Subject: [R] Vectorization of duration of the game in the gambler ruin's problem To: r-help@r-project.org Received: Friday, 15 August, 2008, 2:26 PM Hey fellas: In the context of the gambler's ruin problem, the following R code obtains the mean duration of the game, in turns: # total.capital is a constant, an arbitrary positive integer # initial.capital is a constant, an arbitrary positive integer between, and not including # 0 and total.capital # p is the probability of winning 1$ on each turn # 1-p is the probability of losing 1$ # N is a large integer representing the number of times to simulate # dur is a vector containing the simulated game durations T <- total.capital dur <- NULL for (n in 1:N) { x <- initial.capital d <- 0 while ((x != 0) && (x != T)) { x <- x + sample(c(-1,1), 1, replace=TRUE, c(1-p,p)) d <- d + 1 } dur <- c(dur, d) } mean(dur) # returns the mean duration of the game The problem with this code is that, using the traditional
control structures (while, for, etc.) it is rather slow. Does anyone know of a way i could vectorize the while and the for to produce a faster code? And while I'm at it, does anyone know of a discrete-event simulation package in R such as the SimPy for Python? Thanks in advance [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
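Moshe's martingale argument can be packaged as a closed-form function, replacing the simulation entirely. A sketch (the ruin probability P below is the standard formula he alludes to; the function name is mine):

```r
expected_duration <- function(initial.capital, total.capital, p) {
  if (abs(p - 0.5) < 1e-12)  # symmetric case, via Wald's lemma
    return(initial.capital * (total.capital - initial.capital))
  q <- 1 - p
  # P = probability of reaching total.capital before ruin
  P <- (1 - (q / p)^initial.capital) / (1 - (q / p)^total.capital)
  (P * total.capital - initial.capital) / (2 * p - 1)
}
expected_duration(5, 10, 0.5)  # 25 turns on average
```

A quick check against the first-step recurrence E(i) = 1 + p*E(i+1) + (1-p)*E(i-1) with E(0) = E(N) = 0 confirms the formula.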
Re: [R] ignoring zeros or converting to NA
Since 0 can be represented exactly as a floating point number, there is no problem with something like x[x==0]. What you can not rely on is something like 0.1+0.2 == 0.3 to be TRUE. --- On Thu, 14/8/08, Roland Rau [EMAIL PROTECTED] wrote: From: Roland Rau [EMAIL PROTECTED] Subject: Re: [R] ignoring zeros or converting to NA To: rcoder [EMAIL PROTECTED] Cc: r-help@r-project.org Received: Thursday, 14 August, 2008, 1:23 AM Hi, since many suggestions are following the form of x[x==0] (or similar) I would like to ask if this is really recommended? What I have learned (the hard way) is that one should not test for equality of floating point numbers (which is the default for R's numeric values, right?) since the binary representation of these (decimal) floating point numbers is not necessarily exact (with the classic example of decimal 0.1). Is it okay in this case for the value zero where all binary elements are zero? Or does R somehow recognize that it is an integer? Just some questions out of curiosity. Thank you, Roland rcoder wrote: Hi everyone, I have a matrix that has a combination of zeros and NAs. When I perform certain calculations on the matrix, the zeros generate Inf values. Is there a way to either convert the zeros in the matrix to NAs, or only perform the calculations if not zero (i.e. like using something similar to an !all(is.na() construct)? Thanks, rcoder __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
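Both points in a few lines of R:

```r
0.1 + 0.2 == 0.3                   # FALSE: neither side is stored exactly
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: the tolerant comparison
x <- c(0, 1.5, 0, 2.5)
x[x == 0]                          # 0 0 -- testing against an exact zero is safe
```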
Re: [R] missing TRUE/FALSE error in conditional construct
The problem is that if x is either NA or NaN then x != 0 is NA (and not FALSE or TRUE) and the function is.nan tests for a NaN but not for NA, i.e. is.nan(NA) returns FALSE. You can do something like: mat_zeroless[!is.na(mat) & mat != 0] <- mat[!is.na(mat) & mat != 0] --- On Thu, 14/8/08, rcoder [EMAIL PROTECTED] wrote: From: rcoder [EMAIL PROTECTED] Subject: [R] missing TRUE/FALSE error in conditional construct To: r-help@r-project.org Received: Thursday, 14 August, 2008, 8:16 AM Hi everyone, I posted something similar to this in reply to another post, but there seems to be a problem getting it onto the board, so I'm starting a new post. I am trying to use conditional formatting to select non-zero and non-NaN values from a matrix and pass them into another matrix. The problem is that I keep encountering an error message indicating the :missing value where TRUE/FALSE needed My code is as follows: ##Code Start mat_zeroless <- matrix(NA,5000,2000) #generating holding matrix ##Assume source matrix containing combination of values, NaNs and zeros## for (j in 1:5000) { for (k in 1:2000) { if(mat[j,k]!=0 && !is.NaN(mat[j,k])) {mat_zeroless[j,k] <- mat[j,k]} } } ##Code End Error in if (mat[j,k] !=0 && !is.NaN(mat[j,k])) { :missing value where TRUE/FALSE needed I'm not sure how to resolve this. rcoder -- View this message in context: http://www.nabble.com/missing-TRUE-FALSE-error-in-conditional-construct-tp18972244p18972244.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
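The behaviour Moshe describes, in miniature:

```r
v <- c(NA, NaN, 0, 1)
v != 0     #    NA   NA FALSE  TRUE -- comparisons involving NA/NaN give NA
is.nan(v)  # FALSE TRUE FALSE FALSE -- is.nan() misses NA
is.na(v)   #  TRUE TRUE FALSE FALSE -- is.na() catches both
```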
Re: [R] Covariance matrix
Just interchange rows 2 and 3 and then columns 2 and 3 of the original covariance matrix. --- On Fri, 8/8/08, Zhang Yanwei - Princeton-MRAm [EMAIL PROTECTED] wrote: From: Zhang Yanwei - Princeton-MRAm [EMAIL PROTECTED] Subject: [R] Covariance matrix To: r-help@r-project.org r-help@r-project.org Received: Friday, 8 August, 2008, 12:18 AM Hi all, Assume I have a random vector with four variables, i.e. A=(a,b,c,d). I am able to get the covariance matrix of vector A, but how can I get the covariance matrix of vector B=(a,c,b,d) by manipulating the corresponding covariance matrix of A? Thanks. Sincerely, Yanwei Zhang Department of Actuarial Research and Modeling Munich Re America Tel: 609-275-2176 Email: [EMAIL PROTECTED]mailto:[EMAIL PROTECTED] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
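In code, one index vector permutes both rows and columns at once; a sketch:

```r
set.seed(1)
X <- matrix(rnorm(400), ncol = 4, dimnames = list(NULL, c("a", "b", "c", "d")))
covA <- cov(X)            # covariance of (a, b, c, d)
perm <- c(1, 3, 2, 4)     # reorder to (a, c, b, d)
covB <- covA[perm, perm]  # rows AND columns permuted together
stopifnot(isTRUE(all.equal(covB, cov(X[, perm]))))
```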
Re: [R] simulate data based on partial correlation matrix
Hi Benjamin, Creating 0 correlations is easier and always possible, but creating arbitrary correlations can be done as well (when possible - see below). Suppose that x1,x2,x3,x4 have mean 0 and suppose that the desired correlations are r = (r1,r2,r3,r4). Let A be an orthogonal 4x4 matrix such that (y1,y2,y3,y4) = (x1,x2,x3,x4)*A are orthonormal (i.e. the norm of y1,y2,y3,y4 is 1 and they are orthogonal to each other - this can be done if x1,x2,x3,x4 are independent). Then the correlations between z and y1,y2,y3,y4 should be q = (q1,q2,q3,q4) = (r1,r2,r3,r4)*A. Let now v be any vector which has norm 1, mean 0 and is orthogonal to y1,y2,y3,y4 (it exists and can be easily found - see below). Now let z = a1*y1 + a2*y2 + a3*y3 + a4*y4 + a*v where a1,a2,a3,a4,a are scalars and assume that the norm of z is 1. In order for the correlations between z and y1,y2,y3,y4 to be q1,q2,q3,q4 we must have a1=q1,a2=q2,a3=q3,a4=q4. Moreover, the square of the norm of z is q1^2 + q2^2 + q3^2 + q4^2 + a^2 = 1, which means that q1^2 + q2^2 + q3^2 + q4^2 <= 1. This condition is necessary and sufficient for a solution to exist!!! So now we need two things: (a) to find the matrix A and (b) to find the vector v. (a) Let M = (x1,x2,x3,x4) be an nx4 matrix. Then t(M)*M is a 4x4 matrix which is positive definite if x1,x2,x3,x4 are independent (let's assume this). Then there exists a unitary matrix U such that t(M)*M = t(U)*D*U where D is a diagonal matrix (U and D can be found using the function eigen). In such a case A = t(U)*(1/sqrt(D))*U. (b) To find v, test Vi = (-1,0,..,0,1,0,0,...,0) where the 1 is at place i. We need to find Vi which is independent of y1,y2,y3,y4. Most probably any of V2,V3,... is independent of y1,...,y4, but in any case at least one of V2,V3,V4,V5,V6 is independent of them. To check which one compute (y1%*%V)^2 + ... + (y4%*%V)^2 (where %*% is dot product). If this sum is noticeably less than 2 (= V%*%V) then this is what we need (let's say it is less than 1.99).
Now take v = V - (V%*%y1)*y1 - (V%*%y2)*y2- (V%*%y3)*y3 - (V%*%y4)*y4 and normalize v so that it's norm is 1. Regards, Moshe. --- On Tue, 5/8/08, Benjamin Michael Kelcey [EMAIL PROTECTED] wrote: From: Benjamin Michael Kelcey [EMAIL PROTECTED] Subject: [R] simulate data based on partial correlation matrix To: r-help@r-project.org Received: Tuesday, 5 August, 2008, 6:24 AM Given four known and fixed vectors, x1,x2,x3,x4, I am trying to generate a fifth vector,z, with specified known and fixed partial correlations. How can I do this? In the past I have used the following (thanks to Greg Snow) to generate a fifth vector based on zero order correlations---however I'd like to modify it so that it can generate a fifth vector with specific partial correlations rather than zero order correlations: # create x1-x4 x1 - rnorm(100, 50, 3) x2 - rnorm(100) + x1/5 x3 - rnorm(100) + x2/5 x4 - rnorm(100) + x3/5 # find current correlations cor1 - cor( cbind(x1,x2,x3,x4) ) cor1 # create 1st version of z z - rnorm(100) # combine in a matrix m1 - cbind( x1, x2, x3, x4, z ) # center and scale m2 - scale(m1) # find cholesky decomp c1 - chol(var(m2)) # force to be independent m3 - m2 %*% solve(c1) # create new correlation matrix: cor2 - cbind( rbind( cor1, z=c(.5,.3,.1,.05) ), z=c(.5,.3,.1,.05,1) ) # create new matrix m4 - m3 %*% chol(cor2) # uncenter and unscale m5 - sweep( m4, 2, attr(m2, 'scaled:scale'), '*') m5 - sweep( m5, 2, attr(m2, 'scaled:center'), '+') ##Check they are equal zapsmall(cor(m5))==zapsmall(cor2) Thanks, ben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
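Step (a), the matrix A that orthonormalises (x1,...,x4), can be sketched with eigen(); this assumes the columns are linearly independent, and the function name is mine:

```r
whiten <- function(M) {
  e <- eigen(crossprod(M))  # t(M) %*% M = U D t(U), symmetric positive definite
  A <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  M %*% A                   # columns are now orthonormal
}
set.seed(1)
M <- cbind(rnorm(200), rnorm(200), rnorm(200), rnorm(200))
Y <- whiten(M)
max(abs(crossprod(Y) - diag(4)))  # essentially 0
```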
Re: [R] stats question
Hello Jason, You are not specific enough. What do you mean by significant difference? Let's assume that indeed the incidence in A is 6% and in B is 10% and we are looking for Na and Nb such that with probability of at least 80% the mean of Nb sample from B will be at least, say, 0.03 (=3%) above the mean of Na sample from A. The solution is not unique. If Mb is the mean of the sample from B and Ma is the one from A, using Normal approximation we get the Mb is approximately normal with mean 0.10 and variance 0.1*0.9/Nb and Ma is approximately normal with mean 0.06 and variance 0.06*0.94/Na, so Mb - Ma is approximately normal with mean 0.04 and variance 0.09/Nb + 0.0564/Na. So let V be the maximal variance for which the probability that a normal rv with mean 0.04 and variance V is above 0.03 equals 0.80 (finding such V is straightforward). Then you must choose Na and Nb which satisfy 0.09/Nb + 0.0564/Na = V. One such choice is Nb = 2*0.09/V, Na = 2*0.0564/V. As I said, this solution is only approximate and probably not optimal, so see what other people say. Regards, Moshe. --- On Thu, 31/7/08, Iasonas Lamprianou [EMAIL PROTECTED] wrote: From: Iasonas Lamprianou [EMAIL PROTECTED] Subject: [R] stats question To: r-help@r-project.org Received: Thursday, 31 July, 2008, 2:46 PM Dear friends, I am not sure that this is the right place to ask, but please feel free to suggest an alternative discussion group. My question is that I want to do a comparative study in order to compare the rate of incidence in two populations. I know that a pilot study was conducted a few weeks ago and found 8/140 (around 6%) incidence in population A. Population B was not sampled. Assuming this is (about) the right proportion in the Population A what is the sample size I need for population A and B in the main study, in order to have power of 80% to idenitfy significant differences? I would expect the incidence in population B to be around 10% compared to the 6% of the Population A. 
Any suggestions? How can I do this in R? Jason __ Not happy with your email address?. Get the one you really want - millions of new email addresses available now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
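For reference, base R packages this calculation (normal approximation, two-sample test of proportions, default 5% significance level) as power.prop.test(); with these inputs it asks for roughly 700 subjects per group:

```r
res <- power.prop.test(p1 = 0.06, p2 = 0.10, power = 0.80)
ceiling(res$n)  # required sample size per group
```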
Re: [R] Code to calculate internal rate of return
You can use uniroot (see ?uniroot). As an example, suppose you have a $100 bond which pays 3% every half year (6% coupon) and lasts for 4 years. Suppose that it now sells for $95. In such a case your time intervals are 0,0.5,1,...,4 and the payoffs are: -95,3,3,...,3,103. To find the internal rate of return you can do the following: tim <- (0:8)/2 tim [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 pay <- 3 + 0*tim pay[1] <- -95 pay[9] <- 103 pay [1] -95 3 3 3 3 3 3 3 103 f <- function(r) sum(pay*exp(-r*tim)) z <- uniroot(f,c(0,1)) z $root [1] 0.07329926 $f.root [1] 0.01035543 $iter [1] 5 $estim.prec [1] 6.103516e-05 So the internal rate of return is 0.07329926 (z$root) = 7.33% (continuously compounded). --- On Fri, 1/8/08, Thomas E [EMAIL PROTECTED] wrote: From: Thomas E [EMAIL PROTECTED] Subject: [R] Code to calculate internal rate of return To: r-help@r-project.org Received: Friday, 1 August, 2008, 2:05 AM Hi all. I am an R newbie and trying to grasp how the simple optimization routines in R work. Specifically, I would like some guidance on how to set up a code to calculate the internal rate of return on an investment project (http://en.wikipedia.org/wiki/Internal_rate_of_return). My main problem (I think) is that I want a generic code where N (number of periods) can be easily changed and set within the code. Hope this makes sense - any help appreciated !!! Thomas -- View this message in context: http://www.nabble.com/Code-to-calculate-internal-rate-of-return-tp18757967p18757967.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
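Thomas asked for a version where the number of periods is easy to change; wrapping the uniroot() approach in a function does that (the function and argument names are mine, and uniroot() needs the NPV to change sign over the search interval):

```r
irr <- function(pay, tim, interval = c(-0.9, 1)) {
  npv <- function(r) sum(pay * exp(-r * tim))  # continuously compounded NPV
  uniroot(npv, interval)$root
}
# the bond from the example: price 95, 3 every half-year, 103 at 4 years
irr(c(-95, rep(3, 7), 103), (0:8) / 2)  # ~ 0.0733
```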
Re: [R] cutting out numbers from vectors
This is something that is easier done in C than in R (to the best of my very limited knowledge). To do this in R you could do something like: x <- "082-232-232-1" y <- unlist(strsplit(x, "")) i <- which(y != "0")[1] - 1 paste(y[-(1:i)], collapse="") [1] "82-232-232-1" --- On Fri, 1/8/08, calundergrad [EMAIL PROTECTED] wrote: From: calundergrad [EMAIL PROTECTED] Subject: [R] cutting out numbers from vectors To: r-help@r-project.org Received: Friday, 1 August, 2008, 6:40 AM i have a vector with values similar to the below text [1] 001-010-001-0 I want to get rid of all leading zeroes. for example i want to change the values of the vector so that [1] 001-010-001-0 becomes [1] 1-010-001-0. Another example [1] 082-232-232-1 becomes [1] 82-232-232-1 how do i do this? -- View this message in context: http://www.nabble.com/cutting-out-numbers-from-vectors-tp18763058p18763058.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping Index of Matrix Based on Certain Condition
y <- 2 - (x[,1] > x[,2]) you can also do cbind(x,y) if you wish. --- On Fri, 1/8/08, Gundala Viswanath [EMAIL PROTECTED] wrote: From: Gundala Viswanath [EMAIL PROTECTED] Subject: [R] Grouping Index of Matrix Based on Certain Condition To: [EMAIL PROTECTED] Received: Friday, 1 August, 2008, 1:00 PM Hi, I have the following (M x N) matrix, where M = 10 and N = 2 What I intend to do is to group index of (M) based on this condition of x_mn, namely: For each M, If x_m1 > x_m2, assign index of M to Group 1, otherwise assign index of M into Group 2 x [,1] [,2] [1,] 4.482909e-01 0.55170907 [2,] 9.479594e-01 0.05204063 [3,] 8.923553e-01 0.10764474 [4,] 9.295003e-01 0.07049966 [5,] 8.880434e-01 0.11195664 [6,] 9.197367e-01 0.08026327 [7,] 9.431232e-01 0.05687676 [8,] 9.460356e-01 0.05396442 [9,] 6.053829e-01 0.39461708 [10,] 9.515173e-01 0.04848268 Is there a simple way to do it in R? - Gundala Viswanath Jakarta - Indonesia __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] cutting out numbers from vectors
Yes, this is how it should be done! --- On Fri, 1/8/08, Christos Hatzis [EMAIL PROTECTED] wrote: From: Christos Hatzis [EMAIL PROTECTED] Subject: Re: [R] cutting out numbers from vectors To: 'calundergrad' [EMAIL PROTECTED], r-help@r-project.org Received: Friday, 1 August, 2008, 2:11 PM Have a look at gsub: x <- "001-010-001-0" gsub("^0+", "", x) [1] "1-010-001-0" -Christos -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of calundergrad Sent: Thursday, July 31, 2008 4:40 PM To: r-help@r-project.org Subject: [R] cutting out numbers from vectors i have a vector with values similar to the below text [1] 001-010-001-0 I want to get rid of all leading zeroes. for example i want to change the values of the vector so that [1] 001-010-001-0 becomes [1] 1-010-001-0. Another example [1] 082-232-232-1 becomes [1] 82-232-232-1 how do i do this? -- View this message in context: http://www.nabble.com/cutting-out-numbers-from-vectors-tp18763058p18763058.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Urgent
Hi Yunlei, Is your problem constrained or not? If it is unconstrained and your matrix is not positive definite, the minimum is unbounded (unless you are extremely lucky and the matrix is positive semi-definite and the vector which multiplies the unknowns is exactly perpendicular to all the eigenvectors with 0 eigenvalues). If the problem is constrained you may use regular optimization functions (like optim). Regards, Moshe. --- On Tue, 22/7/08, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: From: [EMAIL PROTECTED] [EMAIL PROTECTED] Subject: [R] Urgent To: r-help@r-project.org Received: Tuesday, 22 July, 2008, 6:12 PM Hi everyone, I am trying to use solve.QP to solve a quadratic programming problem, but the Dmat in the function is non-positive definite. Can anyone tell me which R function can deal with quadratic programming with a non-positive definite Dmat matrix (as in solve.QP)? I'd really appreciate it. Yunlei Hu
Re: [R] Sampling two exponentials
I am not sure that this is well defined. For a multivariate normal distribution (which is well defined), the covariance matrix (and the mean vector) fully determine the distribution. In the exponential case, what is a multivariate (bivariate) exponential distribution? I believe that knowing that the two marginal distributions are exponential and knowing the covariance matrix does not uniquely determine the joint distribution. On the other hand, if you want just any such pair, this is not difficult (at least if we want the correlation to be non-negative). Let X and Z be independent exponential variables with lambda = 1 and let W be independent of X and Z with P(W=1) = p = 1 - P(W=0). Now let Y = X if W = 1 and Y = Z if W = 0. It is easy to see that Y is also exponential with lambda = 1 and the correlation coefficient between X and Y is p. Now multiply X and Y by appropriate constants to get the desired covariance matrix (the correlation coefficient won't be affected). --- On Thu, 31/7/08, Ben Bolker [EMAIL PROTECTED] wrote: From: Ben Bolker [EMAIL PROTECTED] Subject: Re: [R] Sampling two exponentials To: [EMAIL PROTECTED] Received: Thursday, 31 July, 2008, 10:40 AM Zhang Yanwei - Princeton-MRAm YZhang at munichreamerica.com writes: Hi all, I am going to sample two variables from two exponential distributions, but I want to specify a covariance structure between these two variables. Is there any way to do it in R? Or is there a Multivariate Exponential thing corresponding to the multivariate normal? Thanks in advance. It's not at all easy, but I'd suggest that you do some reading about copulas (and take a look at the copula package). Ben Bolker
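The mixture construction described above is easy to verify by simulation; a minimal sketch (the sample size, target correlation p, and seed are arbitrary choices):

```r
set.seed(42)
n <- 10000
p <- 0.6                       # desired correlation coefficient

x <- rexp(n)                   # X ~ Exp(1)
z <- rexp(n)                   # Z ~ Exp(1), independent of X
w <- rbinom(n, 1, p)           # W = 1 with probability p

y <- ifelse(w == 1, x, z)      # Y = X when W = 1, else Z; still Exp(1)
cor(x, y)                      # close to p for large n
```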
Re: [R] product of successive rows
Assuming that the number of rows is even and that your matrix is A, the element-wise product of successive pairs of rows can be calculated as A[seq(1,nrow(A),by=2),] * A[seq(2,nrow(A),by=2),] --- On Mon, 28/7/08, rcoder [EMAIL PROTECTED] wrote: From: rcoder [EMAIL PROTECTED] Subject: [R] product of successive rows To: r-help@r-project.org Received: Monday, 28 July, 2008, 8:20 AM Hi everyone, I want to perform an operation on a matrix that outputs the product of successive pairs of rows. For example: calculating the product between rows 1 & 2; 3 & 4; 5 & 6...etc. Does anyone know of any readily available functions that can do this? Thanks, rcoder -- View this message in context: http://www.nabble.com/product-of-successive-rows-tp18681259p18681259.html Sent from the R help mailing list archive at Nabble.com.
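A small sketch of the indexing trick on a made-up 6 x 3 matrix, so rows pair up as 1 & 2, 3 & 4, 5 & 6:

```r
A <- matrix(1:18, nrow = 6, byrow = TRUE)

odd  <- seq(1, nrow(A), by = 2)   # rows 1, 3, 5
even <- seq(2, nrow(A), by = 2)   # rows 2, 4, 6
P <- A[odd, ] * A[even, ]         # one row of element-wise products per pair
P
```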
Re: [R] finding a faster way to do an iterative computation
Try abs(outer(xk, x, "-")) (see ?outer). --- On Wed, 30/7/08, dxc13 [EMAIL PROTECTED] wrote: From: dxc13 [EMAIL PROTECTED] Subject: [R] finding a faster way to do an iterative computation To: r-help@r-project.org Received: Wednesday, 30 July, 2008, 4:12 AM useR's, I am trying to find out if there is a faster way to do a certain computation. I have successfully used FOR loops and the apply function to do this, but it can take some time to fully compute, and I was wondering if anyone may know of a different function or way to do this: x [1] 1 2 3 4 5 xk [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 I want to do: abs(x-xk[i]) where i = 1 to length(xk) = 13. It should result in a 13 by 5 matrix or data frame. Does anyone have a quicker solution than FOR loops or apply()? Much appreciation and thanks, dxc13 -- View this message in context: http://www.nabble.com/finding-a-faster-way-to-do-an-iterative-computation-tp18718233p18718233.html Sent from the R help mailing list archive at Nabble.com.
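Using the vectors from the question, outer() produces the whole matrix in one call; a quick sketch (the "-" argument makes outer subtract instead of multiply):

```r
x  <- 1:5
xk <- seq(0, 6, by = 0.5)

D <- abs(outer(xk, x, "-"))   # |xk[i] - x[j]| for every pair
dim(D)                        # 13 rows, 5 columns, as requested
```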
Re: [R] Chi-square parameter estimation
If v is your vector of sample variances (and assuming that their distribution is chi-square) you can define f <- function(df) sum(dchisq(v, df, log=TRUE)) and now you need to maximize f, which can be done using any optimization function (like optim). --- On Sat, 26/7/08, Julio Rojas [EMAIL PROTECTED] wrote: From: Julio Rojas [EMAIL PROTECTED] Subject: [R] Chi-square parameter estimation To: r-help@r-project.org Received: Saturday, 26 July, 2008, 12:03 AM Hi. I have made 100 experiments of an M/M/1 queue, and for each one I have calculated both mean and variance of the queue size. Now, a professor has told me that the variance is usually chi-squared distributed. Is there a way in R that I can find the parameter that best fits a chi-square to the variance data? I know there's fitdistr(), but this function doesn't handle chi-square. I believe the mean estimator for the chi-square is df (degrees of freedom). The theoretical variance for an M/M/1 queue with lambda=30/33 is ~108. So, should I expect the chi-square with parameter 108 to be the one that best fits the data? Thanks a lot for your help.
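Since df is a single parameter, optimize() is enough; a sketch with simulated data standing in for the poster's sample variances (the true df of 9 and the search interval are arbitrary choices):

```r
set.seed(7)
v <- rchisq(500, df = 9)      # hypothetical sample of variances

# log-likelihood of the chi-square df given the sample
f <- function(df) sum(dchisq(v, df, log = TRUE))

fit <- optimize(f, interval = c(0.01, 100), maximum = TRUE)
fit$maximum                   # ML estimate of df, close to 9 here
```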
Re: [R] Constrained coefficients in lm (correction)
This problem can be easily solved analytically: we want to minimize sum((res(i) - a*st(i) - b*mod(i))^2) subject to a+b = 1 and a,b >= 0, so we want to minimize f(a) = sum(((res(i) - mod(i)) - a*(st(i) - mod(i)))^2) for 0 <= a <= 1. Define Xi = res(i) - mod(i) and Yi = st(i) - mod(i); then f(a) = sum((Xi - a*Yi)^2), f(0) = sum(Xi^2), f(1) = sum((Xi - Yi)^2), and f'(a) = -2*sum(Yi*(Xi - a*Yi)), so f'(a) = 0 at a* = sum(Xi*Yi)/sum(Yi^2). If 0 <= a* <= 1 then a* is the solution; otherwise take a = 0 or a = 1 depending on whether f(0) < f(1) or not. --- On Thu, 24/7/08, Jorge Ivan Velez [EMAIL PROTECTED] wrote: From: Jorge Ivan Velez [EMAIL PROTECTED] Subject: [R] Constrained coefficients in lm (correction) To: R mailing list r-help@r-project.org Received: Thursday, 24 July, 2008, 5:39 AM Dear list, In my previous email, the model I'd like to estimate is rea=a*st+b*mod+error, where a+b=1 and a,b>0. My apologies for the misunderstanding. Thanks for all your help, Jorge On Wed, Jul 23, 2008 at 3:35 PM, Jorge Ivan Velez [EMAIL PROTECTED] wrote: Dear list, I have a data set which looks like myDF (see below) and I'd like to estimate the linear model rea=a*rea+b*mod+error, where a+b=1 and a,b>0. I tried mymodel=lm(rea~ st + mod-1, data=myDF) summary(mymodel) but I couldn't get what I'm looking for. Also, I tried RSiteSearch("constrained coefficients in lm") which produces 20 hits, but they're not what I need. Any help would be appreciated.
Thanks in advance, Jorge # Data set to estimate the model myDF=read.table(textConnection("rea st mod 14482734 14305907 14761000 14427969 14502228 14848024 14698723 14408264 14259492 15741064 14797138 14565303 15914249 16162714 14348592 16475594 15977623 15592229 17124613 16688456 14988151 17575281 17383822 15647240 17721548 17757635 15624907 18178273 17779858 15598802 18005810 18364233 16363321 18032618 17938963 16536844 17737470 18043063 17096089 17652014 17624039 17056355"), header=TRUE, sep="") closeAllConnections()
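The closed-form solution above translates directly into a small function (variable names follow the derivation; the clipping step uses the f(0) vs f(1) comparison from the reply):

```r
fit_constrained <- function(res, st, mod) {
  X <- res - mod                       # Xi = res(i) - mod(i)
  Y <- st - mod                        # Yi = st(i) - mod(i)
  a <- sum(X * Y) / sum(Y^2)           # unconstrained minimizer a*
  if (a < 0 || a > 1) {                # outside [0,1]: pick the better endpoint
    f <- function(a) sum((X - a * Y)^2)
    a <- if (f(0) < f(1)) 0 else 1
  }
  c(a = a, b = 1 - a)                  # b is determined by a + b = 1
}
```

With the myDF data above this would be called as fit_constrained(myDF$rea, myDF$st, myDF$mod).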
Re: [R] spectral decomposition for near-singular pd matrices
How large is your matrix? Are the very small eigenvalues well separated? If your matrix is not very small and the lower eigenvalues are clustered, this may be a really hard problem! You may need a special purpose algorithm and/or higher precision arithmetic. If your matrix is A and there exists a sequence of matrices A1,A2,...,An = A such that A1 is good, A2 is a bit worse (and is not very different from A1), etc., you may be able to compute the eigensystem for A1 and then track it down to A2, A3,...,An. I worked on such problems some 15 years ago but I believe that by now there should be packages doing this (I do not know whether they exist in R). --- On Thu, 17/7/08, Prasenjit Kapat [EMAIL PROTECTED] wrote: From: Prasenjit Kapat [EMAIL PROTECTED] Subject: [R] spectral decomposition for near-singular pd matrices To: [EMAIL PROTECTED] Received: Thursday, 17 July, 2008, 7:27 AM Hi, I have a matrix M; quite a few (> 1/4th) of its eigenvalues are of O(10^-16). Analytically I know that M is positive definite, but numerically of course it is not. Some of the small (O(10^-16)) eigenvalues (as obtained from eigen()) are negative. It is the near-singularity that is causing the numerical errors. I could use svd(), but then the left ($u) and right ($v) vectors are not identical, again numerical errors. Is there any function that imposes the pd property while calculating the eigen decomposition? Thanks, PK
Re: [R] spectral decomposition for near-singular pd matrices
Your problem seems to be pretty hard (for larger dimensions)! When speaking about the A1,A2,...,An sequence I did not mean representing A as a product A1*A2*...*An. What I mean is that A1,...,An is a sequence of matrices which starts at a good matrix, i.e. its eigenvalues are well separated; then A2 is not too different from A1, and having already found the eigensystem (or a part of it) for A1 you can use it to find that system for A2 (without doing it from scratch); A3 is not too different from A2, so having the eigensystem for A2 you get one for A3, etc., until you get to the last matrix, which is A. We had a matrix arising from elasticity. When the elasticity coefficient mu was small the matrix was good, but when mu approached 0.5, half of the eigenvalues approached 0 (for mu = 0.5 half of the eigenvalues are really 0). Starting from something like mu = 0.4 we were able to track several selected (lowest) eigenvalues up to something like mu = 0.4999, while no other program (at that time) could find the eigensystem of such a matrix even for much less problematic values of mu. --- On Thu, 17/7/08, Prasenjit Kapat [EMAIL PROTECTED] wrote: From: Prasenjit Kapat [EMAIL PROTECTED] Subject: Re: [R] spectral decomposition for near-singular pd matrices To: [EMAIL PROTECTED] Received: Thursday, 17 July, 2008, 10:56 AM Moshe Olshansky m_olshansky at yahoo.com writes: How large is your matrix? Right now I am looking at sizes between 30x30 and 150x150, though it will increase in future. Are the very small eigenvalues well separated? If your matrix is not very small and the lower eigenvalues are clustered, this may be a really hard problem! Here are the deciles of the eigenvalues (using eigen()) from a typical random generation (30x30): Min 10th 20th 30th 40th -1.132398e-16 -6.132983e-17 3.262002e-18 1.972702e-17 8.429709e-17 50th 60th 70th 80th 90th Max 2.065645e-13 1.624553e-09 4.730935e-06 0.00443353 0.9171549 16.5156 I am guessing they are *not* well-separated.
You may need a special purpose algorithm and/or higher precision arithmetic. If your matrix is A and there exists a sequence of matrices A1,A2,...An = A such that A1 is good, A2 is a bit worse (and is not very different from A1), etc., you may be able to compute the eigensystem for A1 and then track it down to A2, A3,...,An. I have worked on such problems some 15 years ago but I believe that by now there should be packages doing this (I do not know whether they exist in R). I will have to think on the possibility of such a product-decomposition. But even if it were possible, it would be an infinite product (just thinking out loud). Thanks again, PK PS: Kindly CC me when replying.
Re: [R] rounding
The problem is that neither 0.55 nor 2.55 are exact machine numbers (the computer uses binary representation), so it may happen that the machine representation of 0.55 is slightly less than 0.55 while the machine representation of 2.55 is slightly above 2.55. --- On Fri, 11/7/08, Korn, Ed (NIH/NCI) [E] [EMAIL PROTECTED] wrote: From: Korn, Ed (NIH/NCI) [E] [EMAIL PROTECTED] Subject: [R] rounding To: [EMAIL PROTECTED] Received: Friday, 11 July, 2008, 6:07 AM Hi, Round(0.55,1)=0.5 Round(2.55,1)=2.6 Can this be right? Thanks, Ed
Re: [R] rounding
Actually, your result is strange, since if x = 0.55 then the hexadecimal representation of x*100 is 404B800000000001, which is above 55, meaning that x is above 0.55 and should have been rounded to 0.6, while if x = 2.55 then the hexadecimal representation of x*100 is 406FDFFFFFFFFFFF, which is below 255, so that x is less than 2.55 and should have been rounded to 2.5. --- On Fri, 11/7/08, Moshe Olshansky [EMAIL PROTECTED] wrote: From: Moshe Olshansky [EMAIL PROTECTED] Subject: Re: [R] rounding To: [EMAIL PROTECTED], Korn, Ed (NIH/NCI) [E] [EMAIL PROTECTED] Received: Friday, 11 July, 2008, 10:39 AM The problem is that neither 0.55 nor 2.55 are exact machine numbers (the computer uses binary representation), so it may happen that the machine representation of 0.55 is slightly less than 0.55 while the machine representation of 2.55 is slightly above 2.55. --- On Fri, 11/7/08, Korn, Ed (NIH/NCI) [E] [EMAIL PROTECTED] wrote: From: Korn, Ed (NIH/NCI) [E] [EMAIL PROTECTED] Subject: [R] rounding To: [EMAIL PROTECTED] Received: Friday, 11 July, 2008, 6:07 AM Hi, Round(0.55,1)=0.5 Round(2.55,1)=2.6 Can this be right? Thanks, Ed
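Printing a few extra digits makes the representation error visible without reading hex (IEEE-754 double arithmetic, consistent with the analysis above):

```r
# Neither 0.55 nor 2.55 is an exact binary double; extra digits show
# which side of the true decimal value the stored number falls on
print(0.55 * 100, digits = 17)   # slightly above 55: 0.55 is stored high
print(2.55 * 100, digits = 17)   # slightly below 255: 2.55 is stored low
```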
Re: [R] number of effective tests
It looks like SR, SU and ST are strongly correlated with each other, as are DR, DU and DT. You can try to do PCA on your 6 variables, pick the first 2 principal components as your new variables and use them for regression. --- On Fri, 11/7/08, Georg Ehret [EMAIL PROTECTED] wrote: From: Georg Ehret [EMAIL PROTECTED] Subject: [R] number of effective tests To: r-help [EMAIL PROTECTED] Received: Friday, 11 July, 2008, 11:46 AM Dear R community, I am using 6 variables to test for an effect (by linear regression). These 6 variables are strongly correlated among each other and I would like to find out the number of independent tests that I perform in this calculation. For this I calculated a matrix of correlation coefficients between the variables (see below). But finding the rank of the table in R is not the right approach... What else could I do to find the effective number of independent tests? Any suggestion would be very welcome! Thanking you and with my best regards, Georg. for (a in 1:6){ + for (b in 1:6){ + r[a,b] <- summary(lm(unlist(d[a])~unlist(d[b])),na.action=na.exclude)$adj.r.squared + } + } r SR SU ST DR DU DT SR 1.000 0.9636642 0.9554952 0.2975892 0.3211303 0.3314694 SU 0.9636642 1.000 0.9101678 0.3324979 0.3331389 0.3323826 ST 0.9554952 0.9101678 1.000 0.2756876 0.3031676 0.3501157 DR 0.2975892 0.3324979 0.2756876 1.000 0.9981733 0.9674843 DU 0.3211303 0.3331389 0.3031676 0.9981733 1.000 0.9977780 DT 0.3314694 0.3323826 0.3501157 0.9674843 0.9977780 1.000 * Georg Ehret Johns Hopkins University Baltimore, US
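A sketch of the PCA route with simulated stand-in data (prcomp with scale.=TRUE works on the correlation scale, which suits strongly correlated predictors; the data and response here are made up):

```r
set.seed(1)
d <- as.data.frame(matrix(rnorm(600), ncol = 6))  # stand-in for SR..DT
names(d) <- c("SR", "SU", "ST", "DR", "DU", "DT")

pc <- prcomp(d, scale. = TRUE)
summary(pc)                    # proportion of variance per component

scores <- pc$x[, 1:2]          # first two PCs as the new predictors
y <- rnorm(nrow(d))            # hypothetical response
fit <- lm(y ~ scores)
```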
Re: [R] Sum(Random Numbers)=100
For arbitrary lambda_i it can take years until the sum of 50 such random variables is 100! But if one makes lambda_i = 2, then the probability that the sum of 50 of them equals 100 is about 1/sqrt(2*pi*100), so on average that sequence of 50 numbers must be generated about sqrt(2*pi*100) ~ 25 times, which is very reasonable. --- On Tue, 8/7/08, Peng Jiang [EMAIL PROTECTED] wrote: From: Peng Jiang [EMAIL PROTECTED] Subject: Re: [R] Sum(Random Numbers)=100 To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED], Shubha Vishwanath Karanth [EMAIL PROTECTED] Received: Tuesday, 8 July, 2008, 4:56 PM Hi, I am afraid there is no other way except using brute force, that is, loop until their sum reaches your expectation. It is easy to figure out this probability by letting their sum be a new random variable Z = X_1 + ... + X_n where X_i ~ Poisson(lambda_i). By calculating their moment generating functions we can find the pmf of Z, which is a new Poisson random variable with parameter sum_i lambda_i. And Moshe Olshansky's method is also correct; it is based on conditioning. On 2008-7-8, at 1:58 PM, Shubha Vishwanath Karanth wrote: On 2008-7-8, at 2:39 PM, Moshe Olshansky wrote: If they are really random you cannot expect their sum to be 100. However, it is not difficult to see that, given that the sum of n independent Poisson random variables equals N, any individual one has the conditional binomial distribution with size = N and p = 1/n, i.e. P(Xi=k | Sn=N) = (N over k)*(1/n)^k*((n-1)/n)^(N-k). So you can generate X1 binomial with size = 100 and p = 1/50; if X1 = k1 then the sum of the remaining 49 must equal 100 - k1, so now you generate X2 binomial with size = 100-k1 and p = 1/49; if X2 = k2 then generate X3 binomial with size = 100-(k1+k2) and p = 1/48, etc. Why do you need this?
--- On Tue, 8/7/08, Shubha Vishwanath Karanth [EMAIL PROTECTED] wrote: From: Shubha Vishwanath Karanth [EMAIL PROTECTED] Subject: [R] Sum(Random Numbers)=100 To: [EMAIL PROTECTED] Received: Tuesday, 8 July, 2008, 3:58 PM Hi R, I need to generate 50 random numbers (preferably Poisson), such that their sum is equal to 100. How do I do this? Thank you, Shubha This e-mail may contain confidential and/or privileged i...{{dropped:13}} --- Peng Jiang 江鹏, Ph.D. Candidate Antai College of Economics & Management 安泰经济管理学院 Department of Mathematics 数学系 Shanghai Jiaotong University (Minhang Campus) 800 Dongchuan Road 200240 Shanghai P. R. China
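The sequential conditional-binomial scheme described above can be sketched as follows (50 counts that always sum to exactly 100):

```r
set.seed(1)
n <- 50
N <- 100
x <- numeric(n)
left <- N                                  # how much of the total remains
for (i in 1:(n - 1)) {
  # p = 1/50 for the first draw, 1/49 for the second, and so on
  x[i] <- rbinom(1, size = left, prob = 1 / (n - i + 1))
  left <- left - x[i]
}
x[n] <- left                               # last count takes the remainder
sum(x)                                     # exactly 100 by construction
```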
Re: [R] multiplication question
The answer to your first question is sum(x)*sum(y) - sum(x*y), and for the second one it is x %*% R %*% y - sum(x*y*diag(R)). --- On Thu, 3/7/08, Murali Menon [EMAIL PROTECTED] wrote: From: Murali Menon [EMAIL PROTECTED] Subject: [R] multiplication question To: [EMAIL PROTECTED] Received: Thursday, 3 July, 2008, 2:30 AM folks, is there a clever way to compute the sum of the product of two vectors such that the common indices are not multiplied together? i.e. if i have vectors X, Y, how can i compute Sum (X[i] * Y[j]), i != j? also, what if i wanted Sum (X[i] * Y[j] * R[i, j]), i != j, where R is a matrix? thanks, murali
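Both identities are easy to verify against brute-force loops on small random vectors:

```r
set.seed(3)
n <- 5
x <- runif(n); y <- runif(n)
R <- matrix(runif(n * n), n, n)

s1 <- 0; s2 <- 0                       # brute-force sums over i != j
for (i in 1:n) for (j in 1:n) if (i != j) {
  s1 <- s1 + x[i] * y[j]
  s2 <- s2 + x[i] * y[j] * R[i, j]
}

s1b <- sum(x) * sum(y) - sum(x * y)                 # first identity
s2b <- drop(x %*% R %*% y) - sum(x * y * diag(R))   # second identity

all.equal(s1, s1b)   # TRUE
all.equal(s2, s2b)   # TRUE
```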
Re: [R] odd dnorm behaviour (?)
dnorm() computes the density, so it may be greater than 1; pnorm() computes the distribution function. --- On Tue, 8/7/08, Mike Lawrence [EMAIL PROTECTED] wrote: From: Mike Lawrence [EMAIL PROTECTED] Subject: Re: [R] odd dnorm behaviour (?) To: Rhelp [EMAIL PROTECTED] Received: Tuesday, 8 July, 2008, 2:51 PM #Let's try this again! This time the code is more sensible (curve range, same sd value). #Quick one hopefully. Shouldn't dnorm be returning the pdf? Last time I checked, #a probability shouldn't be greater than 1 as produced by: curve(dnorm(x,0,.1),from=-.5,to=.5) #Shouldn't I be getting an axis more like that produced by: f=function(x,m,s){ y=rep(NA,length(x)) for(i in 1:length(x)){ y[i]=integrate( dnorm , upper=x[i]+sqrt(.Machine$double.eps) , lower=x[i]-sqrt(.Machine$double.eps) , mean=m , sd=s )$value } return(y) } curve(f(x,0,.1),from=-.5,to=.5) #If the latter code betrays a misunderstanding of what a pdf is, be gentle! #Mike -- Mike Lawrence Graduate Student, Department of Psychology, Dalhousie University www.memetic.ca The road to wisdom? Well, it's plain and simple to express: Err and err and err again, but less and less and less. - Piet Hein
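Two quick calls make the distinction concrete: with sd = 0.1 the density peak is 1/(0.1*sqrt(2*pi)), well above 1, while the CDF never leaves [0, 1]:

```r
dnorm(0, mean = 0, sd = 0.1)   # ~3.9894, a density height, not a probability
pnorm(0, mean = 0, sd = 0.1)   # 0.5, an actual probability
```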
Re: [R] Lots of huge matrices, for-loops, speed
Another possibility is to use explicit formulae, i.e. if you are doing linear regression like y = a*x + b then the explicit formulae are: a = (meanXY - meanX*meanY)/(meanX2 - meanX^2) b = (meanY*meanX2 - meanX*meanXY)/(meanX2 - meanX^2) where meanX is mean(x), meanXY is mean(x*y), meanX2 is mean(x^2), etc. So if x is your time vector you could do something like: meanY <- matrix(0,nrow=4000,ncol=3500) meanXY <- matrix(0,nrow=4000,ncol=3500) meanY2 <- matrix(0,nrow=4000,ncol=3500) for (i in 1:80) { A <- read.table(file[i]) meanY <- meanY + A meanXY <- meanXY + x[i]*A meanY2 <- meanY2 + A*A } now: meanY <- meanY/80 meanXY <- meanXY/80 meanY2 <- meanY2/80 meanX <- mean(x) meanX2 <- mean(x*x) and now use the above formulae to compute the regression coefficients. You will need space for 4 4000x3500 matrices and this should not be a problem. Once a and b are found you can use meanX, meanX2, meanXY, meanY, meanY2 to compute R2 as well. --- On Mon, 7/7/08, Zarza [EMAIL PROTECTED] wrote: From: Zarza [EMAIL PROTECTED] Subject: [R] Optimise huge data.frame construction To: r-help@r-project.org Received: Monday, 7 July, 2008, 1:39 AM Hello, we have 80 text files with matrices. Each matrix represents a map (rows for latitude and columns for longitude); the 80 maps represent steps in time. In addition, we have a vector x of length 80. We would like to compute a regression between matrices (response through time) and x and create maps representing coefficients, r2 etc. Problem: the 80 matrices are of size 4000 x 3500 and we were running out of memory. We computed line by line and the results for each line were appended to output grids. This works. But - for each line, 80 text files must be scanned and output must be written. And there are several for-loops involved. This takes a lot of time (about a week). I read the contributions related to speeding up code and maybe vectorizing parts of the procedure could help a bit.
However, I am a neophyte (as you may see from the code below) and did not find a way by now. I would appreciate very much any suggestions for speeding up the procedure. Thanks, Zarza The code (running but slow):

regrid <- function (infolder, x, outfolder) {
  # List of input files
  setwd(infolder)
  filelist <- dir(pattern = "\\.asc$", full.names = F)
  # Dimensions (making use of the header information coming with
  # the .asc input files, ESRI format)
  hd <- read.table(filelist[1], nrows = 6)
  cols <- hd[1,2]
  rows <- hd[2,2]
  times <- length(filelist)
  items <- 4 + ncol(x)
  # Prepare output
  out1 <- matrix(numeric(times * cols), ncol = cols)
  out2 <- matrix(numeric(items * cols), ncol = items)
  out3 <- as.numeric(items)
  # Prepare .asc files
  filenames <- c("R2", "adj.R2", "p", "b0", colnames(x))
  for (i in 1:items) {
    write.table(hd, file = paste(outfolder, filenames[i], ".asc", sep = ""), quote = F, row.names = F, col.names = F)
  }
  rm(hd)
  # Prepare regression
  xnam <- paste("x[,", 1:(ncol(x)), "]", sep = "")
  form <- paste("y ~ ", paste(xnam, collapse = "+"))
  rm(xnam)
  # Loop through rows
  for (j in 1:rows) {
    getgrid <- function (j) {
      print(paste("Row", j, "/", rows), quote = F)
      # Read out multi-temporal response values for one grid-row of cells
      for (k in 1:times) {
        getslice <- function (k) {
          values <- scan(filelist[k], what = 0, na.strings = "-", skip = (5 + j), nlines = 1, nmax = cols, quiet = T)
          values
        }
        out1[k,] <- getslice(k)
      }
      # Regression
      for (l in 1:cols) {
        y <- as.vector(out1[,l])
        if (length(y) > length(na.omit(y))) {
          setNA <- function (l) {
            NAs <- rep(NA, length(out3))
            NAs
          }
          out2[l,] <- setNA(l)
        } else {
          regression <- function (l) {
            model <- lm(as.formula(form))
            out3[1] <- summary(model)$r.squared
            out3[2] <- summary(model)$adj.r.squared
            f <- summary(model)$fstatistic
            out3[3] <- 1 - pf(f[1], f[2], f[3])
            out3[4:items] <- coef(model)[1:(1 + ncol(x))]
            out3
          }
          out2[l,] <- regression(l)
        }
      }
      out2
    }
    fillrow <- getgrid(j)
    # Append results to output files
    for (m in 1:items) {
      write.table(t(fillrow[,m]), file = paste(outfolder, filenames[m], ".asc", sep = ""), append = T, quote = F, na = "-", row.names = F, col.names = F, dec = ".")
    }
  }
}

-- View this message in context: http://www.nabble.com/Lots-of-huge-matrices%2C-for-loops%2C-speed-tp18303230p18303230.html Sent from the R help mailing list archive at Nabble.com.
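The running-sums idea from the reply can be sketched compactly (file names and the reader call are placeholders for the poster's 80 ESRI .asc grids; only the slope/intercept accumulators are shown):

```r
nr <- 4000; nc <- 3500
x <- 1:80                                # time steps

sumY  <- matrix(0, nr, nc)
sumXY <- matrix(0, nr, nc)

for (i in seq_along(x)) {
  # placeholder reader: one 4000 x 3500 grid per time step
  A <- as.matrix(read.table(sprintf("map_%02d.asc", i)))
  sumY  <- sumY  + A
  sumXY <- sumXY + x[i] * A
}

meanY  <- sumY / length(x)
meanXY <- sumXY / length(x)
meanX  <- mean(x)
meanX2 <- mean(x^2)

a <- (meanXY - meanX * meanY) / (meanX2 - meanX^2)  # per-cell slope
b <- meanY - a * meanX                              # per-cell intercept
```

Only a few full-size accumulator matrices are held at once, so memory stays modest even with 80 grids.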
Re: [R] Lots of huge matrices, for-loops, speed
This is correct for larger (more columns) matrices - computing the t(A)*A matrix and inverting it may cause numerical problems, but this should not be the case with two columns (one of which is all 1's). In any case, the matrix depends on the vector x only, and since it is small (80 entries), the resulting matrix can be decomposed/processed in any desirable way independently of the huge Y matrices (but once again, this is not needed for the particular case). --- On Mon, 7/7/08, Rolf Turner [EMAIL PROTECTED] wrote: From: Rolf Turner [EMAIL PROTECTED] Subject: Re: [R] Lots of huge matrices, for-loops, speed To: [EMAIL PROTECTED] Cc: r-help@r-project.org, Zarza [EMAIL PROTECTED] Received: Monday, 7 July, 2008, 9:40 AM On 7/07/2008, at 11:05 AM, Moshe Olshansky wrote: Another possibility is to use the explicit formulae, i.e. if you are doing linear regression like y = a*x + b then the explicit formulae are: a = (meanXY - meanX*meanY)/(meanX2 - meanX^2) b = (meanY*meanX2 - meanX*meanXY)/(meanX2 - meanX^2) where meanX is mean(x), meanXY is mean(x*y), meanX2 is mean(x^2), etc. My understanding is that such formulae, while ``algebraically correct'', are highly unstable numerically and hence should be avoided. Others who are wiser and more knowledgeable than I may wish to comment or correct me. cheers, Rolf Turner
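To make the trade-off concrete, here is a minimal sketch (simulated data; all names are illustrative) comparing the explicit formulae with lm(). On small, well-scaled data like this the two agree to numerical precision, which is why the closed form is tempting despite the stability caveat above:

```r
set.seed(1)
x <- rnorm(80)                        # 80 entries, as in the thread
y <- 1 + 2 * x + rnorm(80, sd = 0.1)  # true intercept 1, slope 2

# Explicit formulae for simple linear regression y = a*x + b
a <- (mean(x * y) - mean(x) * mean(y)) / (mean(x^2) - mean(x)^2)
b <- (mean(y) * mean(x^2) - mean(x) * mean(x * y)) / (mean(x^2) - mean(x)^2)

# lm() returns coefficients in the order (intercept, slope) = (b, a)
fit <- unname(coef(lm(y ~ x)))
all.equal(c(b, a), fit)
```

On badly scaled data (e.g. x with a huge mean), mean(x^2) - mean(x)^2 cancels catastrophically, which is exactly Rolf's point; lm() uses a QR decomposition and avoids that.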
Re: [R] Plot Mixtures of Synthetically Generated Gamma Distributions
I know very little about graphics, so my primitive and brute-force solution would be:

plot(density(x[1:30]), col = "blue")
lines(density(x[31:60]), col = "red")
lines(density(x[61:90]), col = "green")

--- On Mon, 7/7/08, Gundala Viswanath [EMAIL PROTECTED] wrote: From: Gundala Viswanath [EMAIL PROTECTED] Subject: [R] Plot Mixtures of Synthetically Generated Gamma Distributions To: [EMAIL PROTECTED] Received: Monday, 7 July, 2008, 1:24 PM Hi, I have the following vector which is created from 3 distinct gamma distributions (three components):

x <- c(rgamma(30, shape = .2, scale = 14),
       rgamma(30, shape = 12, scale = 10),
       rgamma(30, shape = 5, scale = 6))

I want to plot the density curve of x in a way that shows the 3 distinct curves, one for each component. How can I do that? I tried this, but it doesn't work: lines(density(x)) Please advise. - Gundala Viswanath Jakarta - Indonesia
Re: [R] A regression problem using dummy variables
Do you have a reason to treat all 3 levels together rather than fit a separate regression for each level? --- On Tue, 1/7/08, rlearner309 [EMAIL PROTECTED] wrote: From: rlearner309 [EMAIL PROTECTED] Subject: [R] A regression problem using dummy variables To: r-help@r-project.org Received: Tuesday, 1 July, 2008, 11:38 PM This is actually more of a statistics problem: I have a dataset with two dummy variables controlling three levels. The problem is that one level does not have many observations compared with the other two (a couple of data points versus 1000+ points on the other levels). When I run the regression, the result is bad: I have unbalanced SE and VIF. Does this kind of problem also belong to the near-singularity problem? Does it make any difference if I code the level that lacks data as (0,0) instead of (0,1)? Thanks a lot! -- View this message in context: http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18214377.html Sent from the R help mailing list archive at Nabble.com.
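As a side note, R builds the dummy coding itself when the grouping variable is a factor, and the symptom described (a huge standard error on the sparse level) is easy to reproduce. A minimal sketch with simulated data (all names and group sizes are illustrative):

```r
set.seed(1)
# Three levels; level C has only a handful of observations
g <- factor(rep(c("A", "B", "C"), times = c(1000, 1000, 5)))
y <- c(rnorm(1000, mean = 0), rnorm(1000, mean = 1), rnorm(5, mean = 2))

fit <- lm(y ~ g)            # R creates the two dummies gB, gC automatically
coef(fit)                   # (Intercept) is the mean of the baseline level A
summary(fit)$coefficients[, "Std. Error"]  # SE for gC is much larger: few points
```

Recoding which level is the (0,0) baseline (relevel()) changes the interpretation of the coefficients, not the fit or the underlying precision problem.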
Re: [R] help_transformation
Let F be the distribution function of Y, PSI the standard normal distribution function and IPSI its inverse. Let f(x) = IPSI(F(x)). It is not difficult to see that f(Y) has a standard normal distribution. So you can replace F with the empirical distribution function, and IPSI is R's qnorm function. --- On Wed, 25/6/08, Indermaur Lukas [EMAIL PROTECTED] wrote: From: Indermaur Lukas [EMAIL PROTECTED] Subject: [R] help_transformation To: [EMAIL PROTECTED] Received: Wednesday, 25 June, 2008, 9:56 PM Heya, I am fitting a linear mixed-effects model to a response Y. Y shows an S-shaped distribution in QQ-plots (some zero values and some very high values). Hence, which transformation should I apply so that Y follows a normal distribution? Is any R function/package available to do this? Thanks for any hint. Regards, Lukas °°° Lukas Indermaur, PhD student eawag / Swiss Federal Institute of Aquatic Science and Technology ECO - Department of Aquatic Ecology Überlandstrasse 133 CH-8600 Dübendorf Switzerland Phone: +41 (0) 71 220 38 25 Fax: +41 (0) 44 823 53 15 Email: [EMAIL PROTECTED] www.lukasindermaur.ch
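A minimal sketch of this normal-scores transform on simulated data (the n/(n+1) rescaling is my addition, not part of the advice above; it keeps qnorm away from Inf at the sample maximum, where the empirical F equals 1):

```r
set.seed(1)
y <- rgamma(500, shape = 0.3)   # strongly skewed stand-in for the response
n <- length(y)

Fhat <- ecdf(y)                 # empirical distribution function F
z <- qnorm(Fhat(y) * n / (n + 1))  # f(y) = IPSI(F(y)), rescaled away from 1

c(mean(z), sd(z))               # should be close to 0 and 1
```

Note that for a continuous F the result is exactly standard normal; with point masses (such as many exact zeros in Y), ties in the empirical F mean the transformed values can only approximate normality.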
Re: [R] [SPAM] - constructing arbitrary (positive definite) covariance matrix - Found word(s) list error in the Text body
If the main diagonal elements of matrix A are 1 and the off-diagonal elements are a, then for any vector x we get t(x)*A*x = (1-a)*sum(x^2) + a*(sum(x))^2. If we want A to be positive (semi)definite we need this expression to be positive (non-negative) for any x != 0. Since sum(x)^2/sum(x^2) <= n, where n is the dimension of the matrix, and equality is attainable, we get that A is positive (semi)definite if and only if -1/(n-1) <= a <= 1 (sharp inequalities for positive definiteness). Since any symmetric positive (semi)definite matrix can be a covariance matrix, this describes all the matrices which satisfy the requirement. --- On Fri, 27/6/08, Patrick Burns [EMAIL PROTECTED] wrote: From: Patrick Burns [EMAIL PROTECTED] Subject: Re: [R] [SPAM] - constructing arbitrary (positive definite) covariance matrix - Found word(s) list error in the Text body To: [EMAIL PROTECTED] Cc: Mizanur Khondoker [EMAIL PROTECTED], r-help@r-project.org Received: Friday, 27 June, 2008, 3:15 AM To make David's approach a little more concrete: You can always have correlations all equal to 1 -- the variables are all the same, except for the names you've given them. You can have two variables with correlation -1, but you can't get a third variable that has -1 correlation to both of the first two. Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) [EMAIL PROTECTED] wrote: Well, if you think about the geometry, all correlations equal usually won't work. Think of the SDs as the sides of a simplex and the correlations as the cosines of the angles between the sides (pick one variable as the 'origin'.) Only certain values will give a valid covariance or correlation matrix. HTH, David L. Reiner, PhD, Head Quant, Rho Trading Securities, LLC -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mizanur Khondoker Sent: Thursday, June 26, 2008 11:11 AM To: r-help@r-project.org Subject: [SPAM] - [R] constructing arbitrary (positive definite) covariance matrix - Found word(s) list error in the Text body Dear list, I am trying to use the 'mvrnorm' function from the MASS package for simulating multivariate Gaussian data with a given covariance matrix. The diagonal elements of my covariance matrix should be the same, i.e., all variables have the same marginal variance. Also all correlations between all pairs of variables should be identical, but could be any value in [-1,1]. The problem I am having is that the matrix I create is not always positive definite (and hence mvrnorm fails). Is there any simple way of constructing a covariance matrix of the above structure (equal variance, same pairwise correlation from [-1,1]) that will always be positive definite? I have noticed that covariance matrices created using the following COV function are positive definite for -0.5 < r < 1. However, for r < -0.5, the matrix is not positive definite. Does anyone have any idea why this is the case? For my simulation, I need to generate multivariate data for the whole range of r, [-1,1], for a given value of sd. Any help/suggestion would be greatly appreciated.
Examples:

COV <- function (p = 3, sd = 1, r = 0.5) {
    cov <- diag(sd^2, ncol = p, nrow = p)
    for (i in 1:p) {
        for (j in 1:p) {
            if (i != j) {
                cov[i, j] <- r * sd * sd
            }
        }
    }
    cov
}

library(MASS)

### Simulate multivariate Gaussian data (works OK)
Sigma <- COV(p = 3, sd = 2, r = 0.5)
mu <- 1:3
mvrnorm(5, mu = mu, Sigma = Sigma)
          [,1]     [,2]     [,3]
[1,] 1.2979984 1.843248 4.460891
[2,] 2.1061054 1.457201 3.774833
[3,] 2.1578538 2.761939 4.589977
[4,] 0.8775056 4.240710 2.203712
[5,] 0.2698180 2.075759 2.869573

### Simulate multivariate Gaussian data (gives error)
Sigma <- COV(p = 3, sd = 2, r = -0.6)
mu <- 1:3
mvrnorm(5, mu = mu, Sigma = Sigma)
Error in mvrnorm(5, mu = mu, Sigma = Sigma) : 'Sigma' is not positive definite
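Tying the two replies together: the eigenvalues of an equicorrelation matrix with correlation r are 1 + (p-1)*r (once) and 1 - r (p-1 times), so it is positive definite exactly when -1/(p-1) < r < 1, which is why p = 3 fails for r < -0.5. A minimal sketch verifying this numerically (equicor is a hypothetical helper name, and it also builds the matrix without loops):

```r
# Equicorrelation covariance matrix: sd^2 on the diagonal, r*sd^2 off it
equicor <- function(p, sd = 1, r = 0.5) {
  sd^2 * ((1 - r) * diag(p) + r * matrix(1, p, p))
}

# Smallest eigenvalue crosses zero at r = -1/(p-1); here p = 3, boundary -0.5
min(eigen(equicor(3, sd = 2, r = -0.5))$values)   # ~0: boundary case
min(eigen(equicor(3, sd = 2, r = -0.6))$values)   # negative: mvrnorm will fail
min(eigen(equicor(3, sd = 2, r = 0.5))$values)    # positive: valid
```

So for a fixed p there is no way to cover the whole range r in [-1, 1] with a valid covariance matrix; the feasible range shrinks as p grows.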
Re: [R] Measuring Goodness of a Matrix
What do you mean by A being similar to X? Do you mean the norm of the difference, similar eigenvalues/vectors, or anything else? --- On Wed, 25/6/08, Gundala Viswanath [EMAIL PROTECTED] wrote: From: Gundala Viswanath [EMAIL PROTECTED] Subject: [R] Measuring Goodness of a Matrix To: [EMAIL PROTECTED] Received: Wednesday, 25 June, 2008, 12:41 AM Hi all, Suppose I have 2 matrices A and B, and I want to measure how good each of these matrices is. So I intend to compare A and B with another gold-standard matrix X, meaning the more similar a matrix is to X, the better it is. What is the common way in R to measure matrix similarity (i.e. A vs X, and B vs X)? - Gundala Viswanath Jakarta - Indonesia
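If "similar" means small elementwise difference, the Frobenius norm of the difference is the usual choice; base R's norm() computes it directly. A minimal sketch with made-up matrices:

```r
A <- matrix(c(1, 0,   0,   1), 2, 2)   # candidate matrix
X <- matrix(c(1, 0.1, 0.1, 1), 2, 2)   # gold-standard matrix

# Frobenius norm: sqrt of the sum of squared entrywise differences
dist_frob <- norm(A - X, type = "F")
dist_frob                               # sqrt(0.02), about 0.141
```

The matrix with the smaller distance to X is then the "better" one under this criterion; spectral comparisons (eigen()) answer a different notion of similarity.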
Re: [R] Pairwise Partitioning of a Vector
One possibility is:

x <- c(30.9, 60.1, 70.0, 73.0, 75.0, 83.9, 93.1, 97.6, 98.8, 113.9)
for (i in 1:9) assign(paste("PAIR", i, sep = ""),
                      list(part1 = x[1:i], part2 = x[-(1:i)]))

--- On Mon, 23/6/08, Gundala Viswanath [EMAIL PROTECTED] wrote: From: Gundala Viswanath [EMAIL PROTECTED] Subject: [R] Pairwise Partitioning of a Vector To: [EMAIL PROTECTED] Received: Monday, 23 June, 2008, 4:13 PM Hi, How can I partition an example vector like this print(myvector) [1] 30.9 60.1 70.0 73.0 75.0 83.9 93.1 97.6 98.8 113.9 into the following pairwise partitions: PAIR1 part1 = 30.9 part2 = 60.1 70.0 73.0 75.0 83.9 93.1 97.6 98.8 113.9 PAIR2 part1 = 30.9 60.1 part2 = 70.0 73.0 75.0 83.9 93.1 97.6 98.8 113.9 ... PAIR9 part1 = 30.9 60.1 70.0 73.0 75.0 83.9 93.1 97.6 98.8 part2 = 113.9 I'm stuck with this kind of loop:

__BEGIN__
# gexp is a vector
process_two_partition <- function (gexp) {
    sort.gexp <- sort(as.matrix(gexp))
    print(sort.gexp)
    for (posb in 1:ncol(gexp)) {
        for (pose in 1:ncol(gexp)) {
            sp_b <- pose + 1
            sp_e <- ncol(gexp)
            # These two don't do what I want
            part1 <- sort.gexp[posb:pose]
            part2 <- sort.gexp[sp_b:sp_e]
            # we later want to process part1 and part2 separately
        }
    }
}
__END__

- Gundala Viswanath Jakarta - Indonesia
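A loop-free alternative that collects all nine splits into one list, rather than assigning nine separate PAIR objects into the workspace (the name pairs is my choice):

```r
x <- c(30.9, 60.1, 70.0, 73.0, 75.0, 83.9, 93.1, 97.6, 98.8, 113.9)

# One list element per split point: first i values vs the rest
pairs <- lapply(seq_len(length(x) - 1),
                function(i) list(part1 = x[1:i], part2 = x[-(1:i)]))

pairs[[2]]$part1   # 30.9 60.1
length(pairs)      # 9
```

Keeping the splits in a single list makes the later per-partition processing a simple lapply over pairs instead of get()-ing nine generated variable names.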