Re: [R] How to colorize the panel backgrounds of pairs()?
Dear Ilai,

I also tried to adjust the diagonal panels. However, the variable names are no longer positioned correctly. Do you know a solution?

Cheers,
Marius

    count <- 0
    mypanel <- function(x, y, ...){
        count <<- count+1
        bg <- if(count %in% c(1,4,9,12)) "#FDFF65" else "transparent"
        ll <- par("usr")
        rect(ll[1], ll[3], ll[2], ll[4], col=bg)
        points(x, y, cex=0.5)
    }
    mydiag.panel <- function(x, ...){
        ll <- par("usr")
        rect(ll[1], ll[3], ll[2], ll[4], col="#FDFF65")
    }
    U <- matrix(runif(4*500), ncol=4)
    pairs(U, panel=mypanel, diag.panel=mydiag.panel)

Marius Hofert marius.hof...@math.ethz.ch writes:

Indeed, precisely what I was looking for. Many thanks, Ilai.

ilai ke...@math.montana.edu writes:

par('bg') is not what you are looking for - it will set the bg of the whole graphic device, not panels. I think you want:

    count <- 0
    mypanel <- function(x, y, ...){
        count <<- count+1
        ll <- par('usr')
        if(count %in% c(1,4,9,12)) bg <- "#FDFF65" else bg <- 'transparent'
        rect(ll[1],ll[3],ll[2],ll[4],col=bg)
        points(x, y, cex=0.5)
    }

Cheers

On Thu, Mar 1, 2012 at 4:49 PM, Marius Hofert marius.hof...@math.ethz.ch wrote:

Dear expeRts,

I would like to colorize the backgrounds of a pairs plot according to the respective panel number. Here is what I tried (without success):

    count <- 0
    mypanel <- function(x, y, ...){
        count <<- count+1
        bg. <- if(count %in% c(1,4,9,12)) "#FDFF65" else NA
        points(x, y, cex=0.5, bg=bg.)
    }
    U <- matrix(runif(4*500), ncol=4)
    pairs(U, panel=mypanel)

I also tried to set par(bg=bg.) before the call to points(), but that didn't work either. The only thing I found is that bg= can be used to fill certain plot symbols, but I would like to colorize the background of each panel, not the drawn circles.

Cheers,
Marius

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
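A possible answer to the label question above, offered as a sketch rather than a confirmed diagnosis: pairs() has a label.pos argument whose default moves the variable names upwards as soon as a diag.panel is supplied (to leave room for, say, a histogram). Assuming that is what displaced the names here, passing label.pos = 0.5 should centre them again. The seed is added only to make the example reproducible.

```r
# Coloured off-diagonal and diagonal panels, with the variable names
# re-centred via label.pos (assumption: the shifted default is the cause).
count <- 0
mypanel <- function(x, y, ...) {
  count <<- count + 1                       # panel counter kept outside the function
  bg <- if (count %in% c(1, 4, 9, 12)) "#FDFF65" else "transparent"
  ll <- par("usr")                          # panel extent in user coordinates
  rect(ll[1], ll[3], ll[2], ll[4], col = bg)
  points(x, y, cex = 0.5)
}
mydiag.panel <- function(x, ...) {
  ll <- par("usr")
  rect(ll[1], ll[3], ll[2], ll[4], col = "#FDFF65")
}
set.seed(1)
U <- matrix(runif(4 * 500), ncol = 4)
pairs(U, panel = mypanel, diag.panel = mydiag.panel, label.pos = 0.5)
```

With four variables the off-diagonal panel function is called twelve times, so count ends at 12.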
Re: [R] Parameterization of Inverse Wishart distribution available in MCMCpack and bayesm libraries
On Mar 2, 2012, at 05:55 , ilai wrote:

> What do you make of the following from ?riwish
>
>     riwish(v, S)
>     snip
>     v: Degrees of freedom (scalar).
>
> Does an m/2 parameterization yield a scalar for, say, 3 dof?

Er, yes (scalar does not imply integer).

As a general matter:

1. This is the Open Source world: you can read the actual function and see what it does. It might even say so in the comments.

2. You can investigate empirically. The moments are known as a function of the parameters (check Wikipedia), so how about simulating a few thousand matrices and looking at the means and variances? (I don't do IW on a daily basis, but AFAICT, the two parametrizations have roughly the same mean, but a factor of two between variances, so it should be fairly easy to spot whether it is one or the other.)

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
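The empirical check suggested in point 2 could look roughly like the sketch below. It does not call riwish() itself; instead it builds inverse-Wishart draws from base R's rWishart() by inversion, so the sample size, seed, degrees of freedom and scale matrix are all illustrative assumptions. To test a package function, one would substitute its draws and compare against the same theoretical mean S / (v - p - 1).

```r
# Simulate inverse-Wishart matrices and compare the empirical mean with the
# textbook mean S / (v - p - 1) of the IW(v, S) parameterization.
set.seed(1)
p <- 2                      # dimension
v <- 10                     # degrees of freedom
S <- diag(p)                # scale matrix

# If W ~ Wishart(v, solve(S)), then solve(W) ~ inverse-Wishart(v, S).
W  <- rWishart(5000, df = v, Sigma = solve(S))
IW <- apply(W, 3, solve)    # p^2 x 5000 matrix, one flattened draw per column

emp_mean  <- matrix(rowMeans(IW), p, p)
theo_mean <- S / (v - p - 1)

print(round(emp_mean, 3))
print(round(theo_mean, 3))
```

The element-wise variances can be compared in the same way via apply(IW, 1, var).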
Re: [R] Rscript example
On Thu, Mar 01, 2012 at 11:18:33PM -0800, statquant2 wrote:

> Hey Petr,
> ok I was thinking that R would handle the split by itself.
> I guess using eval we can even make arg1=val1 being executed by R.

Hi.

For executing the assignments, try myRscript.R containing

    args <- commandArgs(TRUE)
    argmat <- sapply(strsplit(args, "="), identity)
    for (i in seq.int(length=ncol(argmat))) {
        assign(argmat[1, i], argmat[2, i])
    }
    # available variables
    print(ls())
    # print variables arg1, arg2, arg3
    print(arg1)
    print(arg2)
    print(arg3)

Then the command line

    ./myRscript.R arg1=aa arg2=22 arg3=cc

yields

    [1] "argmat" "args"   "arg1"   "arg2"   "arg3"   "i"
    [1] "aa"
    [1] "22"
    [1] "cc"

Another option for setting some variables before executing an R script is to have two scripts, a shell script and an R script.

Shell script callR.sh:

    #!/bin/bash
    PATH_TO_R/bin/R --vanilla << EEE
    args <- c("$2","$3","$4","$5")
    source("$1")
    EEE

A trivial R script tst.R:

    print(args)

Then,

    ./callR.sh tst.R aa 22 cc

leads to

    ...
    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

    > args <- c("aa","22","cc","")
    > source("tst.R")
    [1] "aa" "22" "cc" ""

Hope this helps.

Petr Savicky.
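As a variation on the first script, the name=value pairs can be collected in a named list instead of being assigned into the workspace, which avoids clobbering existing variables. This is only a sketch; parse_args is a hypothetical helper name, and the argument vector is hard-coded here in place of commandArgs(TRUE).

```r
# Turn c("name=value", ...) into a named list of character values.
parse_args <- function(args) {
  parts <- strsplit(args, "=", fixed = TRUE)
  # everything after the first "=" is the value (re-joined if it contained "=")
  vals <- vapply(parts, function(p) paste(p[-1], collapse = "="), character(1))
  names(vals) <- vapply(parts, `[`, character(1), 1)
  as.list(vals)
}

# In a real script this would be: opts <- parse_args(commandArgs(TRUE))
opts <- parse_args(c("arg1=aa", "arg2=22", "arg3=cc"))
print(opts$arg2)
```

All values arrive as character strings, so numeric arguments still need as.numeric().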
Re: [R] read.table issue with #
use the 'comment.char' parameter of read.table

On Mar 1, 2012, at 17:51, Rui Barradas rui1...@sapo.pt wrote:

> Hello,
>
>> The problem is that I get the following error because anything after the # is ignored.
>>
>>     Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>       line 6 did not have 500 elements
>
> R thinks that line 6 has only 2 elements because of the #.
> Use 'readLines' instead, followed by 'strsplit'.
> In the example below the separator is a space.
>
>     tc <- textConnection("yes yes yes yes yes
>     yes yes yes yes yes
>     yes yes # yes yes")
>     #x <- read.table(tc)  # same error: line 3 did not have 5 elements
>     x <- readLines(tc)
>     close(tc)
>     strsplit(x, " ")
>
> Hope this helps,
> Rui Barradas
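The comment.char suggestion at the top of this message can be sketched as follows, reusing the three-line sample from the quoted reply. Setting comment.char = "" switches comment handling off, so the # is read as an ordinary field. The text= argument is used here purely for brevity; a file name works the same way.

```r
# Read data containing a literal "#" by disabling comment handling.
txt <- "yes yes yes yes yes
yes yes yes yes yes
yes yes # yes yes"

x <- read.table(text = txt, comment.char = "", stringsAsFactors = FALSE)
print(x)
```

Each of the three rows now has five fields, with "#" in the third column of the last row.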
[R] Why do my regular expressions require a double escape \\ to get a literal??
Hi,

I was recently unfortunate enough to have to use regular expressions to sort out some data in R. I'm working on a data file which contains taxonomical data of bacteria in hierarchical order. A sample of this file can be generated using:

    tax.data <- read.table(header=F, con <- textConnection('
    G9SS7BA01D15EC Bacteria(100) Cyanobacteria(84) unclassified
    G9SS7BA01C9UIR Bacteria(100) Proteobacteria(94) Alphaproteobacteria(89)
    G9SS7BA01CM00D Bacteria(100) Proteobacteria(99) Alphaproteobacteria(99)
    '))
    close(con)

What I am trying to do is remove the parentheses and the number inside them (which could contain a decimal point). I assumed that the following command would solve it, but instead I got an error.

    tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\(.*\)','',x)))
    Error: '\(' is an unrecognized escape in character string starting "\("

And it doesn't matter if I use perl = TRUE or not. To solve it I need to use a double escape sign '\\' before the opening and closing parentheses:

    tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\\(.*\\)','',x)))

This yields the desired result, but I wonder why. No other regular expression system I'm used to (e.g. Perl, Shell) works like that. I'm using R 2.14 (but also R 2.10) and I get the same results on Ubuntu and Win XP. I'd appreciate any explanation.

Thanks in advance,
baffled Roey

--
Dr. Roey Angel
Max-Planck-Institute for Terrestrial Microbiology
Karl-von-Frisch-Strasse 10
D-35043 Marburg, Germany
Office: +49 (0)6421/178-832
Mobile: +49 (0)176/612-785-88
[R] Re: Bayesian Hidden Markov Models
Dear Oscar,

Thanks for your help. It's so nice of you to explain this package to me.

Best Regards,
James LAN

From: Oscar Rueda [via R] ml-node+s789695n4431468...@n4.nabble.com
To: monkeylan lanjin...@yahoo.com.cn
Sent: Wednesday, 29 February 2012, 9:21
Subject: Re: Bayesian Hidden Markov Models

Dear James,

The distances are normalized between zero and 1, so in your case all of them will be zero. You can check that with

    res$Dist.for.model

And do

    Q.NH(summary(res)[[1]]$beta, x=0)

to obtain the common transition matrix.

Cheers,
Oscar

On 29/2/12 03:59, monkeylan [hidden email] wrote:

Dear Oscar,

I am extremely grateful for your help and the detailed explanation of how to use the RJaCGH package. But when running the sample code you listed, I am a little confused by another issue. After running summary(res), I get the estimate of the random matrix Beta:

    Parameters of the transition functions:
           Normal  Gain
    Normal  0.000 4.258
    Gain    2.001 0.000

However, the transition probability matrix Q based on the above Beta matters more for my modeling, and I am not sure how to get the matrix Q. I did try the Q.NH function, but should I set the distance parameter x to 1 or 0? I am not sure. If it is 1 (according to my own understanding), the following result does not seem reasonable.

    tran <- matrix(c(0,2.001,4.528,0),2,2)
    Q.NH(beta=tran, x=1)
         [,1] [,2]
    [1,]  0.5  0.5
    [2,]  0.5  0.5

Many thanks for your further help and time.

James Allan

--- On Tuesday, 28 February 2012, Oscar Rueda [via R] [hidden email] wrote:

From: Oscar Rueda [via R] [hidden email]
Subject: Re: Bayesian Hidden Markov Models
To: monkeylan [hidden email]
Date: Tuesday, 28 February 2012, 7:02

Dear James,

Basically you just need the values (y) and the positions (in your case this would be the index of the time series). The chromosome argument does not apply to your case, so it can be a vector of ones. If the positions are equally spaced, the model will be homogeneous.

So for example something like this would be enough:

    library(RJaCGH)
    y <- c(rnorm(100,0,1), rnorm(20, 2, 1), rnorm(50, 0, 1))
    Pos <- 1:length(y)
    Chrom <- rep(1, length(y))
    res <- RJaCGH(y=y, Pos=Pos, Chrom=Chrom)
    summary(res)

However, it uses a Reversible Jump algorithm and therefore jumps between models with different numbers of hidden states. I would suggest you take a look at the vignette that comes with the package, or at the paper referenced there, for specific details of the model it fits.

Hope it helps,
Oscar

On 28/2/12 04:52, monkeylan [hidden email] wrote:

Dear Doctor Oscar,

Sorry for not noticing that you are the author of the RJaCGH package. But I noticed that the hidden Markov model in your package has non-homogeneous transition probabilities. In my work, the HMM is just a first-order homogeneous Markov chain, i.e. the transition matrix is constant. So, could you please tell me how I can adjust the R functions in your package to implement my analysis?

Best Regards,
James Allan

--- On Monday, 27 February 2012, Oscar Rueda [via R] [hidden email] wrote:

From: Oscar Rueda [via R] [hidden email]
Subject: Re: Bayesian Hidden Markov Models
To: monkeylan [hidden email]
Date: Monday, 27 February 2012, 6:05

Dear James,

Although designed for the analysis of copy number CGH microarrays, RJaCGH uses a Bayesian HMM model.

Cheers,
Oscar

On 27/2/12 08:32, monkeylan [hidden email] wrote:

Dear R buddies,

Recently, I attempted to model the US/RMB exchange rate log-return time series with a *Hidden Markov model (first-order Markov chain with mixed Normal distributions)*. I have applied the RHmm package to accomplish this task, but the results are not so satisfying. So, I would like to try a *Bayesian method* for the parameter estimation of the Hidden Markov model. Could anyone kindly tell me which R package can perform Bayesian estimation of the model?

Many thanks for your help and time.

Best Regards,
James Allan

Oscar M. Rueda, PhD.
Postdoctoral Research Fellow, Breast Cancer Functional Genomics.
Cancer Research UK Cambridge Research Institute.
Li Ka Shing Centre, Robinson Way.
Re: [R] Delete rows from data.frame matching a certain criteria
Hi

my favourite would be

    test$v[which(test$pattern==1)] <- NA

Regards
Petr

> Hi,
>
> On Mar 1, 2012, at 12:38 PM, Sarah Goslee wrote:
>
>> Hi,
>>
>> On Thu, Mar 1, 2012 at 11:11 AM, mails mails00...@gmail.com wrote:
>>> Hello,
>>>
>>> consider the following data.frame:
>>>
>>>     test <- data.frame(n = c(1,2,3,4,5), v = c(6,5,7,5,3), pattern = c(1,1,NA,1,NA))
>>>
>>> snip
>>>
>>> So basically the result should look like this:
>>>
>>>     > test
>>>       n  v pattern
>>>     1 1 NA       1
>>>     2 2 NA       1
>>>     3 3  7      NA
>>>     4 4 NA       1
>>>     5 5  3      NA
>>>
>>> So far, I solved it by creating subsets and using merge but it turns out to be super slow. Is there a way to do that with the apply function?
>>
>> Far too much work. What about:
>>
>>     > test$v <- ifelse(test$pattern == 1, NA, test$v)
>>     > test
>>       n  v pattern
>>     1 1 NA       1
>>     2 2 NA       1
>>     3 3 NA      NA
>>     4 4 NA       1
>>     5 5 NA      NA
>
> Actually that doesn't work because of those pesky missing values. You need
>
>     test <- transform(test, v = ifelse(pattern == 1 & !is.na(pattern), NA, v))
>
> Best,
> Ista
>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org
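To make the difference between the proposals concrete, here is a small side-by-side sketch on the data frame from the thread. The %in% variant is an addition not mentioned in the thread; it works because NA %in% 1 is FALSE rather than NA.

```r
test <- data.frame(n = c(1, 2, 3, 4, 5),
                   v = c(6, 5, 7, 5, 3),
                   pattern = c(1, 1, NA, 1, NA))

# Petr's version: which() silently drops the NA comparisons
a <- test
a$v[which(a$pattern == 1)] <- NA

# Ista's version: guard the comparison explicitly against NA
b <- transform(test, v = ifelse(pattern == 1 & !is.na(pattern), NA, v))

# Variant (not from the thread): %in% never returns NA
d <- test
d$v[d$pattern %in% 1] <- NA

print(a)
```

All three leave v untouched in rows 3 and 5, where pattern is missing.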
Re: [R] Vector errors and missing values
Hi

> Hi,
>
> I am trying to run two non-Gaussian regressions: logistic and probit. I am receiving two different errors when I try to run these regressions and I am not sure what they mean or how to fix my syntax.
>
> Here is the logistic regression error:
>
>     Error in family$linkfun(mustart) : Argument mu must be a nonempty numeric vector
>
> Here is the probit regression error:
>
>     Error in pmax(eta, -thresh) : cannot mix 0-length vectors with others

Without any code and the structure of your data you will probably not get much help. See the bottom of any email:

    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.

Probably your data are somehow not as you expect. What does str(yourdata) tell you? Is it in compliance with the function you use?

Regards
Petr

> The dataset that I am using has some missing data. R puts NA values in place of the missing values. I am not sure if this is what is causing my vector problems or not. I have tried to use 'data=na.omit(DataMiss)' in my glm as well as the command 'na.action=na.exclude', but I am not sure if I am using the correct syntax because it is not working.
>
> Any help would be most appreciated.
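Since no code was posted, the cause of the two errors cannot be diagnosed from here. For reference only, the sketch below shows one syntactically valid way to combine glm() with na.action on data containing NAs. The data frame d and its columns are simulated and entirely hypothetical; they stand in for the poster's DataMiss.

```r
# Simulated binary response with a few missing predictor values.
set.seed(42)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(0.5 + d$x))
d$x[c(3, 17, 50)] <- NA

str(d)   # first check that the columns have the expected types

# Rows with NA are dropped for fitting; na.exclude pads fitted values
# and residuals back to the original length with NA.
fit_logit  <- glm(y ~ x, family = binomial(link = "logit"),
                  data = d, na.action = na.exclude)
fit_probit <- glm(y ~ x, family = binomial(link = "probit"),
                  data = d, na.action = na.exclude)

print(coef(fit_logit))
print(coef(fit_probit))
```

If this pattern runs on the real data but the original call does not, comparing str() of the two data frames is a reasonable next step.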
Re: [R] Why do my regular expressions require a double escape \\ to get a literal??
On 02-03-2012, at 09:36, Roey Angel wrote:

> snip
>
> I assumed that the following command would solve it, but instead I got an error.
>
>     tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\(.*\)','',x)))
>     Error: '\(' is an unrecognized escape in character string starting "\("
>
> And it doesn't matter if I use perl = TRUE or not. To solve it I need to use a double escape sign '\\' before opening and closing the parenthesis:
>
>     tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\\(.*\\)','',x)))
>
> This yields the desired result but I wonder why it does that? No other regular expression system I'm used to (e.g. Perl, Shell) works like that. I'm using R 2.14 (but also R 2.10) and I get the same results on Ubuntu and win XP. I'd appreciate any explanation.

See the section "Character vectors" in the R Intro manual, and ?Quotes.

The regular expression is provided to gsub as a string, and strings have their own escape sequences. To pass a single \ through to the regular expression parser, it has to be escaped at the string stage: \\

Berend
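The two-stage parsing described above can be seen directly in a short sketch. The sample strings are made up to resemble the taxonomy fields, and the pattern uses [^)]* instead of .* so that it cannot run past the first closing parenthesis.

```r
# Stage 1: the R parser turns "\\(" into the two characters \ and (
print(nchar("\\("))     # 2
cat("\\(", "\n")        # shows what the regex engine receives: \(

# Stage 2: the regex engine reads \( as a literal parenthesis
x <- c("Bacteria(100)", "Proteobacteria(94.5)", "unclassified")
cleaned <- gsub("\\([^)]*\\)", "", x)
print(cleaned)
```

Perl and the shell only appear different because their regex literals or quoting rules skip the first stage.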
Re: [R] Tobit Fixed Effects
Dear Felipe

On 29 September 2011 14:10, Arne Henningsen arne.henning...@googlemail.com wrote:

> Hi Felipe
>
> On 25 September 2011 00:16, Felipe Nunes felipnu...@gmail.com wrote:
>> Hi Arne,
>>
>> my problem persists. I am still using censReg [version - 0.5-7] to run a random effects model on my data (50,000 cases), but I always get the message:
>>
>>     tob7 <- censReg(transfers.cap ~ pt.pt + psdb.pt + pt.opp + pt.coa + psdb.coa + pib.cap + transfers.cap.lag + pib.cap + ifdm + log(populat) + mayor.vot.per + log(bol.fam.per+0.01) + factor(uf.name) + factor(year) - 1, left=0, right=Inf, method="BHHH", nGHQ=8, iterlim=1, data = pdata2)
>>     Error in maxNRCompute(fn = logLikAttr, fnOrig = fn, gradOrig = grad, hessOrig = hess, :
>>       NA in the initial gradient
>>
>> If I sent you my data set, could you try to help me? I have been struggling with that for two months now.
>
> Thanks for sending me your data set. With it, I was able to figure out where the NAs in the (initial) gradients come from: when calculating the derivatives of the standard normal density function [d dnorm(x) / d x = - dnorm(x) * x], values of x that are larger than approximately 40 (in absolute terms) result in values so small (in absolute terms) that R rounds them to zero. Later, these derivatives are multiplied by some other values and then the logarithm is taken ... and multiplying any number by zero and taking the logarithm does not give a finite number :-(
>
> When *densities* of the standard normal distribution become too small, one can use dnorm(x,log=TRUE) and store the logarithm of the small number, which is much larger (in absolute terms) than the density and hence is not rounded to zero. However, in the case of the *derivative* of the standard normal density function, this is more complicated, as log( d dnorm(x) / d x ) = log( dnorm(x) ) + log( - x ) is not defined if x is positive. I will try to solve this problem by case distinction (x>0 vs. x<0). Or does anybody know a better solution?

Finally(!), I have implemented this solution in the censReg package. Some initial tests (including your model and data) show that the revised calculation of the gradient of the random effects panel data model for censored dependent variables is much more robust to rounding errors.

The improved version of the censReg package is not yet available via CRAN, but it is available at R-Forge:

https://r-forge.r-project.org/R/?group_id=256

If you have further questions or feedback regarding the censReg package, please use a forum or tracker on the R-Forge site of the sampleSelection project:

https://r-forge.r-project.org/projects/sampleselection/

Best wishes from Copenhagen,
Arne

--
Arne Henningsen
http://www.arne-henningsen.name
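The rounding problem and the case distinction described above can be illustrated with a few lines of base R. This is only a sketch of the idea, not the code used inside censReg; the helper names log_abs_ddnorm and sign_ddnorm are hypothetical. The trick is to carry the sign of the derivative separately and to store the logarithm of its absolute value.

```r
# The density underflows to exactly zero, its logarithm does not.
print(dnorm(40))               # 0
print(dnorm(40, log = TRUE))   # about -800.92

# d dnorm(x) / d x = -x * dnorm(x): keep log|.| and the sign apart.
log_abs_ddnorm <- function(x) dnorm(x, log = TRUE) + log(abs(x))
sign_ddnorm    <- function(x) -sign(x)

x <- c(-40, -2, 2, 40)
print(log_abs_ddnorm(x))       # finite for all four values
print(sign_ddnorm(x))          # 1 1 -1 -1

# For moderate x the pieces recombine to the ordinary derivative.
x_ok <- c(-2, 2)
recombined <- sign_ddnorm(x_ok) * exp(log_abs_ddnorm(x_ok))
print(all.equal(recombined, -x_ok * dnorm(x_ok)))
```

Products with other terms can then be formed by adding logarithms and multiplying signs, and only exponentiated at the very end, if at all.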
[R] speed up merge
Hello,

I have a nasty loop that I have to do 11877 times. The only thing that slows it down really is this merge:

    xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)

Any ideas on how to speed it up? The output can't change materially (it works), but I'd like it to go faster. I'm looking at getting around the loop (not shown), but I'm trying to speed up the merge first. I'll post regarding the loop if nothing comes of this post.

Here is some information on what type of stuff is going into the merge:

    > class(ua_rd)
    [1] "matrix"
    > dim(ua_rd)
    [1] 20  2
    > head(ua_rd)
               AName             rt_date
    2007-03-31 "14066.580078125" "2007-04-26"
    2007-06-30 "14717"           "2007-07-19"
    2007-09-30 "15528"           "2007-10-25"
    2007-12-31 "17609"           "2008-01-24"
    2008-03-31 "17168"           "2008-04-24"
    2008-06-30 "17681"           "2008-07-17"
    > class(dt)
    [1] "character"
    > length(dt)
    [1] 1799
    > dt[1:10]
     [1] "2007-03-31" "2007-04-01" "2007-04-02" "2007-04-03" "2007-04-04" "2007-04-05" "2007-04-06" "2007-04-07"
     [9] "2007-04-08" "2007-04-09"

thanks,
Ben
[R] Cloning R Installation Across Multiple Computers
I would like to set up identical R installations, with the same packages, on multiple computers and with minimal interaction by users.

Ideally, I would like to have an installation script that the user can just run that will set up everything, including R itself and base packages. Standard packages would need to include ggplot2 and its dependencies, Rcmdr and its dependencies, and some Rcmdr plug-ins. Best of all would be to have the script include JGR and Deducer in the installation.

This would be set up on Windows XP 32-bit, and all packages have to install from local .zip files; R cannot get to the package servers through our firewall.

I could do the setup once, manually, on one machine, but I'm not sure if I can simply copy the R installation directory to other computers and have it still work. I don't think that the Windows registry would be configured if I did it this way, and I'm not sure that JGR would be correctly installed.

Another approach would be to have the user install R, then copy the R directory tree from another computer that has been set up, and finally JGR would have to be installed by the user. I'd like to roll this all into one simple step that the users don't have to worry about screwing up.

I recall that there has been some discussion of this on r-help in the past, but I seem to be using the wrong search terms as I cannot find anything.

Any suggestions for, or pointers to, solutions will be much appreciated.

Thank you,
Tom Hopper
Re: [R] speed up merge
Hi Ben,

It seems you merge a matrix and a vector. As far as I understand, the first thing merge does is convert these to data.frame. Is it possible to make the preceding steps give data frames?

Regards,
Kees

On Fri, Mar 2, 2012 at 11:24 AM, Ben quant ccqu...@gmail.com wrote:

> Hello,
>
> I have a nasty loop that I have to do 11877 times. The only thing that slows it down really is this merge:
>
>     xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)
>
> Any ideas on how to speed it up? The output can't change materially (it works), but I'd like it to go faster.
>
> snip
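Beyond avoiding the data.frame conversion, a left join on a single key can often be replaced by match(), which is usually much cheaper than merge() inside a loop. The sketch below uses made-up stand-ins for dt and ua_rd with the shapes shown in the original post, and it assumes that the rt_date values are unique; with duplicated keys match() would only pick the first hit, unlike merge().

```r
# Stand-ins for the objects described in the thread (values are illustrative).
dt <- as.character(seq(as.Date("2007-03-31"), by = "day", length.out = 1799))
ua_rd <- cbind(AName   = c("14066.58", "14717"),
               rt_date = c("2007-04-26", "2007-07-19"))

# Left join: for every date in dt, look up the row of ua_rd with that rt_date.
idx <- match(dt, ua_rd[, "rt_date"])          # NA where there is no match
xx1 <- data.frame(dt = dt,
                  AName = unname(ua_rd[idx, "AName"]),
                  stringsAsFactors = FALSE)

print(xx1[25:29, ])
```

Rows of dt without a matching rt_date get NA in AName, which mirrors all.x = TRUE.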
Re: [R] speed up merge
On Fri, Mar 02, 2012 at 03:24:20AM -0700, Ben quant wrote:

> Hello,
>
> I have a nasty loop that I have to do 11877 times.

Are you completely sure about that? I often find myself avoiding loops-by-row by constructing vectors of the rows that fulfil a condition, and then creating new vectors out of that vector.

If you elaborate on the problem, perhaps we could find a way to avoid the loops altogether?

Mostly as a note to self, I wrote http://code.cjb.net/vectors-instead-of-loop.html. It might be understood by others too, but I'm not sure.

--
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
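A tiny illustration of the index-vector idea mentioned above, on made-up data (the data frame d and the threshold are hypothetical): instead of visiting rows one by one, the rows that satisfy the condition are collected in a vector and updated in a single step.

```r
d <- data.frame(id = 1:6, value = c(3, 8, 1, 9, 4, 7))

# Row-by-row version
flag_loop <- logical(nrow(d))
for (i in seq_len(nrow(d))) {
  if (d$value[i] > 5) flag_loop[i] <- TRUE
}

# Vectorised version: find the qualifying rows once, then assign in one go
hits <- which(d$value > 5)
d$flag <- FALSE
d$flag[hits] <- TRUE

print(d)
```

Both versions give the same flags; the second scales far better as the number of rows grows.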
Re: [R] Cloning R Installation Across Multiple Computers
On 12-03-02 5:32 AM, Tom Hopper wrote:

> I would like to set up identical R installations, with the same packages, on multiple computers and with minimal interaction by users.
>
> Ideally, I would like to have an installation script that the user can just run that will set up everything, including R itself and base packages.
>
> snip
>
> Any suggestions for, or pointers to, solutions will be much appreciated.

See the Installation and Administration manual, in particular the section (3.1.8, I think) on building the Inno Setup installer. You may even find the MSI installer more convenient, but that code is tested less, and is unsupported.

Duncan Murdoch
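For the package part of the question (installing from local .zip files without network access), a small helper along the following lines may be enough once R itself is installed. This is a sketch under assumptions: install_local_zips is a hypothetical name, the directory is whatever shared folder holds the Windows binary zips, and dependencies are not resolved automatically, so every required package must be present in that folder.

```r
# Install every Windows binary package (.zip) found in a local directory.
# With dry_run = TRUE nothing is installed; the files are only listed.
install_local_zips <- function(zip_dir, dry_run = TRUE) {
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  if (!dry_run && length(zips) > 0) {
    # repos = NULL tells install.packages to treat pkgs as local files
    install.packages(zips, repos = NULL, type = "win.binary")
  }
  invisible(zips)
}

# Example call with a temporary, empty directory standing in for the share
pkg_dir <- tempfile("pkgs")
dir.create(pkg_dir)
found <- install_local_zips(pkg_dir, dry_run = TRUE)
print(length(found))
```

On the target machines the call would use dry_run = FALSE and the real folder path.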
[R] Call the Standard Error and t-test probability in linear regression
Hello,

I run a linear regression and get the summary, e.g.:

    > summary(lm.r)

    Call:
    lm(formula = signal ~ conc)

    Residuals:
       1    2    3    4    5
     0.4 -1.0  1.6 -1.8  0.8

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  3.6        1.23288    2.92   0.0615 .
    conc         1.94000    0.05033   38.54 3.84e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.592 on 3 degrees of freedom
    Multiple R-Squared: 0.998, Adjusted R-squared: 0.9973
    F-statistic: 1486 on 1 and 3 DF, p-value: 3.842e-0

I would like to extract only the probability of the t-test in order to use it separately. For example, I'd like to get:

    Pr <- 3.84e-05

Similarly, I want to extract the standard error of the parameters:

    SEconc <- 0.05033

I don't know how to do this. Any help?

Regards,
Ioanna
Re: [R] Call the Standard Error and t-test probability in linear regression
On 02-Mar-2012 IOANNA wrote:

> Hello,
>
> I run a linear regression I get the summary, e.g.:
>
> snip
>
> I would like to call the probability of the t-test only in order to use it separately. For example I'd like to get:
>
>     Pr <- 3.84e-05
>
> Similarly I want to call the standard error of the parameters and the function:
>
>     SEconc <- 0.05033
>
> I don't know how to do this. Any help?
>
> Regards,
> Ioanna

Hi Ioanna,

If you look at '?summary.lm' and read the section "Value", you will see that the returned value is a list with several components, one of which is:

    coefficients: a p x 4 matrix with columns for the estimated
              coefficient, its standard error, t-statistic and
              corresponding (two-sided) p-value. Aliased coefficients
              are omitted.

This is effectively as displayed by summary(lm...)). So your

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  3.6        1.23288    2.92   0.0615 .
    conc         1.94000    0.05033   38.54 3.84e-05 ***

(apart from the significance codes "." and "***") are the elements in this p=2 x 4 matrix.

Hence

    summary(lm.r)$coef

would give the full 2x4 matrix (you can abbreviate "coefficients" to "coef"), and so

    summary(lm.r)$coef[2,4]

will give you the P-value for conc, and

    summary(lm.r)$coef[2,2]

will give the SE of the estimate of conc. And so on.

Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 02-Mar-2012 Time: 13:09:03
Re: [R] 3d surface plot (ideally using rgl package)?
It is by no means clear what the peaks function does or if it has any R equivalent, but perhaps looking at demo(rgl) will get you started. After that, you should probably show what you've tried (at least as far as replicating the calculation aspects).

Michael

On Fri, Mar 2, 2012 at 1:32 AM, e-mail athula.herath athula.her...@ntlworld.com wrote:

> Dear HelpeRs,
>
> I would be grateful to anybody who might help to produce the following plot (the code for matlab is below) using the rgl package of R?
>
>     [t,r] = meshgrid(linspace(0,2*pi,361),linspace(-4,4,101));
>     [x,y] = pol2cart(t,r);
>     P = peaks(x,y);
>     figure('color','white');
>     polarplot3d(P,'colordata',gradient(P));
>     view([-18 72]);
>
> You may see the code and plot output at:
> https://plus.google.com/u/0/109789409461372488563/posts/YfcrsMhkjyf
>
> Many Thanks
> A.
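As a starting point for replicating the calculation aspects, the grid and the surface can be computed in base R as sketched below. The peaks function here is written out from the formula commonly documented for MATLAB's peaks, which should be treated as an assumption, and the gradient-based colouring of polarplot3d is not reproduced. The rgl call assumes that persp3d() accepts matrices for x and y (a parametric surface) and is skipped unless rgl is installed and the session is interactive.

```r
# Formula commonly given for MATLAB's peaks(x, y); an assumption, see above.
peaks <- function(x, y) {
  3 * (1 - x)^2 * exp(-x^2 - (y + 1)^2) -
    10 * (x / 5 - x^3 - y^5) * exp(-x^2 - y^2) -
    1 / 3 * exp(-(x + 1)^2 - y^2)
}

# Polar grid: radius along rows, angle along columns (as with meshgrid above)
t <- seq(0, 2 * pi, length.out = 361)
r <- seq(-4, 4, length.out = 101)
x <- outer(r, t, function(r, t) r * cos(t))   # pol2cart
y <- outer(r, t, function(r, t) r * sin(t))
P <- peaks(x, y)

if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::persp3d(x, y, P, col = "lightblue")    # plain colour, no gradient map
}
```

The matrix P has one row per radius and one column per angle, matching the MATLAB layout.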
Re: [R] Call the Standard Error and t-test probability in linear regression
>> On 02-Mar-2012 IOANNA wrote:
>> Hello,
>>
>> I run a linear regression and get the summary, e.g.:
>>
>> Call:
>> lm(formula = signal ~ conc)
>>
>> Residuals:
>>    1    2    3    4    5
>>  0.4 -1.0  1.6 -1.8  0.8
>>
>> Coefficients:
>>             Estimate Std. Error t value Pr(>|t|)
>> (Intercept) 3.6      1.23288     2.92   0.0615 .
>> conc        1.94000  0.05033    38.54   3.84e-05 ***
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Residual standard error: 1.592 on 3 degrees of freedom
>> Multiple R-Squared: 0.998, Adjusted R-squared: 0.9973
>> F-statistic: 1486 on 1 and 3 DF, p-value: 3.842e-0
>>
>> I would like to extract only the probability of the t-test in order to use it separately. For example, I'd like to get:
>>
>> Pr<-3.84e-05
>>
>> Similarly, I want to extract the standard error of the parameters:
>>
>> SEconc<-0.05033
>>
>> I don't know how to do this. Any help?
>>
>> Regards,
>> Ioanna

> Hi Ioanna,
> If you look at '?summary.lm' and read the section "Value", you will see that the returned value is a list with several components, one of which is:
>
>   coefficients: a p x 4 matrix with columns for the estimated coefficient, its standard error, t-statistic and corresponding (two-sided) p-value. Aliased coefficients are omitted.
>
> This is effectively as displayed by summary(lm(...)). So your
>
>   Coefficients:
>               Estimate Std. Error t value Pr(>|t|)
>   (Intercept) 3.6      1.23288     2.92   0.0615 .
>   conc        1.94000  0.05033    38.54   3.84e-05 ***
>
> (apart from the significance codes "." and "***") are the elements of this p=2 x 4 matrix.

indeed.

> Hence
>
>   summary(lm.r)$coef
>
> would give the full 2 x 4 matrix (you can abbreviate "coefficients" to "coef"),

Yes. In the present case, I tell all my students (:= everyone who wants to learn .. ;-) to use

  coef(summary(lm.r))

instead of summary(lm.r)$coef, because using the coef() [or coefficients()] method will work with many more models and situations, notably also for cases where the (summary of the) fitted model is not a list (but e.g. an S4 class, reference class, ..), or does not store these in exactly this location.

Martin

> and so
>
>   summary(lm.r)$coef[2,4]
>
> will give you the P-value for conc, and
>
>   summary(lm.r)$coef[2,2]
>
> will give the SE of the estimate of conc. And so on.
>
> Ted.
>
> E-Mail: (Ted Harding) ted.hard...@wlandres.net
> Date: 02-Mar-2012 Time: 13:09:03
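The advice above can be sketched in a few lines. The data below are made up for illustration (chosen so that the fit gives the same estimates as the posted summary; they are not confirmed to be Ioanna's data), and the object names p.conc and se.conc are hypothetical. Indexing by row and column name rather than by position is a design choice here: it keeps working if the model gains or loses terms.

```r
# Illustrative data (an assumption, not taken from the thread).
conc   <- c(0, 10, 20, 30, 40)
signal <- c(4, 22, 44, 60, 82)
lm.r   <- lm(signal ~ conc)

# coef() on the summary returns the p x 4 coefficient matrix.
ctab <- coef(summary(lm.r))
print(ctab)

# Extract single entries by name instead of by position.
p.conc  <- ctab["conc", "Pr(>|t|)"]    # p-value of the t-test for conc
se.conc <- ctab["conc", "Std. Error"]  # standard error of the conc estimate
```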
Re: [R] Vector errors and missing values
Hi Petr!

Thank you for responding to my post. I checked all my variables in the way you suggested and they are all in integer form, but there are many missing values in some of my vectors, denoted with NA. So they are in the correct form; I am just wondering whether there is something else I need to do to the NAs to be able to run my regressions, because as it is now R may not be taking the NAs into account, and that would cause a mismatch in vector length. But I don't think it is expected that all your predictor variables and your outcome variable be the same length. I would imagine that is often not the case for most people when they collect data.

--
Jessica

View this message in context: http://r.789695.n4.nabble.com/Vector-errors-and-missing-values-tp4437306p4438415.html
Re: [R] how to change or copy to another the names of models
Hi help-list,

I will try to explain my problem better. For each cycle of (n) I need to save the model (model.mlp), because each neural net is unique: once I have chosen the best architecture, I need the neural net (the model.mlp) that has already been trained. I can't train it again, because the result will not be the same. I thought I could put a line at the end of each cycle to copy model.mlp to another name and so save a model for each cycle. I hope I have explained my problem better.

Thanks in advance

## build an mlp model (neural net) and train it, changing the number of hidden neurons (1 to 30)
for (n in 1:30) {
  print(n)
  model.mlp <- mlp(padroes$inputsTrain, padroes$targetsTrain, size = n,
                   initFunc="Randomize_Weights", initFuncParams=c(-0.5, 0.5),
                   learnFunc="Std_Backpropagation", learnFuncParams = c(0.2, 0.1),
                   updateFunc="Topological_Order", hiddenActFunc="Act_Logistic",
                   shufflePatterns=TRUE, linOut=TRUE, maxit = 3000,
                   inputsTest = padroes$inputsTest, targetsTest = padroes$targetsTest)
}

2012/3/1 Waldir de Carvalho Junior waldi...@gmail.com
> Hi
> I would like to know how I can change the name of a model for each training cycle.
> I work with the RSNNS package, and to build a neural network I used:
>
> for (i in 5:30)
>   model_ANN <- mlp(X, Y, size=n,) # where size is the number of neurons in the hidden layer
>
> but I need to save the model that is built at the end of each cycle, e.g., when i = 5 I need to save the model with a specific name, and when i = 6 I need to save the model with another name.
> The way I am doing it, I am saving only the last model, with n = 30 and with the name model_ANN.
>
> Question: how can I change the name of the model (model_ANN) at the end of each cycle of i values?
> I have already tried copy, save, rename,.. but unsuccessfully.
>
> Thanks for any answer

--
Waldir de Carvalho Junior
Pedologia/Pedologue
Pesquisador/Chercheur
Embrapa Solos/INRA - UMR- LISAH
[R] Statistical Histograms in R
Hi,

I'm wondering if anybody could possibly help me? I have a table with 5 tab-delimited columns. Each column has 'e-value' scores for one of 5 different proteins. I'd like to plot a distribution curve using hist() for each of the 5 proteins and show the 5 distribution curves on the same graph in different colours. In this case, E-values will be on the X-axis and frequency on the Y-axis. Is this at all possible?

Thanks.

View this message in context: http://r.789695.n4.nabble.com/Statistical-Histograms-in-R-tp4438336p4438336.html
[R] plotting standard deviation of multivariate normal distribution (preferred in rgl package)
Dear R colleagues,

for a statistics tutorial I want to develop a nice 3d graphic of the well-known target analogy for accuracy and precision (see e.g. http://en.wikipedia.org/wiki/Accuracy_and_precision for a simple hand-made 2d graphic).

The code for a really beautiful graphic is already provided as demo(bivar) in the rgl package (for a picture see e.g. http://rgl.neoscientists.org/gallery.shtml, upper right corner).

Now I'd like to add the standard deviation to the 3d plot. Unfortunately I couldn't figure out how to do that.

What I did so far: assuming you have executed the code of the demo mentioned above, you can plot a 2d scatter plot of the data (the shots on the target) via a simple

  plot(x,y)

and add a contour plot via

  par(new=TRUE)
  contour(denobj, nlevels=8)

So the contour levels are about what I am looking for, but...
1. How do I make the contour plot display exactly the standard deviations as contours, and not arbitrary levels?
2. How do I project the standard deviation contours onto the 3d surface?

Probably there are (as always) lots of different solutions as well. I'd appreciate any kind of help very much!

Greetings from Munich,
Felix
Re: [R] how to change or copy to another the names of models
Hi Waldir

I think this is easier via lapply():

  lapply(1:30, function(x) mlp(...your settings here, including size=x...) )

Regards,
Kees

On Fri, Mar 2, 2012 at 2:36 PM, Waldir de Carvalho Junior waldi...@gmail.com wrote:
> Hi help-list
> I try to better explain my problem. My problem is below.
> For each cycle of (n) I need to save the model (model.mlp) [...]
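A minimal sketch of the lapply() idea, keeping every fitted model in a named list so that any of them can be retrieved later without retraining. To stay runnable without RSNNS, lm() with a polynomial of degree n stands in for mlp(..., size = n); the data, the name prefix "model_" and the file name are all assumptions made for illustration.

```r
set.seed(1)
# Toy data standing in for padroes$inputsTrain / padroes$targetsTrain.
d   <- data.frame(x = runif(50))
d$y <- sin(2 * pi * d$x) + rnorm(50, sd = 0.2)

sizes <- 1:5

# One fitted model per size; replace lm(...) by the mlp(...) call from the post.
models <- lapply(sizes, function(n) lm(y ~ poly(x, n), data = d))
names(models) <- paste0("model_", sizes)

# Keep the already fitted models on disk so they never need retraining.
rds.file <- file.path(tempdir(), "models.rds")
saveRDS(models, rds.file)

# Later: pick one model by name.
best <- readRDS(rds.file)[["model_3"]]
```

A list avoids creating thirty separately named objects with assign(), and the whole collection can be saved and restored in one step.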
Re: [R] Why do my regular expressions require a double escape \\ to get a literal??
Roey, you imply that this is unusual in implementations of regex, yet some of the oldest applications using regex out there are sed or awk, where extra quoting is so common that some people don't recognize regex patterns that are missing this extra level of quoting. Sigh.

---
Jeff Newmiller
DCN:jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Berend Hasselman b...@xs4all.nl wrote:
> On 02-03-2012, at 09:36, Roey Angel wrote:
>> Hi,
>> I was recently misfortunate enough to have to use regular expressions to sort out some data in R.
>> I'm working on a data file which contains taxonomical data of bacteria in hierarchical order.
>> A sample of this file can be generated using:
>>
>> tax.data <- read.table(header=F, con <- textConnection('
>> G9SS7BA01D15EC Bacteria(100) Cyanobacteria(84) unclassified
>> G9SS7BA01C9UIR Bacteria(100) Proteobacteria(94) Alphaproteobacteria(89)
>> G9SS7BA01CM00D Bacteria(100) Proteobacteria(99) Alphaproteobacteria(99)
>> '))
>> close(con)
>>
>> What I try to do is to remove the parentheses and the number inside (which could contain a decimal point).
>> I assumed that the following command would solve it, but instead I got an error.
>>
>> tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\(.*\)','',x)))
>> Error: '\(' is an unrecognized escape in character string starting "\("
>>
>> And it doesn't matter if I use perl = TRUE or not.
>> To solve it I need to use a double escape sign '\\' before the opening and closing parentheses:
>>
>> tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\\(.*\\)','',x)))
>>
>> This yields the desired result, but I wonder why it does that?
>> No other regular expression system I'm used to (e.g. Perl, Shell) works like that.
>> I'm using R 2.14 (but also R 2.10) and I get the same results on Ubuntu and Win XP.
>> I'd appreciate any explanation.
>
> See the section "Character vectors" in the R Intro manual, and ?Quotes.
> The regular expression is provided as a string to gsub.
> In strings there are escape sequences.
> To get the \ as a single \ to the regular expression parser it has to be \-ed in the string stage: \\
>
> Berend
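The two stages Berend describes can be seen directly at the R prompt. This is a small sketch; the variable name pat and the test string are made up for illustration.

```r
# Stage 1: the R parser turns the source text "\\(.*\\)" into a string.
pat <- "\\(.*\\)"

# The string that results has 6 characters: \ ( . * \ )
n.chars <- nchar(pat)

# writeLines() shows the string as the regex engine will receive it.
writeLines(pat)

# Stage 2: the regex engine reads \( and \) as literal parentheses.
res <- gsub(pat, "", "Bacteria(100)")
```

So the doubled backslash exists only in the source code; the pattern handed to the regular expression engine contains a single backslash before each parenthesis.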
Re: [R] Statistical Histograms in R
Have a look at http://had.co.nz/ggplot2/stat_density.html
You'll find some examples there, along with the code to generate them.

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of SMcG
Sent: Friday 2 March 2012 13:49
To: r-help@r-project.org
Subject: [R] Statistical Histograms in R

> Hi, I'm wondering if anybody could possibly help me? I have a table with 5 tab-delimited columns. [...]
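As a base-graphics alternative to the ggplot2 page linked above, the following sketch overlays one kernel density curve per column. The data are simulated, the column names protein1..protein5 are invented, and the curves show density rather than raw frequency; with real data the data frame would come from something like read.delim() on the tab-delimited file.

```r
set.seed(42)
# Simulated stand-in for the 5-column table of e-values.
ev <- as.data.frame(matrix(rexp(5 * 200), ncol = 5))
names(ev) <- paste0("protein", 1:5)

# One kernel density estimate per column.
dens <- lapply(ev, density)
cols <- c("black", "red", "blue", "darkgreen", "orange")

# Common axis limits so that every curve fits in the same plot.
xr   <- range(sapply(dens, function(d) d$x))
ymax <- max(sapply(dens, function(d) max(d$y)))

plot(0, 0, type = "n", xlim = xr, ylim = c(0, ymax),
     xlab = "E-value", ylab = "Density")
for (i in seq_along(dens)) lines(dens[[i]], col = cols[i])
legend("topright", legend = names(ev), col = cols, lty = 1)
```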
Re: [R] read.table issue with #
The # is the default comment character in read.table(), but that can easily be changed:

> tc <- textConnection("
+ yes yes yes yes yes
+ yes yes yes yes yes
+ yes yes # yes yes
+ ")
> x <- read.table(tc, comment.char="")
> x
   V1  V2  V3  V4  V5
1 yes yes yes yes yes
2 yes yes yes yes yes
3 yes yes   # yes yes

There's insufficient context here to know whether that was actually the original problem, but it is an alternative to what Rui proposed.

Sarah

On Thu, Mar 1, 2012 at 5:51 PM, Rui Barradas rui1...@sapo.pt wrote:
> Hello,
>
>> The problem is that I get the following error because anything after the # is ignored.
>>
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>   line 6 did not have 500 elements
>
> R thinks that line 6 has only 2 elements because of the #.
>
> Use 'readLines' instead, followed by 'strsplit'.
> In the example below the separator is a space.
>
> tc <- textConnection("
> yes yes yes yes yes
> yes yes yes yes yes
> yes yes # yes yes
> ")
>
> #x <- read.table(tc)  # same error: line 3 did not have 5 elements
> x <- readLines(tc)
> close(tc)
>
> strsplit(x, " ")
>
> Hope this helps,
>
> Rui Barradas

--
Sarah Goslee
http://www.functionaldiversity.org
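The same behaviour can be reproduced without a connection by passing the lines through the text= argument of read.table(). This is only a sketch; the object names txt, x and default.read are made up.

```r
txt <- "yes yes yes yes yes
yes yes yes yes yes
yes yes # yes yes"

# Turn comment handling off so that "#" is read as ordinary data.
x <- read.table(text = txt, comment.char = "", stringsAsFactors = FALSE)
print(x)

# With the default comment.char = "#", the third line is cut off at the "#"
# and read.table() stops because that line no longer has 5 fields.
default.read <- try(read.table(text = txt), silent = TRUE)
```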
Re: [R] Vector errors and missing values
Hi

> Hi Petr! Thank you for responding to my post. I checked out all my variables in the way you suggested and they are all in integer form, but here are many missing values in some of my vectors, denoted with NA. So, they are in the correct form, I am just wondering if there is something else I need to do to the NAs to be able to run my regressions. Because as it is now, R may not be taking the NAs into account and that would cause a

No. R functions usually react to NA in correct ways that are specified in their help pages. I do not know which functions you used, or how. You still fail to provide any reasonable information.

> mismatch in vector length. But I don't think that it is expected that all your predictor variables and your outcome variable be the same length. I

Without context, I recall that you used some kind of regression. How do you suppose any regression or model can be fitted if the variables have different lengths?

> would imagine that is often not the case for most people when they collect data.

That is why missing values are used. And regressions can usually handle missing values smoothly, based on the value of the na.action parameter. At least that has been so for me and for thousands of R users for more than 10 years. Therefore I suspect that the problem is either in your data or in the way you use them. But if you fail to provide some code and data, you are left alone to resolve your problems.

Regards
Petr

> --
> Jessica
>
> View this message in context: http://r.789695.n4.nabble.com/Vector-errors-and-missing-values-tp4437306p4438415.html
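Petr's point about na.action can be illustrated with simulated data (everything below is made up; it is not Jessica's data set). All variables have the same length, missing values are coded as NA, and glm() drops the incomplete rows itself.

```r
set.seed(1)
d <- data.frame(y = rbinom(100, 1, 0.5), x = rnorm(100))
d$x[1:10] <- NA   # ten missing predictor values, same vector length

# Default na.action (na.omit): incomplete rows are dropped before fitting.
fit.omit <- glm(y ~ x, family = binomial("logit"), data = d)
n.used   <- nobs(fit.omit)

# na.exclude fits the same model but pads residuals and fitted values
# with NA so that they line up with the original rows.
fit.excl <- glm(y ~ x, family = binomial("logit"), data = d,
                na.action = na.exclude)
n.resid.omit <- length(residuals(fit.omit))
n.resid.excl <- length(residuals(fit.excl))
```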
Re: [R] Why do my regular expressions require a double escape \\ to get a literal??
On 02-03-2012, at 14:13, Roey Angel wrote:

> Hi Bernard, thanks for the quick reply.
> Of course, I understand that an escape is needed because parentheses are reserved symbols in regular expressions.
> My problem is that if I just use \( I get the error:
> Error: '\(' is an unrecognized escape in character string starting "\("
> so in order to get a literal ( I need to use \\(, which is odd because I've never encountered that in any other language, and also all the R manuals don't mention that.

It is not odd, as the previous poster has already mentioned. I have encountered this elsewhere (e.g. awk).

You need the \\ because the expression between your quotes is interpreted twice: first as a character string (in which \( is illegal but \\ is legal) and then as a regular expression, in which you want to match a literal ( and ), which must be escaped in the regular expression since they are meta characters.

If you don't like doing that (the \\), use this instead

  as.data.frame(apply(tax.data, 2, function(x) gsub('[(].*[)]','',x)))

i.e. put the ( and ) in a character class.

Berend
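A short sketch of the character-class variant, with one caveat that is not raised in the thread: `.*` is greedy, so if a single string holds several parenthesised numbers, everything from the first "(" to the last ")" is removed. The test string below is hypothetical and only serves to show the difference.

```r
# Hypothetical string with two parenthesised scores in one field.
s <- "Bacteria(100);Proteobacteria(94)"

# Character class instead of backslashes; greedy match.
greedy <- gsub("[(].*[)]", "", s)

# Negated class: stop at the first closing parenthesis.
lazy <- gsub("[(][^)]*[)]", "", s)

# fixed = TRUE treats the pattern as plain text, so no escaping is needed
# (useful for removing a literal character, not a whole pattern).
no.open <- gsub("(", "", s, fixed = TRUE)
```

When each field contains only one pair of parentheses, as in the posted sample, both patterns give the same result.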
[R] Memory issue. XXXX
Hi everyone,

Any ideas on troubleshooting this memory issue:

> d1<-read.csv("arrears.csv")
Error: cannot allocate vector of size 77.3 Mb
In addition: Warning messages:
1: In class(data) <- "data.frame" :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In class(data) <- "data.frame" :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In class(data) <- "data.frame" :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In class(data) <- "data.frame" :
  Reached total allocation of 1535Mb: see help(memory.size)

Thanks!

Dan
Re: [R] Memory issue. XXXX
Let's see...

You could delete objects from your R session.
You could buy more RAM.
You could see help(memory.size).
You could try googling to see how others have dealt with memory management in R, a process which turns up useful information like this:
http://www.r-bloggers.com/memory-management-in-r-a-few-tips-and-tricks/
You could provide the information on your system requested in the posting guide.

Sarah

On Fri, Mar 2, 2012 at 9:57 AM, Dan Abner dan.abne...@gmail.com wrote:
> Hi everyone,
>
> Any ideas on troubleshooting this memory issue:
>
> d1<-read.csv("arrears.csv")
> Error: cannot allocate vector of size 77.3 Mb
> [...]

--
Sarah Goslee
http://www.functionaldiversity.org
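Two of the suggestions above, freeing objects and reading less into memory, might look like the sketch below. The file is a small temporary CSV written only for the example, and its column names are invented; whether declaring column types is enough for the real arrears.csv depends on details the thread does not give.

```r
# Small temporary file standing in for the real CSV.
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, amount = runif(1000), note = "x"),
          csv, row.names = FALSE)

# Free memory held by objects that are no longer needed.
big <- numeric(1e6)
rm(big)
invisible(gc())

# Declare column types up front and skip unneeded columns with "NULL";
# giving nrows also helps read.csv() allocate sensibly.
d1 <- read.csv(csv, colClasses = c("integer", "numeric", "NULL"),
               nrows = 1000)

# Check how much memory the result actually takes.
print(object.size(d1), units = "Kb")
```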
[R] solnp Hessian problem in some datasets not in others
Dear all, Sorry to insist in this, but I am passing really bad times trying to solve the problem. Just to remember you: I am tryng to solve a nonlinear optimization probel using the solnp function. I have different datasets. For the smaller I get full solutions, for the bigger I got an error message stating: Iter: 1 fn: 101.8017 Pars: 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 0.21000 solnp-- Solution not reliableProblem Inverting Hessian. Warning messages: 1: In p0 * vscale[(neq + 2):(nc + np + 1)] : longer object length is not a multiple of shorter object length 2: In cbind(temp, funv) : number of rows of result is not a multiple of vector length (arg 1) Anyone knows what may be the reason? Just remembering that the same problem runs OK for smaller datasets. Thanks in advance, Diogo André Portugal __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] subseting a data frame
Hi,

this is my problem: I want to subset this data frame df, using only the unique values of df$exon and printing each line once even if its df$exon value appears several times.

  unique(df$exon)

will show me the unique exons. If I try to print only the unique exon lines with

  df[unique(df$exon),]

this doesn't print only the unique ones :(

Could you help?
Thanks
Nat

       exon                            size chr  start     end
413077 ChrX_133594175_133594368_HPRT1  193  ChrX 133594175 133594368
413270 ChrX_133594183_133594368_HPRT1  185  ChrX 133594183 133594368
413455 ChrX_133594381_133594565_HPRT1  184  ChrX 133594381 133594565
413639 ChrX_133607389_133607495_HPRT1  106  ChrX 133607389 133607495
413745 ChrX_133607389_133607495_HPRT1  106  ChrX 133607389 133607495
413851 ChrX_133607404_133607495_HPRT1   91  ChrX 133607404 133607495
413942 ChrX_133609211_133609394_HPRT1  183  ChrX 133609211 133609394
414125 ChrX_133609211_133609394_HPRT1  183  ChrX 133609211 133609394
414308 ChrX_133620495_133620560_HPRT1   65  ChrX 133620495 133620560
414373 ChrX_133620495_133620560_HPRT1   65  ChrX 133620495 133620560
414438 ChrX_133620692_133620696_HPRT1    4  ChrX 133620692 133620696
414442 ChrX_133624218_133624235_HPRT1   17  ChrX 133624218 133624235
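One common way to keep the first row for each exon is duplicated(), sketched here on three rows copied from the table above. The reason the posted attempt fails is that unique(df$exon) returns exon values, which the row index then interprets as row positions or row names rather than as a filter. The object name uniq is made up.

```r
df <- data.frame(
  exon = c("ChrX_133607389_133607495_HPRT1",
           "ChrX_133607389_133607495_HPRT1",
           "ChrX_133607404_133607495_HPRT1"),
  size = c(106, 106, 91),
  stringsAsFactors = FALSE
)
rownames(df) <- c("413639", "413745", "413851")

# duplicated() flags the second and later occurrences of each value,
# so negating it keeps exactly one row per exon.
uniq <- df[!duplicated(df$exon), ]
print(uniq)
```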
Re: [R] solnp Hessian problem in some datasets not in others
On 02-03-2012, at 16:12, Diogo Alagador wrote:

> I am trying to solve a nonlinear optimization problem using the solnp function. I have different datasets. For the smaller I get full solutions, for the bigger I got an error message stating:
>
> solnp-- Solution not reliable Problem Inverting Hessian.
> Warning messages:
> 1: In p0 * vscale[(neq + 2):(nc + np + 1)] :
>   longer object length is not a multiple of shorter object length
> 2: In cbind(temp, funv) :
>   number of rows of result is not a multiple of vector length (arg 1)
>
> Anyone knows what may be the reason?
> Just remembering that the same problem runs OK for smaller datasets.

No. You have not provided enough information.
I know nothing about solnp, but I am quite prepared to investigate.
From the warning message I would guess that [(neq + 2):(nc + np + 1)] is simply incorrect in your specific case.

But no data or description of your data, no function, no reproducible example ===> no help.
You should really try to be more informative.

Berend
Re: [R] Help with segmented package
Thank you Vito for your help. Works very nice.

Have a nice day,
Phil

View this message in context: http://r.789695.n4.nabble.com/Help-with-segmented-package-tp4435550p4438589.html
Re: [R] Vector errors and missing values
Here is my code: ##Centering predictors### verbal.ability_C - verbal.ability - mean(verbal.ability) children_C - children - mean(children) age_C - age - mean(age) education_C - education - mean(education) work.from.home.frequency_C - work.from.home.frequency - mean(work.from.home.frequency) religious.orientation_C - religious.orientation - mean(religious.orientation) political.orientation_C - political.orientation - mean(political.orientation) sexual.orientation_C - sexual.orientation -mean(sexual.orientation) ## Logistic Regression### logistic.model - glm( fire.communist.teacher ~ age_C + sex + children_C + currently.married + religious.orientation_C + political.orientation_C, binomial(logit) ) summary( logistic.model ) exp( coefficients( logistic.model ) ) ###Probit/Binomial Regression### install.packages(MASS) library(MASS) probit.model - polr( as.factor(verbal.ability) ~ education_C + children_C + currently.married + work.from.home.frequency_C, method=probit) summary( probit.model) Here is the output with I look at my data using the str(my.data) command: 'data.frame': 2044 obs. of 13 variables: $ sexual.orientation : int -1 -1 NA NA NA NA NA -1 NA NA ... $ political.orientation : int 5 5 6 0 3 6 4 5 6 NA ... $ religious.orientation : int 4 1 4 4 4 1 2 4 4 4 ... $ weekly.hours.on.internet: int 3 20 NA NA NA NA NA NA NA 0 ... $ verbal.ability : int 6 9 NA 3 NA NA NA 8 NA NA ... $ work.from.home.frequency: int 3 4 NA NA NA NA 1 NA 1 1 ... $ fire.communist.teacher : int NA NA 1 NA 0 NA 0 0 0 1 ... $ currently.married : int -1 -1 -1 -1 1 -1 -1 -1 1 -1 ... $ children: int 0 0 3 5 8 2 1 1 3 2 ... $ education : int 16 16 8 10 0 6 16 15 14 14 ... $ partnrs5: int 6 5 -1 99 -1 -1 -1 0 -1 -1 ... $ age : int 31 23 71 82 78 40 46 80 31 99 ... $ sex : int 1 -1 -1 -1 -1 1 -1 -1 -1 -1 ... I tried using the na.action command by putting right after the 'binomial(logit)' syntax, but it didn't work. I am not sure if I am using it properly though. 
So, I have tried this syntax to deal with the missing data:

    logistic.model <- glm( fire.communist.teacher ~ age_C + sex + children_C + currently.married + religious.orientation_C + political.orientation_C, binomial(logit), na.action=na.exclude )

as well as:

    logistic.model <- glm( fire.communist.teacher ~ age_C + sex + children_C + currently.married + religious.orientation_C + political.orientation_C, binomial(logit), na.action=na.exclude, data=na.omit(DataMiss) )

-- View this message in context: http://r.789695.n4.nabble.com/Vector-errors-and-missing-values-tp4437306p4438678.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
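One thing the centering code in this thread may run into: mean() returns NA as soon as its input contains a single NA, so every centered predictor would become NA before glm() is ever called. The sketch below illustrates this on a made-up vector; the toy data frame my.data and its column y are hypothetical stand-ins, not the poster's data.

```r
# Toy vector with missing values, standing in for a survey column such as age
age <- c(31, 23, NA, 82, 78, NA, 46)

mean(age)                      # NA: a single missing value makes the mean NA
age_C_bad <- age - mean(age)   # so every centered value is NA as well

# Ignore NAs when computing the mean; missing entries simply stay NA
age_C <- age - mean(age, na.rm = TRUE)
age_C

# Hypothetical data frame: by default glm() drops incomplete rows itself
my.data <- data.frame(
  y     = c(1, 0, NA, 1, 0, 1, 0),
  age_C = age_C
)
fit <- glm(y ~ age_C, family = binomial(link = "logit"), data = my.data)
nobs(fit)   # number of complete cases actually used in the fit
```

Passing the variables through data= also keeps the response and the predictors aligned row by row, which is harder to guarantee with free-standing vectors.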
[R] acf() plot of matrix cuts y-axis labels
Hello all,

I found a funny problem with y-axis labels when plotting acf(matrix) - the labels are too close to one of the margins and cut in half.

Here's the problem:

    test <- matrix(rnorm(200),ncol=4)
    acf(test)

This doesn't fix the problem:

    test <- matrix(rnorm(200),ncol=4)
    par(mar=c(3,3,2,0.2),oma=c(0,0,0,0))
    acf(test)

This does fix the margin. I understand why, but not sure why ONLY this will work?

    test <- matrix(rnorm(200),ncol=4)
    acf(test,mar=c(3,3,2,0.2),oma=c(0,0,0,0))

Win XP SP3, R.version.string [1] R version 2.14.1 (2011-12-22)

Thanks
Michael Folkes
Re: [R] How to colorize the panel backgrounds of pairs()?
Okay, one simply has to use label.pos=0.5 in pairs() to get the correct behavior.

On 2012-03-02, at 09:10, Marius Hofert wrote:

Dear Ilai,

I tried to also adjust the diagonal panels. However, the variable names are not positioned correctly anymore. Do you know a solution?

Cheers,
Marius

    count <- 0
    mypanel <- function(x, y, ...){
        count <<- count+1
        bg <- if(count %in% c(1,4,9,12)) "#FDFF65" else "transparent"
        ll <- par("usr")
        rect(ll[1], ll[3], ll[2], ll[4], col=bg)
        points(x, y, cex=0.5)
    }
    mydiag.panel <- function(x, ...){
        ll <- par("usr")
        rect(ll[1], ll[3], ll[2], ll[4], col="#FDFF65")
    }
    U <- matrix(runif(4*500), ncol=4)
    pairs(U, panel=mypanel, diag.panel=mydiag.panel)

Marius Hofert marius.hof...@math.ethz.ch writes:

Indeed, precisely what I was looking for. Many thanks, Ilai.

ilai ke...@math.montana.edu writes:

par('bg') is not what you are looking for - it will set the bg of the whole graphic device, not panels. I think you want:

    count <- 0
    mypanel <- function(x, y, ...){
        count <<- count+1
        ll <- par('usr')
        if(count %in% c(1,4,9,12)) bg <- "#FDFF65" else bg <- 'transparent'
        rect(ll[1],ll[3],ll[2],ll[4],col=bg)
        points(x, y, cex=0.5)
    }

Cheers

On Thu, Mar 1, 2012 at 4:49 PM, Marius Hofert marius.hof...@math.ethz.ch wrote:

Dear expeRts,

I would like to colorize the backgrounds of a pairs plot according to the respective panel number. Here is what I tried (without success):

    count <- 0
    mypanel <- function(x, y, ...){
        count <<- count+1
        bg. <- if(count %in% c(1,4,9,12)) "#FDFF65" else NA
        points(x, y, cex=0.5, bg=bg)
    }
    U <- matrix(runif(4*500), ncol=4)
    pairs(U, panel=mypanel)

I also tried to set par(bg=bg.) before the call to points(), but that didn't work either. The only thing I found is that bg= can be used to fill certain plot symbols, but I would like to colorize the background of each panel, not the drawn circles.

Cheers,
Marius
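For completeness, here is a variant of the approach discussed in this thread that keeps the panel counter inside a closure instead of a global variable, so re-running the plot does not require resetting count by hand. The helper names make_panel and diag_panel are made up for this sketch; the rect()/par("usr") technique and label.pos=0.5 come from the messages above.

```r
# Factory returning a panel function with its own private call counter.
# make_panel and diag_panel are illustrative names, not part of any package.
make_panel <- function(highlight, col = "#FDFF65") {
  count <- 0
  function(x, y, ...) {
    count <<- count + 1                      # updates the counter in the closure
    usr <- par("usr")                        # current panel limits: x1, x2, y1, y2
    bg <- if (count %in% highlight) col else "transparent"
    rect(usr[1], usr[3], usr[2], usr[4], col = bg)
    points(x, y, cex = 0.5)
  }
}

# Diagonal panels: fill the whole panel with the highlight colour
diag_panel <- function(x, ...) {
  usr <- par("usr")
  rect(usr[1], usr[3], usr[2], usr[4], col = "#FDFF65")
}

set.seed(1)
U <- matrix(runif(4 * 500), ncol = 4)
panel <- make_panel(c(1, 4, 9, 12))

# label.pos = 0.5 centres the variable names once a diag.panel is supplied
pairs(U, panel = panel, diag.panel = diag_panel, label.pos = 0.5)
```

Each call to make_panel() starts a fresh counter, so the same highlight pattern is reproduced on every redraw that uses a newly created panel function.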
Re: [R] subseting a data frame
I believe you want the duplicated() function.

Michael

On Mar 2, 2012, at 10:19 AM, nathalie n...@sanger.ac.uk wrote:

Hi, this is my problem: I want to subset this file df, using only unique df$exon, printing the line once even if df$exon appears several times.

unique(df$exon) will show me the unique exons. If I try to print only the unique exon lines with df[unique(df$exon),], this doesn't print only the unique ones :(

Could you help?
Thanks
Nat

           exon                           size chr  start     end
    413077 ChrX_133594175_133594368_HPRT1  193 ChrX 133594175 133594368
    413270 ChrX_133594183_133594368_HPRT1  185 ChrX 133594183 133594368
    413455 ChrX_133594381_133594565_HPRT1  184 ChrX 133594381 133594565
    413639 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
    413745 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
    413851 ChrX_133607404_133607495_HPRT1   91 ChrX 133607404 133607495
    413942 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
    414125 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
    414308 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
    414373 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
    414438 ChrX_133620692_133620696_HPRT1    4 ChrX 133620692 133620696
    414442 ChrX_133624218_133624235_HPRT1   17 ChrX 133624218 133624235

-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Re: [R] speed up merge
I'm not sure. I'm still looking into it. It's pretty involved, so I asked the simplest question first (the merge question). I'll reply back with a mock-up/sample that is testable under a more appropriate subject line. Probably this weekend.

Regards,
Ben

On Fri, Mar 2, 2012 at 4:37 AM, Hans Ekbrand h...@sociologi.cjb.net wrote:

On Fri, Mar 02, 2012 at 03:24:20AM -0700, Ben quant wrote:

Hello, I have a nasty loop that I have to do 11877 times.

Are you completely sure about that? I often find myself avoiding loops-by-row by constructing vectors of which rows fulfil a condition, and then creating new vectors out of that vector. If you elaborate on the problem, perhaps we could find a way to avoid the loops altogether?

Mostly as a note to self, I wrote http://code.cjb.net/vectors-instead-of-loop.html, it might be understood by others too, but I'm not sure.

-- Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Re: [R] Parameterization of Inverse Wishart distribution available in MCMCpack and bayesm libraries
On Fri, Mar 2, 2012 at 1:22 AM, peter dalgaard pda...@gmail.com wrote:

Er, yes (scalar does not imply integer)

Dough! awkward... Sorry Shantanu. I've added

    cat('###\n # ',substr(fortunes::fortune(90)$quote,1,146),'\n ### \n')

to .First in my Rhelp directory. Hope that helps (me).

As a general matter:

1. This is the Open Source world, you can read the actual function and see what it does. It might even say so in the comments.

2. You can investigate empirically -- the moments are known as a function of the parameters (check Wikipedia), so how about simulating a few thousand matrices and looking at the means and variances. (I don't do IW on a daily basis, but AFAICT, the two parametrizations have roughly the same mean, but a factor of two between variances, so it should be fairly easy to spot whether it is one or the other.)

-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
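The empirical check suggested in point 2 could look roughly like the following. This is only a sketch of the idea using base R's rWishart() (inverting Wishart draws) rather than the MCMCpack or bayesm samplers themselves; the parameter values are arbitrary, and the reference mean Sigma^-1 / (nu - p - 1) is the textbook inverse-Wishart mean for this parametrization. To test a package's sampler, one would substitute its draws for the IW array below and compare the same summaries.

```r
set.seed(42)

p     <- 2          # dimension (arbitrary choice for the sketch)
nu    <- 10         # degrees of freedom (arbitrary)
Sigma <- diag(p)    # Wishart scale matrix
n_sim <- 5000

# Draw Wishart matrices and invert each one: if W ~ Wishart(nu, Sigma),
# then solve(W) follows an inverse Wishart with scale solve(Sigma).
W  <- rWishart(n_sim, df = nu, Sigma = Sigma)        # p x p x n_sim array
IW <- array(apply(W, 3, solve), dim = c(p, p, n_sim))

# Empirical moments, element by element
emp_mean <- apply(IW, c(1, 2), mean)
emp_var  <- apply(IW, c(1, 2), var)

# Theoretical mean under this parametrization (requires nu > p + 1)
theo_mean <- solve(Sigma) / (nu - p - 1)

round(emp_mean, 3)
round(theo_mean, 3)
round(emp_var, 4)
```

If a sampler's empirical mean matches but its variances are off by a constant factor, that points to a different parametrization, which is the comparison the message above has in mind.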
Re: [R] acf() plot of matrix cuts y-axis labels
On 02/03/2012 11:40 AM, Folkes, Michael wrote:

Hello all, I found a funny problem with y-axis labels when plotting acf(matrix) - the labels are too close to one of the margins and cut in half.

Here's the problem:

    test <- matrix(rnorm(200),ncol=4)
    acf(test)

This doesn't fix the problem:

    test <- matrix(rnorm(200),ncol=4)
    par(mar=c(3,3,2,0.2),oma=c(0,0,0,0))
    acf(test)

This does fix the margin. I understand why, but not sure why ONLY this will work?

    test <- matrix(rnorm(200),ncol=4)
    acf(test,mar=c(3,3,2,0.2),oma=c(0,0,0,0))

acf uses plot.acf to do the plotting. If you read ?plot.acf, you'll see how it comes up with acf settings: the global ones are overridden for data like yours. You might want to do some experimenting and suggest better defaults for plot.acf.

Duncan Murdoch
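To see what Duncan describes, one can inspect the plot method's own mar and oma arguments and pass replacements either through acf() or directly to plot(). A small sketch follows; the margin values are simply the ones from the original post.

```r
set.seed(1)
test <- matrix(rnorm(200), ncol = 4)

# plot.acf has its own 'mar' and 'oma' arguments; their defaults are
# expressions evaluated inside the method, which is why a prior par(mar = ...)
# call may have no effect for multi-series input.
plot_acf <- getS3method("plot", "acf")
formals(plot_acf)$mar
formals(plot_acf)$oma

# Option 1: extra arguments to acf() are forwarded to plot.acf
acf(test, mar = c(3, 3, 2, 0.2), oma = c(0, 0, 0, 0))

# Option 2: compute first, then plot with explicit margins
a <- acf(test, plot = FALSE)
plot(a, mar = c(3, 3, 2, 0.2), oma = c(0, 0, 0, 0))
```

Separating the computation from the plotting (option 2) makes it easier to experiment with different margin settings without recomputing the autocorrelations.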
Re: [R] subseting a data frame
Please always cc the list for archival/threading reasons.

Short answer is that unique() gives the unique elements rather than something you should subset by, like a set of logical indices or row numbers. Note that in general

    unique(x) == x[!duplicated(x)]

I'd imagine there are cases where this breaks down but I can't assemble one off the top of my head.

Michael

On Mar 2, 2012, at 12:13 PM, nathalie n...@sanger.ac.uk wrote:

Thanks. Why doesn't unique work here??

I believe you want the duplicated() function.

Michael

On Mar 2, 2012, at 10:19 AM, nathalie n...@sanger.ac.uk wrote:

Hi, this is my problem: I want to subset this file df, using only unique df$exon, printing the line once even if df$exon appears several times.

unique(df$exon) will show me the unique exons. If I try to print only the unique exon lines with df[unique(df$exon),], this doesn't print only the unique ones :(

Could you help?
Thanks
Nat

           exon                           size chr  start     end
    413077 ChrX_133594175_133594368_HPRT1  193 ChrX 133594175 133594368
    413270 ChrX_133594183_133594368_HPRT1  185 ChrX 133594183 133594368
    413455 ChrX_133594381_133594565_HPRT1  184 ChrX 133594381 133594565
    413639 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
    413745 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
    413851 ChrX_133607404_133607495_HPRT1   91 ChrX 133607404 133607495
    413942 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
    414125 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
    414308 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
    414373 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
    414438 ChrX_133620692_133620696_HPRT1    4 ChrX 133620692 133620696
    414442 ChrX_133624218_133624235_HPRT1   17 ChrX 133624218 133624235

-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
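A small illustration of the point made in this thread, using a few rows copied from the posted table (only the exon and size columns, and assuming exon is stored as character). With a character index, R looks the values up among the row names, which is why df[unique(df$exon), ] does not return the expected rows; a logical index from duplicated(), or row positions from match(), does.

```r
# A few rows from the posted data; exon is assumed to be a character column
df <- data.frame(
  exon = c("ChrX_133607389_133607495_HPRT1",
           "ChrX_133607389_133607495_HPRT1",
           "ChrX_133607404_133607495_HPRT1",
           "ChrX_133609211_133609394_HPRT1",
           "ChrX_133609211_133609394_HPRT1"),
  size = c(106, 106, 91, 183, 183),
  row.names = c("413639", "413745", "413851", "413942", "414125"),
  stringsAsFactors = FALSE
)

# Character values are matched against row names, none of which are exon names
bad <- df[unique(df$exon), ]

# Logical index: keep the first row of every exon
good <- df[!duplicated(df$exon), ]

# Equivalent using row positions of the first occurrences
same <- df[match(unique(df$exon), df$exon), ]

bad
good
```

If exon were a factor instead, the unique values would be used as integer codes, which selects rows by position and is equally wrong, just less obviously so.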
Re: [R] acf() plot of matrix cuts y-axis labels
Thanks for the advice. I guess I should have read the acf help page more thoroughly to appreciate the role of plot.acf(). Typical. Thanks Duncan.

-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: March 2, 2012 9:53 AM
To: Folkes, Michael
Cc: r-help@r-project.org
Subject: Re: [R] acf() plot of matrix cuts y-axis labels

On 02/03/2012 11:40 AM, Folkes, Michael wrote:

Hello all, I found a funny problem with y-axis labels when plotting acf(matrix) - the labels are too close to one of the margins and cut in half.

Here's the problem:

    test <- matrix(rnorm(200),ncol=4)
    acf(test)

This doesn't fix the problem:

    test <- matrix(rnorm(200),ncol=4)
    par(mar=c(3,3,2,0.2),oma=c(0,0,0,0))
    acf(test)

This does fix the margin. I understand why, but not sure why ONLY this will work?

    test <- matrix(rnorm(200),ncol=4)
    acf(test,mar=c(3,3,2,0.2),oma=c(0,0,0,0))

acf uses plot.acf to do the plotting. If you read ?plot.acf, you'll see how it comes up with acf settings: the global ones are overridden for data like yours. You might want to do some experimenting and suggest better defaults for plot.acf.

Duncan Murdoch
[R] MNP and exclusion restriction
Hi,

I am running the MNP package in R. The model runs well. There are actually 4 choices and the 4th is considered the base category. I got the results for all 19 covariates for all 3 model choices.

What I want to do is impose an exclusion restriction: eliminate all the covariates except the constant from one model choice, eliminate only one covariate from another model choice (keeping the constant), and leave the third model choice with all 19 covariates and the constant. I am working on data to test for the existence of complementarity, and for this I need the results in the form described above.

    Model Choice: B1    Model Choice: B2            Model Choice: B3
    Constant only       Constant + 18 covariates    Constant + 19 covariates

Blue color shows the result that I want.

Waiting for good suggestions.

Saqlain RAZA
PhD Student (saqlain.r...@toulouse.inra.fr) IODA/AGIR (www.toulouse.inra.fr/agir) France. Mobile: (+33) 06 18 37 63 52
[R] Why predicted values are fewer that the real?
Hi,

I am running a glm model with family Gamma(link=log), trying to predict a vector of 1554 (real) values. Using predict() I got a vector of 950 predicted values instead of 1554. The predictions are good, though. The model doesn't take account of negative values and NAs, which are only 121 values. Any clue?

Thanks

-- View this message in context: http://r.789695.n4.nabble.com/Why-predicted-values-are-fewer-that-the-real-tp4438912p4438912.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] subseting a data frame
Hello,

Hi, this is my problem: I want to subset this file df, using only unique df$exon, printing the line once even if df$exon appears several times. unique(df$exon) will show me the unique exons. If I try to print only the unique exon lines with df[unique(df$exon),], this doesn't print only the unique ones :(

Try

    inx <- match(unique(df$exon), df$exon)
    df[inx, ]

Hope this helps,
Rui Barradas

-- View this message in context: http://r.789695.n4.nabble.com/subseting-a-data-frame-tp4438745p4438922.html Sent from the R help mailing list archive at Nabble.com.
[R] reshaping
Hello,

I have a large data set which I am trying to get into a long/narrow format. I have given an example below of how I want my data to look before and after... any ideas for an easy way to do this?

### Start with this...

    set.seed(1)
    a=rnorm(10)
    b=rnorm(10)
    c=rnorm(10)
    d=rnorm(10)
    e=rnorm(10)
    f=rnorm(10)
    g=rnorm(10)
    h=rnorm(10)
    G=c(1,2,3,4,5,6,7,8,9,10)
    test=matrix(c(G,a,b,c,d,e,f,g,h),ncol=9)
    colnames(test)=c("G","a","b","c","d","e","f","g","h")
    test

### WHERE...

    # a-d sex = male, e-h sex = female
    # a,c,e,g replicate = 1
    # a+b experimental line = L1, c+d = L2

### Which becomes this...

    z2=as.numeric(c(seq(1:10),seq(1:10),seq(1:10),seq(1:10)))
    sex=rep(c("m","f"), each=20)
    rep=rep(rep(c(1,2), each=10), times=2)
    line=rep(c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10"), times=4)
    r=as.numeric(c(rnorm(10),rnorm(10),rnorm(10),rnorm(10)))
    test2=matrix(c(z2,sex,rep,line,r),ncol=5)
    colnames(test2)=c("G","Sex","Rep","Line","Res")
    test2

-- View this message in context: http://r.789695.n4.nabble.com/reshaping-tp4439182p4439182.html Sent from the R help mailing list archive at Nabble.com.
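One possible base-R route for this kind of wide-to-long conversion is to stack the measurement columns and attach the per-column attributes from a small lookup table. The sketch below is only an illustration: the lookup table key is hypothetical, and its Rep and Line assignments for columns e-h (and Rep = 2 for b, d, f, h) are assumptions, since the post only spells out part of the mapping.

```r
set.seed(1)

# Wide data: one identifier column G and eight measurement columns a-h
wide <- data.frame(G = 1:10,
                   matrix(rnorm(80), ncol = 8,
                          dimnames = list(NULL, letters[1:8])))

# Hypothetical lookup table describing each measurement column.
# Sex follows the post (a-d male, e-h female); the Rep and Line values
# for the columns the post does not describe are assumed.
key <- data.frame(col  = letters[1:8],
                  Sex  = rep(c("m", "f"), each = 4),
                  Rep  = rep(1:2, times = 4),
                  Line = rep(rep(c("L1", "L2"), each = 2), times = 2),
                  stringsAsFactors = FALSE)

# Stack the eight columns on top of each other, remembering their origin
long <- data.frame(G   = rep(wide$G, times = nrow(key)),
                   col = rep(key$col, each = nrow(wide)),
                   Res = unlist(wide[key$col], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Attach Sex/Rep/Line by column name, then tidy up order and columns
long <- merge(long, key, by = "col")
long <- long[order(long$col, long$G), c("G", "Sex", "Rep", "Line", "Res")]
rownames(long) <- NULL
head(long)
```

Keeping the result as a data frame rather than a matrix preserves the numeric type of Res, which a character matrix such as test2 would lose.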
[R] projection problem in sp package
I am plotting some data using the sp package:

    library(sp)
    library(maps)
    data.aggm   # data
    # Define standard projection
    ll <- CRS("+proj=longlat +datum=WGS84")
    # convert to a SpatialPointsDataFrame
    xy <- cbind(data.aggm[,1], data.aggm[,2])
    ch4.spPoints <- SpatialPointsDataFrame(coords=xy, data=data.frame(data.aggm[,3]), proj4string=ll)

After running this, the error I am getting is:

    Error in validityMethod(as(object, superClass)) :
      Geographical CRS given to non-conformant data

Does anybody know how to fix this error?

Cheers

-- View this message in context: http://r.789695.n4.nabble.com/projection-problem-in-sp-package-tp4439317p4439317.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Why predicted values are fewer that the real?
Hi,

On Fri, Mar 2, 2012 at 11:19 AM, labbig lkaim...@windowslive.com wrote:

Hi i am running a glm model family Gamma(link=log) trying to predict a vector of 1554 (real) values Using predict() i got a vector of 950 predicted values instead of 1554. The predictions are good though The model doesnt take account of negative values and NAs which are only 121 values. Any clue?

To have a clue, we need far more information than you've provided, possibly even the reproducible example requested in the posting guide.

-- Sarah Goslee http://www.functionaldiversity.org
Re: [R] Tobit Fixed Effects
Hi Arne,

thanks for the improvements in the package. I'm using it right now and it's working very well.

Best,
Felipe Nunes
CAPES/Fulbright Fellow, PhD Student, Political Science - UCLA
Web: felipenunes.bol.ucla.edu

On Fri, Mar 2, 2012 at 2:13 AM, Arne Henningsen arne.henning...@googlemail.com wrote:

Dear Felipe

On 29 September 2011 14:10, Arne Henningsen arne.henning...@googlemail.com wrote:

Hi Felipe

On 25 September 2011 00:16, Felipe Nunes felipnu...@gmail.com wrote:

Hi Arne, my problem persists. I am still using censReg [version - 0.5-7] to run a random effects model in my data (50,000 cases), but I always get the message.

    tob7 <- censReg(transfers.cap ~ pt.pt + psdb.pt + pt.opp + pt.coa + psdb.coa + pib.cap + transfers.cap.lag + pib.cap + ifdm + log(populat) + mayor.vot.per + log(bol.fam.per+0.01) + factor(uf.name) + factor(year) - 1, left=0, right=Inf, method="BHHH", nGHQ=8, iterlim=1, data = pdata2)

    Error in maxNRCompute(fn = logLikAttr, fnOrig = fn, gradOrig = grad, hessOrig = hess, : NA in the initial gradient

If I sent you my data set, could you try to help me? I have been struggling with that for two months now.

Thanks for sending me your data set. With it, I was able to figure out where the NAs in the (initial) gradients come from: when calculating the derivatives of the standard normal density function [d dnorm(x) / d x = - dnorm(x) * x], values for x that are larger than approximately 40 (in absolute terms) result in values so small (in absolute terms) that R rounds them to zero. Later, these derivatives are multiplied by some other values and then the logarithm is taken ... and multiplying any number by zero and taking the logarithm does not give a finite number :-(

When *densities* of the standard normal distribution become too small, one can use dnorm(x,log=TRUE) and store the logarithm of the small number, which is much larger (in absolute terms) than the density and hence is not rounded to zero.

However, in the case of the *derivative* of the standard normal density function, this is more complicated, as log( d dnorm(x) / d x ) = log( dnorm(x) ) + log( - x ) is not defined if x is positive. I will try to solve this problem by case distinction (x>0 vs. x<0). Or does anybody know a better solution?

Finally(!), I have implemented this solution in the censReg package. Some initial tests (including your model and data) show that the revised calculation of the gradient of the random effects panel data model for censored dependent variables is much more robust to rounding errors. The improved version of the censReg package is not yet available via CRAN, but it is available at R-Forge: https://r-forge.r-project.org/R/?group_id=256

If you have further questions or feedback regarding the censReg package, please use a forum or tracker on the R-Forge site of the sampleSelection project: https://r-forge.r-project.org/projects/sampleselection/

Best wishes from Copenhagen,
Arne

-- Arne Henningsen http://www.arne-henningsen.name
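The case distinction described above can be illustrated by storing the sign of the derivative separately from the logarithm of its absolute value. This is only a sketch of the general idea, not the code used in censReg; the function name dnorm_deriv_log is made up for the example.

```r
# d dnorm(x) / d x = -dnorm(x) * x underflows to zero for large |x|.
# Keep the sign and the log of the absolute value as two separate pieces:
#   sign             = -sign(x)
#   log|derivative|  = log(dnorm(x)) + log(|x|)
# dnorm_deriv_log is a hypothetical helper, not part of censReg.
dnorm_deriv_log <- function(x) {
  list(sign    = -sign(x),
       log_abs = dnorm(x, log = TRUE) + log(abs(x)))
}

# Naive computation: both factors collapse to exactly zero
x_big <- c(-50, 50)
naive <- -dnorm(x_big) * x_big
naive

# Log-scale version stays finite and keeps the sign information
res_big <- dnorm_deriv_log(x_big)
res_big

# For moderate x the two representations agree
x_mod   <- c(-2, 0.5, 3)
res_mod <- dnorm_deriv_log(x_mod)
cbind(direct  = -dnorm(x_mod) * x_mod,
      rebuilt = res_mod$sign * exp(res_mod$log_abs))
```

Later products can then be carried out by adding log_abs terms and multiplying signs, converting back with exp() only where the result is known to be representable.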
Re: [R] speed up merge
One way to speed up the merge is not to use merge. You can use 'match' to find the matching indices and then build the result manually. Does this do what you want:

    ua <- read.table(text = '                  AName    rt_date
    2007-03-31 14066.580078125 2007-04-01
    2007-06-30           14717 2007-04-03
    2007-09-30           15528 2007-10-25
    2007-12-31           17609 2008-04-06
    2008-03-31           17168 2008-04-24
    2008-06-30           17681 2008-04-09', header = TRUE, as.is = TRUE)
    dt <- c("2007-03-31", "2007-04-01", "2007-04-02", "2007-04-03", "2007-04-04",
            "2007-04-05", "2007-04-06", "2007-04-07",
            "2007-04-08", "2007-04-09")
    # find matching values in ua
    indx <- match(dt, ua$rt_date)
    # create new result matrix
    xx1 <- cbind(dt, ua[indx,])
    rownames(xx1) <- NULL  # delete funny names
    xx1

               dt    AName    rt_date
    1  2007-03-31       NA         NA
    2  2007-04-01 14066.58 2007-04-01
    3  2007-04-02       NA         NA
    4  2007-04-03 14717.00 2007-04-03
    5  2007-04-04       NA         NA
    6  2007-04-05       NA         NA
    7  2007-04-06       NA         NA
    8  2007-04-07       NA         NA
    9  2007-04-08       NA         NA
    10 2007-04-09       NA         NA

On Fri, Mar 2, 2012 at 5:24 AM, Ben quant ccqu...@gmail.com wrote:

Hello,

I have a nasty loop that I have to do 11877 times. The only thing that slows it down really is this merge:

    xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)

Any ideas on how to speed it up? The output can't change materially (it works), but I'd like it to go faster. I'm looking at getting around the loop (not shown), but I'm trying to speed up the merge first. I'll post regarding the loop if nothing comes of this post.

Here is some information on what type of stuff is going into the merge:

    class(ua_rd)
    [1] "matrix"
    dim(ua_rd)
    [1] 20 2
    head(ua_rd)
               AName           rt_date
    2007-03-31 14066.580078125 2007-04-26
    2007-06-30 14717           2007-07-19
    2007-09-30 15528           2007-10-25
    2007-12-31 17609           2008-01-24
    2008-03-31 17168           2008-04-24
    2008-06-30 17681           2008-07-17
    class(dt)
    [1] "character"
    length(dt)
    [1] 1799
    dt[1:10]
    [1] "2007-03-31" "2007-04-01" "2007-04-02" "2007-04-03" "2007-04-04" "2007-04-05" "2007-04-06" "2007-04-07"
    [9] "2007-04-08" "2007-04-09"

thanks,
Ben

-- Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
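As a sanity check on the match() idea, the sketch below builds the left join by hand and compares it with merge(..., all.x = TRUE) on a small made-up table (the values loosely follow the ones posted above). One assumption matters: match() returns only the first hit per key, so the two approaches agree only when rt_date has no duplicates; merge() would return one row per matching pair.

```r
# Small lookup table; values loosely follow the example above
ua <- data.frame(AName   = c(14066.58, 14717, 15528),
                 rt_date = c("2007-04-01", "2007-04-03", "2007-10-25"),
                 stringsAsFactors = FALSE)

# Ten consecutive calendar days as character strings
dt <- format(seq(as.Date("2007-03-31"), by = "day", length.out = 10))

# Manual left join: position of each date in ua$rt_date (NA if absent)
idx  <- match(dt, ua$rt_date)
fast <- data.frame(dt = dt, AName = ua$AName[idx], stringsAsFactors = FALSE)

# Reference result from merge()
slow <- merge(data.frame(dt = dt, stringsAsFactors = FALSE), ua,
              by.x = "dt", by.y = "rt_date", all.x = TRUE)

fast
isTRUE(all.equal(fast$AName, slow$AName))
```

Inside a loop that runs thousands of times, skipping merge()'s general-purpose bookkeeping in this way is usually where the time saving comes from, though the actual gain would need to be measured on the real data.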
[R] Spacing of text does not match spacing of bars in barplot
I have a very standard barplot. My labels are too long to be printed horizontally under each bar, so I am using text to put the labels on a 45 degree slant. However, the labels are spaced more narrowly than the bars, so on an 8 vertical bar plot, the end of the eighth label is lined up with the seventh bar.

Preferably I don't want to do every text label separately (I'm having this problem on all the graphs where I'm using text()).

Using Windows and version 2.14.1

    X2sum <- c(42.6, 3.6, 1.8, 3.9, 12.1, 14.3, 14.6, 28.4)
    X2.labels <- c("No earnings", "Less than $5000/year", "$5K to $10K", "$10K to $15K", "$15K to $20K", "$20K to $25K", "$25K to $30K", "Over $30K")
    barplot(X2sum)
    text(1:8, par("usr")[3] - 0.5, srt = 45, adj = 1, labels = X2.labels, xpd = TRUE)

Thanks,
Jon

-- View this message in context: http://r.789695.n4.nabble.com/Spacing-of-text-does-not-match-spacing-of-bars-in-barplot-tp4439635p4439635.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Why predicted values are fewer that the real?
Try using glm(..., na.action=na.exclude). See ?na.exclude for the explanation.

On Fri, Mar 2, 2012 at 11:19 AM, labbig lkaim...@windowslive.com wrote:

Hi i am running a glm model family Gamma(link=log) trying to predict a vector of 1554 (real) values Using predict() i got a vector of 950 predicted values instead of 1554. The predictions are good though The model doesnt take account of negative values and NAs which are only 121 values. Any clue? Thank
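To make the na.exclude suggestion concrete, here is a small simulated example (the data are invented for the sketch). With the default na.omit, rows with missing values are dropped and the fitted/predicted vectors are shorter than the data; with na.exclude, they are padded with NA back to the original length.

```r
set.seed(1)
n <- 100
x <- runif(n)

# Positive response with mean exp(1 + x), as required by the Gamma family
y <- rgamma(n, shape = 2, rate = 2 / exp(1 + x))
y[sample(n, 10)] <- NA                     # knock out ten responses

d <- data.frame(x = x, y = y)

# Default behaviour (na.omit): incomplete rows vanish from the results
fit_omit <- glm(y ~ x, family = Gamma(link = "log"), data = d,
                na.action = na.omit)

# na.exclude: same fit, but results are padded with NA to full length
fit_excl <- glm(y ~ x, family = Gamma(link = "log"), data = d,
                na.action = na.exclude)

length(predict(fit_omit))     # rows actually used in the fit
length(predict(fit_excl))     # one value per row of d, NA where y was missing

# Predictions for every row, including those with a missing response
p_all <- predict(fit_omit, newdata = d, type = "response")
length(p_all)
```

The coefficients of the two fits are the same; only the bookkeeping of which rows appear in fitted(), residuals() and predict() differs.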
Re: [R] Why predicted values are fewer that the real?
On Mar 2, 2012, at 11:19 AM, labbig wrote:

Hi i am running a glm model family Gamma(link=log) trying to predict a vector of 1554 (real) values Using predict() i got a vector of 950 predicted values instead of 1554. The predictions are good though The model doesnt take account of negative values and NAs which are only 121 values.

Taking the logs of negative numbers?

Any clue?

I see a couple of possibilities.

-- David Winsemius, MD West Hartford, CT
Re: [R] Spacing of text does not match spacing of bars in barplot
The return value of barplot contains the locations of the bars that it just drew. Use that instead of 1:8 when you draw the text:

    barCenters <- barplot(X2sum)
    text(barCenters, par("usr")[3] - 0.5, srt = 45, adj = 1, labels = X2.labels, xpd = TRUE)

Look at help(barplot) for details.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jon waterhouse
Sent: Friday, March 02, 2012 11:52 AM
To: r-help@r-project.org
Subject: [R] Spacing of text does not match spacing of bars in barplot

I have a very standard barplot. My labels are too long to be printed horizontally under each bar, so I am using text to put the labels on a 45 degree slant. However, the labels are spaced more narrowly than the bars, so on an 8 vertical bar plot, the end of the eighth label is lined up with the seventh bar.

Preferably I don't want to do every text label separately (I'm having this problem on all the graphs where I'm using text()).

Using Windows and version 2.14.1

    X2sum <- c(42.6, 3.6, 1.8, 3.9, 12.1, 14.3, 14.6, 28.4)
    X2.labels <- c("No earnings", "Less than $5000/year", "$5K to $10K", "$10K to $15K", "$15K to $20K", "$20K to $25K", "$25K to $30K", "Over $30K")
    barplot(X2sum)
    text(1:8, par("usr")[3] - 0.5, srt = 45, adj = 1, labels = X2.labels, xpd = TRUE)

Thanks,
Jon
Re: [R] Spacing of text does not match spacing of bars in barplot
On Mar 2, 2012, at 1:52 PM, jon waterhouse wrote:

I have a very standard barplot. My labels are too long to be printed horizontally under each bar, so I am using text to put the labels on a 45 degree slant. However, the labels are spaced more narrowly than the bars, so on an 8 vertical bar plot, the end of the eighth label is lined up with the seventh bar.

Preferably I don't want to do every text label separately (I'm having this problem on all the graphs where I'm using text()).

Using Windows and version 2.14.1

    X2sum <- c(42.6, 3.6, 1.8, 3.9, 12.1, 14.3, 14.6, 28.4)
    X2.labels <- c("No earnings", "Less than $5000/year", "$5K to $10K", "$10K to $15K", "$15K to $20K", "$20K to $25K", "$25K to $30K", "Over $30K")
    barplot(X2sum)
    text(1:8, par("usr")[3] - 0.5, srt = 45, adj = 1, labels = X2.labels, xpd = TRUE)

Thanks,
Jon

Read ?barplot and take note of the Value section:

    A numeric vector (or matrix, when beside = TRUE), say mp, giving the coordinates of all the bar midpoints drawn, useful for adding to the graph. If beside is true, use colMeans(mp) for the midpoints of each group of bars, see example.

Thus:

    mp <- barplot(X2sum)
    text(mp, ...)

You correctly read the R FAQ on the matter, but that example uses plot() rather than barplot(). The midpoints of the bars are not at integer values.

HTH,
Marc Schwartz
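Putting the two answers together, a complete version might look like the following. The enlarged bottom margin is an addition for this sketch so that the slanted labels have room; everything else follows the code posted in the thread.

```r
X2sum <- c(42.6, 3.6, 1.8, 3.9, 12.1, 14.3, 14.6, 28.4)
X2.labels <- c("No earnings", "Less than $5000/year", "$5K to $10K",
               "$10K to $15K", "$15K to $20K", "$20K to $25K",
               "$25K to $30K", "Over $30K")

# Extra bottom margin for the rotated labels (value chosen by eye)
op <- par(mar = c(8, 4, 2, 1))

# barplot() returns the x coordinates of the bar midpoints
mp <- barplot(X2sum)
mp

# Place the labels at the midpoints, just below the bottom of the plot region
text(mp, par("usr")[3] - 0.5, labels = X2.labels,
     srt = 45, adj = 1, xpd = TRUE)

par(op)   # restore the previous margins
```

With the default bar width of 1 and spacing of 0.2, the midpoints fall at 0.7, 1.9, 3.1 and so on, which is why labels placed at 1:8 drift to the left of the bars.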
Re: [R] Computing line= for mtext
Hi Rich and Peter,

What I am trying to do is right-justify a vector of numbers to the right of the y-axis so that the leftmost digit of all of the numbers is one character to the right of the axis line. axis() plots tick marks and left-justifies the numbers.

Peter's idea:

> Since you're setting your right margin to 5, why not just
>
>     mtext(s, side=4, las=1, at=5, adj=1, line=5, cex=1)
>     mtext(s, side=4, las=1, at=7, adj=1, line=5, cex=2)
>
> i.e. set the line argument to the right margin?

comes close to what I'm trying to do, but I haven't found out how to compute line from the maximum width over all the strings in the vector being plotted vertically, and I'm not sure it scales properly for different cex=. strwidth(s, units='inches')/par('cin')[1] doesn't seem to be a complete solution.

Thanks
Frank

Richard M. Heiberger wrote
> Frank, this is it. It uses Peter's idea.
>
>     plot(1:10)
>     axis(side=2, 1:10, las=1, line=-31.5, lwd=0)
>     axis(side=4, 1:10, las=1, labels=FALSE)
>
> Rich
>
> On Thu, Mar 1, 2012 at 6:52 PM, Frank Harrell <f.harrell@> wrote:
>> Rich's pointers deal with lattice/grid graphics. Does anyone have a solution for base graphics?
>>
>> Thanks
>> Frank
>>
>> Richard M. Heiberger wrote
>>> Frank,
>>>
>>> This can be done directly with a variant of the panel.axis function. See function panel.axis.right in the HH package. This was provided for me by David Winsemius in response to my query on this list in October 2011
>>> https://stat.ethz.ch/pipermail/r-help/2011-October/292806.html
>>> The email thread also includes comments by Deepayan Sarkar and Paul Murrell.
>>>
>>> Rich
>>>
>>> On Wed, Feb 29, 2012 at 8:48 AM, Frank Harrell <f.harrell@> wrote:
>>>> I want to right-justify a vector of numbers in the right margin of a low-level plot. For this I need to compute the line parameter to give to mtext. Is this the correct scalable calculation?
>>>>
>>>>     par(mar=c(4,3,1,5)); plot(1:20)
>>>>     s <- 'abcde'; w=strwidth(s, units='inches')/par('cin')[1]
>>>>     mtext(s, side=4, las=1, at=5, adj=1, line=w-.5, cex=1)
>>>>     mtext(s, side=4, las=1, at=7, adj=1, line=2*(w-.5), cex=2)
>>>>
>>>> Thanks
>>>> Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4439703.html
Sent from the R help mailing list archive at Nabble.com.
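One possible way to turn a string width into a line= value is sketched below. It is not a confirmed solution from this thread. It assumes that one margin line on side 4 is par("mai")[4] / par("mar")[4] inches tall (the same margin expressed in inches and in lines), and it divides the widest label's width by that height rather than by the character width par('cin')[1]. The labels, the at= positions, and the 0.5-line gap are hypothetical.

```r
par(mar = c(4, 3, 1, 5)); plot(1:20)

s   <- c("7", "42", "1234")  # hypothetical labels
at  <- c(5, 10, 15)          # hypothetical y positions
cex <- 1

# Inches per margin line on side 4: margin in inches over margin in lines.
in_per_line <- par("mai")[4] / par("mar")[4]

# Width of the widest label, converted from inches to margin lines.
w_lines <- max(strwidth(s, units = "inches", cex = cex)) / in_per_line

# With adj = 1 the right edge of every label is placed at `line`,
# so the widest label starts about 0.5 line (assumed gap) from the axis.
mtext(s, side = 4, las = 1, at = at, adj = 1, line = w_lines + 0.5, cex = cex)
```

Passing the same cex to strwidth() and mtext() is meant to keep the computation consistent when the text size changes, but that has not been checked across devices.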
Re: [R] reshaping
Hi,

I *think* this is what you want...

On Fri, Mar 2, 2012 at 12:29 PM, robgriffin247 robgriffin...@hotmail.com wrote:
> Hello,
> I have a large data set which I am trying to get into a long/narrow format. I have given an example below of how I want my data to look before and after... any ideas for an easy way to do this?
>
> ### Start with this...
>     set.seed(1)
>     a=rnorm(10)
>     b=rnorm(10)
>     c=rnorm(10)
>     d=rnorm(10)
>     e=rnorm(10)
>     f=rnorm(10)
>     g=rnorm(10)
>     h=rnorm(10)
>     G=c(1,2,3,4,5,6,7,8,9,10)
>     test=matrix(c(G,a,b,c,d,e,f,g,h),ncol=9)
>     colnames(test)=c("G","a","b","c","d","e","f","g","h")
>     test

    library(reshape2)
    test.m <- melt(as.data.frame(test), id.vars="G")

> ### WHERE...
> # a-d sex = male, e-h sex = female

    test.m$sex <- ifelse(as.character(test.m$variable) %in% letters[1:4],
                         "male", "female")

> # a,c,e,g replicate = 1

    test.m$replicate <- ifelse(as.character(test.m$variable) %in%
                               c("a", "c", "e", "g"), 1, 2)

> # a+b experimental line = L1, c+d = L2

    test.m$line <- paste(test.m$variable, test.m$G, sep = "")

HTH,
Ista

> ### Which becomes this...
>     z2=as.numeric(c(seq(1:10),seq(1:10),seq(1:10),seq(1:10)))
>     sex=c(rep("m",20), rep("f",20))
>     rep=c(rep(1,10), rep(2,10), rep(1,10), rep(2,10))
>     line=rep(c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10"), 4)
>     r=as.numeric(c(rnorm(10),rnorm(10),rnorm(10),rnorm(10)))
>     test2=matrix(c(z2,sex,rep,line,r),ncol=5)
>     colnames(test2)=c("G","Sex","Rep","Line","Res")
>     test2
>
> --
> View this message in context: http://r.789695.n4.nabble.com/reshaping-tp4439182p4439182.html
> Sent from the R help mailing list archive at Nabble.com.
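For readers without reshape2, the same wide-to-long step can be sketched in base R with stack(). This is an alternative to the melt() call above, not what the reply used. The column coding (a-d male, a/c/e/g replicate 1) follows the comments in the original question, and building the test data with sapply() is a shortcut introduced here.

```r
set.seed(1)
# Wide data: an id column G plus eight measurement columns a..h.
test <- data.frame(G = 1:10, sapply(letters[1:8], function(i) rnorm(10)))

# stack() returns columns `values` and `ind` (the source column name),
# stacking the columns one after another, so G is repeated once per column.
long <- data.frame(G = rep(test$G, times = 8), stack(test[letters[1:8]]))

long$sex       <- ifelse(long$ind %in% letters[1:4], "male", "female")
long$replicate <- ifelse(long$ind %in% c("a", "c", "e", "g"), 1, 2)

head(long)
```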
Re: [R] Computing line= for mtext
On Fri, Mar 2, 2012 at 1:17 PM, Frank Harrell f.harr...@vanderbilt.edu wrote:
> Hi Rich and Peter,
>
> What I am trying to do is right-justify a vector of numbers to the right of the y-axis so that the leftmost digit of all of the numbers is one character to the right of the axis line. axis() plots tick marks and

No it doesn't (not always)

> left-justifies the numbers.

For axis(4). If you really want to right-justify on the right-hand side, you could use axis(2) with negative line numbers (device size dependent). Or you could try something like

    plot(1:20)
    axis(2, at=seq(1,20,4), labels=TRUE, tick=FALSE, las=1, pos=c(22.25,1))

Hope this is getting there (after only 6 messages...)

Elai

> Peter's idea:
>
>> Since you're setting your right margin to 5, why not just
>>
>>     mtext(s, side=4, las=1, at=5, adj=1, line=5, cex=1)
>>     mtext(s, side=4, las=1, at=7, adj=1, line=5, cex=2)
>>
>> i.e. set the line argument to the right margin?
>
> comes close to what I'm trying to do, but I haven't found out how to compute line from the maximum width over all the strings in the vector being plotted vertically, and I'm not sure it scales properly for different cex=. strwidth(s, units='inches')/par('cin')[1] doesn't seem to be a complete solution.
>
> Thanks
> Frank
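The hard-coded pos= above depends on the plot's x range. A sketch of computing it instead, using the expression Frank posts later in this thread: it converts the right edge of the figure region into user coordinates from par("usr") and par("plt"). The margins and tick positions are taken from the examples in the thread; a single value is passed to pos.

```r
par(mar = c(4, 3, 1, 5)); plot(1:20)

usr <- par("usr")  # plot region limits in user coordinates
plt <- par("plt")  # plot region limits as fractions of the figure region

# User units per unit of figure fraction, times the fraction that lies
# to the right of the plot region, added to the plot's right edge.
pos <- usr[2] + (usr[2] - usr[1]) / (plt[2] - plt[1]) * (1 - plt[2])

# Side-2 labels are right-justified, here drawn against the figure's right edge.
axis(2, at = seq(1, 20, 4), tick = FALSE, las = 1, pos = pos)
```

Since pos is derived from the current device and margins, it would need recomputing after the window is resized or par(mar=) changes.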
Re: [R] speed up merge
I'll have to give this a try this weekend. Thank you!

ben

On Fri, Mar 2, 2012 at 12:07 PM, jim holtman jholt...@gmail.com wrote:
> One way to speed up the merge is not to use merge. You can use 'match' to find the matching indices and then combine the columns manually. Does this do what you want:
>
>     ua <- read.table(text = '            AName         rt_date
>     2007-03-31 14066.580078125 2007-04-01
>     2007-06-30 14717           2007-04-03
>     2007-09-30 15528           2007-10-25
>     2007-12-31 17609           2008-04-06
>     2008-03-31 17168           2008-04-24
>     2008-06-30 17681           2008-04-09', header = TRUE, as.is = TRUE)
>     dt <- c("2007-03-31", "2007-04-01", "2007-04-02", "2007-04-03",
>             "2007-04-04", "2007-04-05", "2007-04-06", "2007-04-07",
>             "2007-04-08", "2007-04-09")
>     # find matching values in ua
>     indx <- match(dt, ua$rt_date)
>     # create new result matrix
>     xx1 <- cbind(dt, ua[indx,])
>     rownames(xx1) <- NULL  # delete funny names
>     xx1
>                dt    AName    rt_date
>     1  2007-03-31       NA         NA
>     2  2007-04-01 14066.58 2007-04-01
>     3  2007-04-02       NA         NA
>     4  2007-04-03 14717.00 2007-04-03
>     5  2007-04-04       NA         NA
>     6  2007-04-05       NA         NA
>     7  2007-04-06       NA         NA
>     8  2007-04-07       NA         NA
>     9  2007-04-08       NA         NA
>     10 2007-04-09       NA         NA
>
> On Fri, Mar 2, 2012 at 5:24 AM, Ben quant ccqu...@gmail.com wrote:
>> Hello,
>>
>> I have a nasty loop that I have to do 11877 times. The only thing that really slows it down is this merge:
>>
>>     xx1 = merge(dt, ua_rd, by.x=1, by.y='rt_date', all.x=T)
>>
>> Any ideas on how to speed it up? The output can't change materially (it works), but I'd like it to go faster. I'm looking at getting around the loop (not shown), but I'm trying to speed up the merge first. I'll post regarding the loop if nothing comes of this post.
>>
>> Here is some information on what type of stuff is going into the merge:
>>
>>     class(ua_rd)
>>     [1] "matrix"
>>     dim(ua_rd)
>>     [1] 20 2
>>     head(ua_rd)
>>                AName           rt_date
>>     2007-03-31 14066.580078125 2007-04-26
>>     2007-06-30 14717           2007-07-19
>>     2007-09-30 15528           2007-10-25
>>     2007-12-31 17609           2008-01-24
>>     2008-03-31 17168           2008-04-24
>>     2008-06-30 17681           2008-07-17
>>     class(dt)
>>     [1] "character"
>>     length(dt)
>>     [1] 1799
>>     dt[1:10]
>>     [1] "2007-03-31" "2007-04-01" "2007-04-02" "2007-04-03" "2007-04-04"
>>     [6] "2007-04-05" "2007-04-06" "2007-04-07" "2007-04-08" "2007-04-09"
>>
>> thanks,
>> Ben
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
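One caveat worth checking before swapping merge() for match(): match() returns only the first hit per key, so the two agree only when the lookup column has no duplicates. A small sketch that checks this and compares both results on a cut-down, partly hypothetical version of the data above:

```r
# Cut-down lookup table; values taken from the example above.
ua <- data.frame(AName   = c(14066.58, 14717, 15528),
                 rt_date = c("2007-04-01", "2007-04-03", "2007-10-25"),
                 stringsAsFactors = FALSE)
dt <- format(seq(as.Date("2007-03-31"), by = "day", length.out = 10))

# match() keeps only the first match, so require unique keys.
stopifnot(!anyDuplicated(ua$rt_date))

indx <- match(dt, ua$rt_date)                       # NA where dt has no partner
xx1  <- data.frame(dt = dt, ua[indx, ], row.names = NULL)

# Reference result from merge(); dt is already sorted, as merge's output is.
xx2 <- merge(data.frame(dt = dt), ua, by.x = "dt", by.y = "rt_date",
             all.x = TRUE)
print(all.equal(xx1$AName, xx2$AName))
```

If rt_date can repeat, merge() produces one output row per match, and the match() shortcut would silently drop the extras.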
[R] Calculation of standard error for a function
Dear list,

If I know the standard error for k1 and k2, is there anything I can call in R to calculate the standard error of k1/k2? Thanks.

Jun
Re: [R] Calculation of standard error for a function
On Mar 2, 2012, at 4:47 PM, Jun Shen wrote:

> Dear list,
>
> If I know the standard error for k1 and k2, is there anything I can call in R to calculate the standard error of k1/k2? Thanks.

This does not appear to be a well-posed question yet, and it is arguably more a statistics question than a coding question. Perhaps if you posted it at http://stats.stackexchange.com/ with more detail about the background for this question (especially _how_ you know these bits of information, but also what might be the purpose of this effort) you might get a more complete answer.

--
David Winsemius, MD
West Hartford, CT
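As the reply notes, the answer depends on details the question leaves out. Under one common set of assumptions (k1 and k2 are estimates with known standard errors and known covariance, zero if independent, and k2 is well away from zero), the first-order delta-method approximation can be sketched as follows. The function name ratio_se and its arguments are hypothetical, not an existing R function.

```r
# First-order (delta method) standard error of k1 / k2.
# Assumes k2 is not near zero; cov12 is the covariance of the two estimates.
ratio_se <- function(k1, se1, k2, se2, cov12 = 0) {
  r <- k1 / k2
  rel_var <- (se1 / k1)^2 + (se2 / k2)^2 - 2 * cov12 / (k1 * k2)
  abs(r) * sqrt(rel_var)
}

# Hypothetical numbers: k1 = 10 (SE 1), k2 = 5 (SE 0.5), independent.
ratio_se(10, 1, 5, 0.5)
```

This is an approximation only; when k2 has a large relative error, the ratio's distribution can be far from normal and a bootstrap or Fieller-type interval may be more appropriate.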
[R] Noob question - Identity argument within aggregate function?
    aggregate(z, identity, mean)
      1   2   3   4   5
    1.0 3.0 5.0 6.0 7.5

    aggregate(z, mean)
    Error: length(time(x)) == length(by[[1]]) is not TRUE

Can someone help me understand the error above and why identity is necessary to avoid it?

--
View this message in context: http://r.789695.n4.nabble.com/Noob-question-Identity-argument-within-aggregate-function-tp4439806p4439806.html
Sent from the R help mailing list archive at Nabble.com.
[R] times series trellis plot
Dear List,

I am struggling with trellis graphics. A similar problem was mentioned here:
http://r.789695.n4.nabble.com/R-How-can-you-get-N-replicates-of-a-multi-screen-multivariate-time-series-plot-td811850.html

I have 2 time series data sets that differ by several orders of magnitude. I managed to plot them in one graph, but because the data are so different, one of the data sets appears almost as a flat line. So I need to add a second y axis that is scaled to the second data set. I know how to do this with the usual plot.window routine or something similar, but not with the trellis graphics of the lattice package.

Thanks in advance!
stefan

--
View this message in context: http://r.789695.n4.nabble.com/times-series-trellis-plot-tp4440032p4440032.html
Sent from the R help mailing list archive at Nabble.com.
[R] ?Syntax on Taking differential on both sides of the equation in 'R'
Hi,

I am using package deSolve to run some ordinary differential equations (ODE) as part of a mathematical modeling project. I have solved for the following equilibrium states:

    Seq1 <- a*(1-Neq1)/(f*Veq1+m+d)
    Ceq1 <- (f*Seq1*Veq1+g*Ieq1+r*(1-Neq1)-b1*Veq1*Ieq1)/(b2+m+d+g)
    Ieq1 <- (-b2*Ceq1)-r*(1-Neq1)/(b1*Veq1-g-u)
    Veq1 <- o*(Ceq1+Ieq1)/e

I want to take the differential of both sides of the equation and then solve for the inverse of the first as follows (the parameter values are made up):

    library(deSolve)
    rkMethod("rk45dp7")
    CSIeq1 <- function(t, yeq1, pars) {
      with (as.list(c(yeq1, pars)), {
        Neq1 <- Seq1+Ceq1+Ieq1
        dSeq1 <- d[a*(1-Neq1)/(f*Veq1+m+d)]
        dCeq1 <- d[(f*Seq1*Veq1+g*Ieq1+r*(1-Neq1)-b1*Veq1*Ieq1)/(b2+m+d+g)]
        dIeq1 <- d[(-b2*Ceq1)-r*(1-Neq1)/(b1*Veq1-g-u)]
        dVeq1 <- d[o*(Ceq1+Ieq1)/e]
        return(list(c(Seq1,Ceq1,Ieq1,Veq1),Neq1))
      })
    }
    pars <- c(a=0.1, m=0.0005, u=0.5, b1=0.7, b2=0.2, f=0.01, g=0.4,
              r=0.3, o=20, e=90, d=0.01)
    # initial conditions
    yeq1 <- c(Seq1=0.99, Ceq1=0.01, Ieq1=0, Veq1=1)
    t <- seq(0, 365, by=50)
    o1 <- ode(yeq1, t, CSIeq1, pars, method = rkMethod("rk45dp7"))
    1/o1$Seq1

The output gives the following error message, so I am wondering about the syntax of the differentials such as dSeq1 <- d[a*(1-Neq1)/(f*Veq1+m+d)]:

    [console echo of rkMethod("rk45dp7") and of the code above omitted; the printed coefficient tables are garbled in the archive]
    > o1 <- ode(yeq1, t, CSIeq1, pars, method = rkMethod("rk45dp7"))
    There were 50 or more warnings (use warnings() to see the first 50)
    > 1/o1$Seq1
    Error in o1$Seq1 : $ operator is invalid for atomic vectors
    > warnings()
    Warning messages:
    1: In eval(expr, envir, enclos) : NAs introduced by coercion

Any thoughts or suggestions would be appreciated. If you run the program with only the differential on the left, it runs just fine. Do I need to use a different 'R' package?

--
AAP
Anil A. Panackal, MD, ScM, FACP
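Two things in the post can be illustrated without guessing at the intended model. First, the function passed to ode() has to return the derivatives, as a list whose first element is a numeric vector in the same order as the state vector; d[...] is ordinary indexing into the parameter d, not a differential operator. Second, ode() returns a matrix, so columns are extracted with [, "name"] rather than $. The sketch below uses a deliberately unrelated toy model (exponential decay, an assumption introduced here) and is skipped if deSolve is not installed.

```r
if (requireNamespace("deSolve", quietly = TRUE)) {
  # Toy model, not the poster's: dN/dt = -k * N.
  decay <- function(t, y, pars) {
    with(as.list(c(y, pars)), {
      dN <- -k * N   # the derivative is written out explicitly
      list(c(dN))    # first list element: derivatives, ordered like y
    })
  }

  out <- deSolve::ode(y = c(N = 1), times = seq(0, 10, by = 1),
                      func = decay, parms = c(k = 0.5),
                      method = deSolve::rkMethod("rk45dp7"))

  # `out` is a matrix with a "time" column plus one column per state.
  inv_N <- 1 / out[, "N"]
  print(head(cbind(time = out[, "time"], inv_N)))
}
```

For the model in the question, each right-hand side would need to be the actual rate of change of Seq1, Ceq1, Ieq1 and Veq1, which the post does not give.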
[R] Count matches in pmatch
Hi,

I need to find the number of occurrences of each word from one string in another string. So I need a function that is similar to pmatch, but returns the number of matches rather than their positions. Is there a function like this? If not, what is the way to calculate what I need?

--
View this message in context: http://r.789695.n4.nabble.com/Count-matches-in-pmatch-tp4439722p4439722.html
Sent from the R help mailing list archive at Nabble.com.
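A minimal base-R sketch of one reading of this question: split both strings on whitespace and count exact, case-sensitive matches of whole words. The function name count_words is hypothetical, and partial matching in the pmatch sense is deliberately not attempted.

```r
# Count how often each distinct word of `words_from` occurs in `text`.
count_words <- function(words_from, text) {
  words  <- unique(strsplit(words_from, "[[:space:]]+")[[1]])
  tokens <- strsplit(text, "[[:space:]]+")[[1]]
  # One count per word; the result is named by the words themselves.
  vapply(words, function(w) sum(tokens == w), integer(1))
}

count_words("a b c", "a a b d")
```

Punctuation and capitalisation are left untouched here; real text would probably need tolower() and some cleaning before splitting.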
[R] pasting several things
I have this type of format:

    structure(list(day = 19,
        C1 = structure(1L, .Label = c("", "C1"), class = "factor"),
        C2 = structure(2L, .Label = c("", "C2"), class = "factor"),
        C3 = structure(1L, .Label = c("", "C3"), class = "factor"),
        Q1 = structure(2L, .Label = c("", "Q1"), class = "factor"),
        Q2 = structure(2L, .Label = c("", "Q2"), class = "factor"),
        Q3 = structure(1L, .Label = c("", "Q3"), class = "factor")),
        .Names = c("day", "C1", "C2", "C3", "Q1", "Q2", "Q3"),
        row.names = 8L, class = "data.frame")

and want something like this:

    19 -C2 _Q1_Q2

Any ideas? Obviously I could use paste() and get this. Keep in mind I have many of these, and the presence of C1, C2, ... etc. will vary. Thanks.

--
View this message in context: http://r.789695.n4.nabble.com/pasting-several-things-tp4439770p4439770.html
Sent from the R help mailing list archive at Nabble.com.
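One way to build such labels for many rows at once is to drop the empty fields in each group of columns before pasting. The sketch below assumes the target format is day, then " -" and the non-empty C fields, then " _" and the non-empty Q fields; that format is inferred from the single example above. The function name label_rows is hypothetical.

```r
# One row shaped like the example in the question.
d <- data.frame(day = 19, C1 = "", C2 = "C2", C3 = "",
                Q1 = "Q1", Q2 = "Q2", Q3 = "", stringsAsFactors = TRUE)

label_rows <- function(d, c_cols = c("C1", "C2", "C3"),
                       q_cols = c("Q1", "Q2", "Q3")) {
  # Paste the non-empty entries of the given columns, row by row.
  pick <- function(cols) {
    apply(as.matrix(d[cols]), 1,
          function(v) paste(v[nzchar(v)], collapse = "_"))
  }
  paste0(d$day, " -", pick(c_cols), " _", pick(q_cols))
}

label_rows(d)
```

as.matrix() turns the factor columns into character strings, so the function works on several rows at once and whether or not the columns are stored as factors.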
Re: [R] Noob question - Identity argument within aggregate function?
Hi,

On Fri, Mar 2, 2012 at 3:51 PM, knavero knav...@gmail.com wrote:
>     aggregate(z, identity, mean)
>       1   2   3   4   5
>     1.0 3.0 5.0 6.0 7.5
>
>     aggregate(z, mean)
>     Error: length(time(x)) == length(by[[1]]) is not TRUE
>
> Can someone help me understand the error above and why identity is necessary to avoid it?

We can tell you to read ?aggregate and look at the order of the arguments.

We can point out that aggregate(z, mean) is not the same as mean(z), and wonder what you are trying to accomplish with the former.

We can ask for a reproducible example, and some idea of what you are trying to do.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] Noob question - Identity argument within aggregate function?
On Mar 2, 2012, at 3:51 PM, knavero wrote:

>     aggregate(z, identity, mean)
>       1   2   3   4   5
>     1.0 3.0 5.0 6.0 7.5
>
>     aggregate(z, mean)
>     Error: length(time(x)) == length(by[[1]]) is not TRUE

As generally happens when you call a function and fail to provide enough arguments to fill up its formals list.

> Can someone help me understand the error above and why identity is necessary to avoid it?

Well, on my machine it throws an error, probably because you failed to provide the requested code to create the objects you were working on. Is 'z' some sort of specially classed object for which there is an aggregate method? Is 'identity' a list as expected by aggregate.default or aggregate.data.frame? It would be an unfortunate choice of an object name, since there is a function with that name.

--
David Winsemius, MD
West Hartford, CT
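The poster never said what z is, but the error text suggests a zoo time series, whose aggregate method takes a `by` argument ahead of FUN. If that guess is right, `by` may be a function applied to the index, so identity groups observations by the index values themselves, while aggregate(z, mean) treats mean as `by` and reduces the index to a single number. A base-R analogue with hypothetical data, chosen to reproduce the printed output:

```r
# Hypothetical series with duplicated index values (times 2 and 5 repeat).
idx <- c(1, 2, 2, 3, 4, 5, 5)
val <- c(1, 2, 4, 5, 6, 7, 8)

# Grouping by the index itself: this is what by = identity amounts to.
by_identity <- identity(idx)
res <- tapply(val, by_identity, mean)
print(res)

# Using mean in the `by` slot collapses the index to one value, so the
# grouping vector no longer has one entry per observation.
length(mean(idx)) == length(idx)
```

With zoo loaded, the corresponding call would be aggregate(z, identity, mean), which is mainly useful for averaging observations that share an index value.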
Re: [R] Memory issue. XXXX
1. How much RAM do you have (it looks like 2GB)? If you have more than 2GB, you can allocate more memory with memory.limit().

2. If you have 2GB or less, then you have a few options:
   a) make sure your session is clean of unnecessary objects.
   b) don't read in all the data if you don't need to (see colClasses to control this).
   c) use the bigmemory package or the ff package.
   d) buy more RAM.

On Fri, Mar 2, 2012 at 6:57 AM, Dan Abner dan.abne...@gmail.com wrote:
> Hi everyone,
>
> Any ideas on troubleshooting this memory issue:
>
>     d1 <- read.csv("arrears.csv")
>     Error: cannot allocate vector of size 77.3 Mb
>     In addition: Warning messages:
>     1: In class(data) <- "data.frame" :
>       Reached total allocation of 1535Mb: see help(memory.size)
>     2: In class(data) <- "data.frame" :
>       Reached total allocation of 1535Mb: see help(memory.size)
>     3: In class(data) <- "data.frame" :
>       Reached total allocation of 1535Mb: see help(memory.size)
>     4: In class(data) <- "data.frame" :
>       Reached total allocation of 1535Mb: see help(memory.size)
>
> Thanks!
>
> Dan
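Option (b) can be sketched as follows. A column whose colClasses entry is "NULL" is skipped while reading, and declaring the types of the others saves read.csv() from guessing them. The file and its three columns are hypothetical stand-ins for arrears.csv, whose layout was not posted.

```r
# Small stand-in file; the real arrears.csv layout is unknown.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, amount = c(10.5, 20, 30.25),
                     note = c("a", "b", "c")),
          tmp, row.names = FALSE)

# Read only the first two columns: "NULL" drops the third one entirely.
d1 <- read.csv(tmp, colClasses = c("integer", "numeric", "NULL"))
str(d1)
```

Adding nrows= (a known or slightly overestimated row count) can reduce memory use further while reading.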
Re: [R] Cleaning up messy Excel data
Try sending your clients a data set (data frame, table, etc.) as an MS Access data table instead. They can still view the data as a table, but will have to go to much more effort to mess up the data; more likely they will do proper edits without messing anything up (mixing characters in with numbers, having more sexes than your biology teacher told you about, adding extra lines at the top or bottom that make reading back into R more difficult, etc.).

I have had a few clients that I talked into using MS Access from the start to enter their data. There was often a bit of resistance at first, but once they tried it and went through the process of designing the database up front, they ended up thanking me and believed that the entire data entry process was easier and quicker than if they had used Excel as they originally planned.

Access is still part of MS Office, so they don't need to learn R or in any way break their chains from being prisoners of Bill, but they will be more productive in more ways than just interfacing with you.

Access (databases in general) forces you to plan things out and do the correct thing from the start. It is possible to do the right thing in Excel, but Excel does not encourage (let alone force) you to do the right thing, and it makes it easy to do the wrong thing.

On Thu, Mar 1, 2012 at 6:15 AM, jim holtman jholt...@gmail.com wrote:
> But there are some important reasons to use Excel. In my work there are a lot of people to whom I have to send the equivalent of a data.frame, who want to look at the data and possibly slice/dice it differently and then send updates back to me. These folks do not know how to use R, but do have Microsoft Office installed on their computers and know how to use the different products.
>
> I have been very successful in conveying what I am doing for them by communicating via Excel spreadsheets. It is also an important medium in dealing with some international companies who provide data via Excel and expect responses back via Excel.
>
> When dealing with data in a tabular form, Excel does provide a way for a majority of the people I work with to understand the data. Yes, there are problems with some of the ways that people use Excel, and yes, I have had to invest time in scrubbing some of the data that I get from them, but if I did not, then I would probably not have a job working for them.
>
> I use R exclusively for the analysis that I do, but find it convenient to use Excel as a communication mechanism with the majority of the non-R users that I have to deal with. It is a convenient work-around because I would never get them to invest the time to learn R.
>
> So in the real world there is a need for Excel and we are not going to cause it to go away; we have to learn how to live with it. From my standpoint, it has definitely benefited me in being able to communicate with my users and to keep providing them with results that they are happy with. They refer to letting me "work my magic" on the data; all they know is that they see the result via Excel, and in the background R is doing the heavy lifting that they do not have to know about.
>
> On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote:
>> On 01/03/12 04:43, John Kane wrote:
>>>     (mydata <- as.factor(c(1,2,3, 2, 5, 2)))
>>>     str(mydata)
>>>     newdata <- as.character(mydata)
>>>     newdata[newdata==2] <- 0
>>>     newdata <- as.numeric(newdata)
>>>     str(newdata)
>>>
>>> We really need to keep Excel (and other spreadsheets) out of people's hands.
>>
>> Amen, bro'!!!
>>
>> cheers,
>> Rolf Turner
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
Re: [R] Statistical Histograms in R
On 03/02/2012 11:49 PM, SMcG wrote:
> Hi,
>
> I'm wondering if anybody could possibly help me?
>
> I have a table with 5 tab-delimited columns. Each column has 'e-value' scores for 5 different proteins. I'd like to plot a distribution curve using hist() for the 5 different proteins and show the 5 distribution curves on the same graph in different colours. In this case, E-values will be the X-axis and frequency will be the Y-axis.
>
> Is this at all possible?

Hi SMcG,
Have a look at the last example for the barp function (plotrix). This may be what you want.

Jim
Re: [R] convert list to text file
Or

    lapply(LIST, cat, file='outtext.txt', append=TRUE)

On Thu, Mar 1, 2012 at 6:20 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote:
> Perhaps something like
>
>     sink("outtext.txt")
>     lapply(LIST, print)
>     sink()
>
> You could replace print with cat and friends if you wanted more detailed control over the look of the output.
>
> Michael
>
> On Thu, Mar 1, 2012 at 5:28 AM, t.galesl...@ebh.umcn.nl wrote:
>> Dear R users,
>>
>> Is it possible to write the following list to a text file?
>>
>> List:
>>
>>     [[1]]
>>     [1] 500
>>
>>     [[2]]
>>     [1] 1
>>
>>     [[3]]
>>         [,1] [,2] [,3] [,4] [,5]
>>     FID    1    2    3    4    5
>>     Var    2    0    2    1    1
>>
>> I would like to have the text file look like this:
>>
>>     500
>>     1
>>     FID 1 2 3 4 5
>>     Var 2 0 2 1 1
>>
>> Thank you very much in advance for your help!
>>
>> Kind regards,
>>
>> Tessel Galesloot
>> Department of Epidemiology, Biostatistics and HTA (133)
>> Radboud University Nijmegen Medical Centre

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
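Neither one-liner above reproduces the requested layout exactly: print() adds the [1] and [,1] decorations, and cat() drops the row names and the line breaks. A sketch that writes each element according to its type through a single open connection; the temporary output file and the rebuilt LIST are assumptions made here for illustration.

```r
# The list from the question, rebuilt by hand.
LIST <- list(500, 1,
             matrix(c(1:5, 2, 0, 2, 1, 1), nrow = 2, byrow = TRUE,
                    dimnames = list(c("FID", "Var"), NULL)))

out <- tempfile(fileext = ".txt")  # stand-in for "outtext.txt"
con <- file(out, open = "w")
for (el in LIST) {
  if (is.matrix(el)) {
    # Row names are kept, column headers and quotes are suppressed.
    write.table(el, con, quote = FALSE, col.names = FALSE)
  } else {
    writeLines(as.character(el), con)
  }
}
close(con)

readLines(out)
```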
Re: [R] Computing line= for mtext
Thanks Elai. axis(2) looks like a good approach. I think the way to solve for the pos= argument is to use:

    usr <- par('usr'); plt <- par('plt')
    usr[2] + (usr[2] - usr[1])/(plt[2] - plt[1]) * (1 - plt[2])

I think pos should have only one element.

Thanks for your help,
Frank

ilai-2 wrote:

On Fri, Mar 2, 2012 at 1:17 PM, Frank Harrell <f.harrell@> wrote:

Hi Rich and Peter, What I am trying to do is to right-justify a vector of numbers to the right of the y-axis so that the leftmost digit of all of the numbers is one character to the right of the axis line. axis() plots tick marks and

No it doesn't (not always)

left-justifies the numbers.

For axis(4). If you really want right justify on the right hand side, you could use axis(2) with negative line numbers (device size dependent). Or you could try something like

    plot(1:20)
    axis(2,at=seq(1,20,4),labels=T,tick=F,las=T,pos=c(22.25,1))

Hope this is getting there (after only 6 messages...)

Elai

Peter's idea:

- Since you're setting your right margin to 5, why not just

    mtext(s, side=4, las=1, at=5, adj=1, line = 5, cex=1)
    mtext(s, side=4, las=1, at=7, adj=1, line = 5, cex=2)

i.e. set the line argument to the right margin?

- comes close to what I'm trying to do but I haven't found out how to compute line from the maximum width over all the strings in the vector being plotted vertically, and I'm not sure it scales properly for different cex=. strwidth(s, units='inches')/par('cin')[1] doesn't seem to be a complete solution.

Thanks
Frank

Richard M. Heiberger wrote:

Frank, this is it. It uses Peter's idea.

    plot(1:10)
    axis(side=2, 1:10, las=1, line=-31.5, lwd=0)
    axis(side=4, 1:10, las=1, labels=FALSE)

Rich

On Thu, Mar 1, 2012 at 6:52 PM, Frank Harrell <f.harrell@> wrote:

Rich's pointers deal with lattice/grid graphics. Does anyone have a solution for base graphics?

Thanks
Frank

Richard M. Heiberger wrote:

Frank, This can be done directly with a variant of the panel.axis function. See function panel.axis.right in the HH package. This was provided for me by David Winsemius in response to my query on this list in October 2011 https://stat.ethz.ch/pipermail/r-help/2011-October/292806.html The email thread also includes comments by Deepayan Sarkar and Paul Murrell.

Rich

On Wed, Feb 29, 2012 at 8:48 AM, Frank Harrell <f.harrell@> wrote:

I want to right-justify a vector of numbers in the right margin of a low-level plot. For this I need to compute the line parameter to give to mtext. Is this the correct scalable calculation?

    par(mar=c(4,3,1,5)); plot(1:20)
    s <- 'abcde'; w=strwidth(s, units='inches')/par('cin')[1]
    mtext(s, side=4, las=1, at=5, adj=1, line=w-.5, cex=1)
    mtext(s, side=4, las=1, at=7, adj=1, line=2*(w-.5), cex=2)

Thanks
Frank

- Frank Harrell Department of Biostatistics, Vanderbilt University

-- View this message in context: http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4431554.html
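The pos= formula in Frank's latest message converts the right edge of the figure region into user x coordinates. The sketch below wraps it in a hypothetical helper (right_edge_user is not a name from the thread) and assumes a linear x axis. On that assumption it should agree with grconvertX(1, "nfc", "user") from the graphics package.

```r
# Hypothetical helper: user x coordinate of the right edge of the figure
# region, following the formula from the message above.
right_edge_user <- function() {
  usr <- par("usr")   # plot region limits in user coordinates
  plt <- par("plt")   # plot region as fractions of the figure region
  usr[2] + (usr[2] - usr[1]) / (plt[2] - plt[1]) * (1 - plt[2])
}

par(mar = c(4, 3, 1, 5))
plot(1:20)
pos <- right_edge_user()
# axis(2) labels with las = 1 are right-justified relative to 'pos'
axis(2, at = seq(1, 20, 4), tick = FALSE, las = 1, pos = pos)
```

Because the labels sit one margin line (mgp[2]) to the left of pos, they should land inside the right margin; how much room they need still depends on the device size and cex.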
Re: [R] Cleaning up messy Excel data
Unfortunately, a lot of people who use MS Office don't have or know how to use MS Access. Where I work now (as in the past) I have to tie someone to their chair, give them a few pokes with the cattle prod and then show them that a CSV file will load straight into Excel before I can convince them that they can use such a heretical data format. You don't want to know what I have to do to convince them that they can view my listings in HTML.

Jim

PS - Always give them a _copy_ of the CSV file.

On 03/03/2012 10:41 AM, Greg Snow wrote:

Try sending your clients a data set (data frame, table, etc) as an MS Access data table instead. They can still view the data as a table, but will have to go to much more effort to mess up the data, more likely they will do proper edits without messing anything up (mixing characters in with numbers, have more sexes than your biology teacher told you about, add extra lines at top or bottom that makes reading back into R more difficult, etc.)

I have had a few clients that I talked into using MS Access from the start to enter their data, there was often a bit of resistance at first, but once they tried it and went through the process of designing the database up front they ended up thanking me and believed that the entire data entry process was easier and quicker than had they used Excel as they originally planned.

Access is still part of MS Office, so they don't need to learn R or in any way break their chains from being prisoners of Bill, but they will be more productive in more ways than just interfacing with you.

Access (databases in general) forces you to plan things out and do the correct thing from the start. It is possible to do the right thing in Excel, but Excel does not encourage (let alone force) you to do the right thing, but makes it easy to do the wrong thing.

On Thu, Mar 1, 2012 at 6:15 AM, jim holtman jholt...@gmail.com wrote:

But there are some important reasons to use Excel.
In my work there are a lot of people that I have to send the equivalent of a data.frame to who want to look at the data and possibly slice/dice the data differently and then send back to me updates. These folks do not know how to use R, but do have Microsoft Office installed on their computers and know how to use the different products. I have been very successful in conveying what I am doing for them by communicating via Excel spreadsheets. It is also an important medium in dealing with some international companies who provide data via Excel and expect responses back via Excel. When dealing with data in a tabular form, Excel does provide a way for a majority of the people I work with to understand the data. Yes, there are problems with some of the ways that people use Excel, and yes I have had to invest time in scrubbing some of the data that I get from them, but if I did not, then I would probably not have a job working for them. I use R exclusively for the analysis that I do, but find it convenient to use Excel to provide a communication mechanism to the majority of the non-R users that I have to deal with. It is a convenient work-around because I would never get them to invest the time to learn R. So in the real world there is a need for Excel and we are not going to cause it to go away; we have to learn how to live with it, and from my standpoint, it has definitely benefited me in being able to communicate with my users and continuing to provide them with results that they are happy with. They refer to letting me work my magic on the data; all they know is they see the result via Excel and in the background R is doing the heavy lifting that they do not have to know about.
On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote:

On 01/03/12 04:43, John Kane wrote:

    (mydata <- as.factor(c(1,2,3, 2, 5, 2)))
    str(mydata)
    newdata <- as.character(mydata)
    newdata[newdata==2] <- 0
    newdata <- as.numeric(newdata)
    str(newdata)

We really need to keep Excel (and other spreadsheets) out of people's hands.

Amen, bro'!!!

cheers,
Rolf Turner
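John Kane's snippet goes through as.character() before as.numeric() for a reason: calling as.numeric() or as.integer() directly on a factor returns the internal level codes, not the values the labels spell out. A small illustration, with variable names chosen here for the example only:

```r
f <- factor(c("1", "2", "3", "2", "5", "2"))

codes   <- as.integer(f)                 # internal level codes, not the values
values  <- as.numeric(as.character(f))   # the numbers the labels spell out
values2 <- as.numeric(levels(f))[f]      # same result, converts each level once

codes
values
```

The two vectors differ wherever a level is missing from the sequence; here the label "5" is stored as code 4.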
Re: [R] Calculation of standard error for a function
On 12-03-02 4:47 PM, Jun Shen wrote:

Dear list, If I know the standard error for k1 and k2, is there anything I can call in R to calculate the standard error of k1/k2? Thanks.

Jun

No, because it depends on the joint distribution of k1 and k2. Even if you knew they were independent, that would not be sufficient (though you could use the delta method to get an approximation in that case; look it up).

Duncan Murdoch
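For readers who do look it up: the first-order delta method gives Var(k1/k2) ≈ (k1/k2)^2 * (se1^2/k1^2 + se2^2/k2^2 - 2*cov12/(k1*k2)). The sketch below is a direct transcription of that approximation, not an exact answer; the function name ratio_se_delta and the example numbers are hypothetical, and the covariance defaults to 0, which is the independence assumption Duncan mentions.

```r
# Hypothetical helper: first-order delta-method approximation to the
# standard error of k1/k2. cov12 is the covariance of the two estimates;
# the default of 0 assumes they are independent.
ratio_se_delta <- function(k1, k2, se1, se2, cov12 = 0) {
  r <- k1 / k2
  v <- r^2 * (se1^2 / k1^2 + se2^2 / k2^2 - 2 * cov12 / (k1 * k2))
  sqrt(v)
}

# Made-up numbers purely for illustration
se_ratio <- ratio_se_delta(k1 = 10, k2 = 5, se1 = 1, se2 = 0.5)
se_ratio
```

The approximation can be poor when k2 is close to zero relative to its standard error, which is one reason the exact answer depends on the joint distribution.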
Re: [R] Computing line= for mtext
Hi,

If you're going to use different text sizes and convert between units, it might be easier to do the calculations with grid.

    par(mar=c(1,1,1,5))
    plot(1:10)
    labels = c(1, 2, 10, 123, 3.141592653589, 1.2, 2)
    sizes = c(1, 1, 2, 1, 0.4, 1, 3) # cex of individual labels

    ## pure base graphics
    max_width_base = do.call(max, mapply(function(l, size)
        strwidth(l, cex=size, units = "inches"),
        l=labels, size=sizes, SIMPLIFY=FALSE))

    ## calculations with grid graphics
    max_width_grid = grid::convertUnit(do.call(max, mapply(function(l, size)
        grid::grobWidth(grid::textGrob(l, gp=grid::gpar(cex=size))),
        l=labels, size=sizes, SIMPLIFY=FALSE)), "in", valueOnly=TRUE)

    all.equal(max_width_base, max_width_grid)

    ## add one line
    final_width = grid::convertUnit(
        grid::unit(max_width_base, "in") + grid::unit(1, "lines"),
        "lines", valueOnly=TRUE)

    mapply(function(l, size, ii)
        mtext(l, side=4, at=ii, las=1, adj=1, line=final_width, cex=size),
        l=labels, size=sizes, ii=seq_along(labels)) -> b.quiet

HTH,
b.

On 3 March 2012 09:17, Frank Harrell f.harr...@vanderbilt.edu wrote:

Hi Rich and Peter, What I am trying to do is to right-justify a vector of numbers to the right of the y-axis so that the leftmost digit of all of the numbers is one character to the right of the axis line. axis() plots tick marks and left-justifies the numbers.

Peter's idea:

- Since you're setting your right margin to 5, why not just

    mtext(s, side=4, las=1, at=5, adj=1, line = 5, cex=1)
    mtext(s, side=4, las=1, at=7, adj=1, line = 5, cex=2)

i.e. set the line argument to the right margin?

- comes close to what I'm trying to do but I haven't found out how to compute line from the maximum width over all the strings in the vector being plotted vertically, and I'm not sure it scales properly for different cex=. strwidth(s, units='inches')/par('cin')[1] doesn't seem to be a complete solution.

Thanks
Frank

Richard M. Heiberger wrote:

Frank, this is it. It uses Peter's idea.

    plot(1:10)
    axis(side=2, 1:10, las=1, line=-31.5, lwd=0)
    axis(side=4, 1:10, las=1, labels=FALSE)

Rich

On Thu, Mar 1, 2012 at 6:52 PM, Frank Harrell <f.harrell@> wrote:

Rich's pointers deal with lattice/grid graphics. Does anyone have a solution for base graphics?

Thanks
Frank

Richard M. Heiberger wrote:

Frank, This can be done directly with a variant of the panel.axis function. See function panel.axis.right in the HH package. This was provided for me by David Winsemius in response to my query on this list in October 2011 https://stat.ethz.ch/pipermail/r-help/2011-October/292806.html The email thread also includes comments by Deepayan Sarkar and Paul Murrell.

Rich

On Wed, Feb 29, 2012 at 8:48 AM, Frank Harrell <f.harrell@> wrote:

I want to right-justify a vector of numbers in the right margin of a low-level plot. For this I need to compute the line parameter to give to mtext. Is this the correct scalable calculation?

    par(mar=c(4,3,1,5)); plot(1:20)
    s <- 'abcde'; w=strwidth(s, units='inches')/par('cin')[1]
    mtext(s, side=4, las=1, at=5, adj=1, line=w-.5, cex=1)
    mtext(s, side=4, las=1, at=7, adj=1, line=2*(w-.5), cex=2)

Thanks
Frank

- Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] fridays date to date
If you know that your first date is a Friday then you can use seq with by="7 day", then you don't need to post filter the vector.

On Thu, Mar 1, 2012 at 1:40 PM, Ben quant ccqu...@gmail.com wrote:

Great thanks!

ben

On Thu, Mar 1, 2012 at 1:30 PM, Marc Schwartz marc_schwa...@me.com wrote:

On Mar 1, 2012, at 2:02 PM, Ben quant wrote:

Hello, How do I get the dates of all Fridays between two dates?

thanks,
Ben

    Days <- seq(from = as.Date("2012-03-01"),
                to = as.Date("2012-07-31"),
                by = "day")

    str(Days)
     Date[1:153], format: "2012-03-01" "2012-03-02" "2012-03-03" "2012-03-04" ...

    # See ?weekdays
    Days[weekdays(Days) == "Friday"]
     [1] "2012-03-02" "2012-03-09" "2012-03-16" "2012-03-23" "2012-03-30"
     [6] "2012-04-06" "2012-04-13" "2012-04-20" "2012-04-27" "2012-05-04"
    [11] "2012-05-11" "2012-05-18" "2012-05-25" "2012-06-01" "2012-06-08"
    [16] "2012-06-15" "2012-06-22" "2012-06-29" "2012-07-06" "2012-07-13"
    [21] "2012-07-20" "2012-07-27"

HTH,
Marc Schwartz

-- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com
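Greg's suggestion can be combined with a small offset calculation so that the start date need not be a Friday. This is a sketch only; the helper name fridays_between is hypothetical. It uses format(x, "%u"), the ISO weekday number, instead of weekdays(), on the assumption that a locale-independent test is preferable to comparing against the string "Friday".

```r
# Hypothetical helper: all Fridays between two dates, locale-independent.
fridays_between <- function(from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  # "%u" is the ISO weekday number (1 = Monday, ..., 5 = Friday)
  offset <- (5 - as.integer(format(from, "%u"))) %% 7
  first <- from + offset                       # first Friday on or after 'from'
  if (first > to) return(as.Date(character(0)))
  seq(first, to, by = "7 days")                # no post-filtering needed
}

fr <- fridays_between("2012-03-01", "2012-07-31")
fr
```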
Re: [R] Connecting points on a line with arcs/curves
?xspline

On Thu, Mar 1, 2012 at 8:15 AM, hendersi ir...@cam.ac.uk wrote:

Hello, I have a spreadsheet of pairs of coordinates and I would like to plot a line along which curves/arcs connect each pair of coordinates. The aim is to visualise the pattern of point connections.

Thanks!
Ian

-- View this message in context: http://r.789695.n4.nabble.com/Connecting-points-on-a-line-with-arcs-curves-tp4435247p4435247.html

-- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com
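To make the one-word answer concrete, here is a rough sketch of how xspline() could draw one arc per coordinate pair above a baseline. It assumes the pairs are positions along a single axis; the helper name draw_arcs, the choice of three control points and the height scaling are all illustrative choices, not something stated in the thread.

```r
# Hypothetical helper: draw an arc above a baseline for each (from, to) pair.
draw_arcs <- function(from, to, height = 1) {
  rng <- range(c(from, to))
  # the baseline along which the points lie
  plot(rng, c(0, 0), type = "l", ylim = c(0, height),
       xlab = "position", ylab = "", yaxt = "n")
  for (i in seq_along(from)) {
    mid <- (from[i] + to[i]) / 2
    h <- height * abs(to[i] - from[i]) / diff(rng)   # wider pairs arc higher
    # three control points; shape = 1 lets the middle one pull the curve up
    xspline(c(from[i], mid, to[i]), c(0, h, 0), shape = 1, open = TRUE)
  }
  invisible(length(from))
}

n <- draw_arcs(from = c(1, 3, 2, 6), to = c(4, 9, 5, 10))
```

With shape = 1 the curve approximates rather than passes through the middle control point, so the arcs peak below h; shape = -1 would interpolate it instead.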
Re: [R] problem with sum function
Others explained why it happens, but you might want to look at the zapsmall function for one way to deal with it.

On Thu, Mar 1, 2012 at 2:49 PM, Mark A. Albins kamoko...@gmail.com wrote:

Hi! I'm running R version 2.13.0 (2011-04-13) Platform: i386-pc-mingw32/i386 (32-bit)

When I type in the command:

    sum(c(-0.2, 0.8, 0.8, -3.2, 1.8))

R returns the value:

    -5.551115e-17

Why doesn't R return zero in this case? There shouldn't be any rounding error in a simple sum.

Thanks,
Mark

-- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com
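A short illustration of the usual remedies, as a sketch rather than a recommendation of any one of them. Note that zapsmall() rounds relative to the largest magnitude in its argument, so it only zaps the tiny value when that value sits alongside larger numbers in the same vector; applied to the lone sum it leaves it unchanged.

```r
s <- sum(c(-0.2, 0.8, 0.8, -3.2, 1.8))

s == 0                     # may be FALSE: the inputs are not exact in binary
isTRUE(all.equal(s, 0))    # comparison with a numerical tolerance
zapsmall(c(1, s))          # tiny value zapped relative to the larger one
round(s, 10)               # explicit rounding to a chosen number of digits
```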
Re: [R] Calculation of standard error for a function
On Mar 2, 2012, at 7:05 PM, Duncan Murdoch wrote:

On 12-03-02 4:47 PM, Jun Shen wrote:

Dear list, If I know the standard error for k1 and k2, is there anything I can call in R to calculate the standard error of k1/k2? Thanks.

Jun

No, because it depends on the joint distribution of k1 and k2. Even if you knew they were independent, that would not be sufficient (though you could use the delta method to get an approximation in that case; look it up).

Duncan Murdoch

A nice article with useful information on three approaches to this problem appeared in BMC Medical Research Methodology: "Methods for confidence interval estimation of a ratio parameter with application to location quotients", by Beyene and Moineddin. http://www.biomedcentral.com/1471-2288/5/32

Both were at Department of Public Health Science, University of Toronto, Toronto, Ontario, Canada, when this appeared in 2005. I thought they might have been neighbors of yours, Duncan, but I looked at a map and see that my understanding of Ontario geography is not particularly accurate.

-- David Winsemius, MD West Hartford, CT
Re: [R] Cleaning up messy Excel data
On 03/03/12 12:41, Greg Snow wrote:

SNIP

It is possible to do the right thing in Excel, but Excel does not encourage (let alone force) you to do the right thing, but makes it easy to do the wrong thing.

SNIP

Fortune!

cheers,
Rolf Turner
Re: [R] Count matches in pmatch
Hello,

Where is the reproducible example?

apricum wrote:

Hi, I need to find the number of occurrences of each word from one string in another string. So I need a function which is similar to pmatch, but returns not references, but the number of matches. Is there any function like this? If not, what is the way to calculate what I need?

Anyway, this might help.

    set.seed(1)
    x <- paste(sample(letters[1:4], 10, T), collapse = "")
    x
    x2 <- unlist(strsplit(x, ""))
    xmatch <- pmatch(x2, c("auvw", "bxyz"), duplicates.ok = TRUE)

    # Several ways to count matches
    tapply(xmatch, xmatch, length)
    aggregate(xmatch, list(xmatch), length)
    by(xmatch, xmatch, length)

Rui Barradas

-- View this message in context: http://r.789695.n4.nabble.com/Count-matches-in-pmatch-tp4439722p4440316.html
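Since no reproducible example was given, the following is only one reading of the question: count how often each whole word of one string occurs in another. The helper name count_words and the two example strings are hypothetical, and exact matching of whitespace-separated words is assumed (no partial matching as in pmatch).

```r
# Hypothetical helper: how often does each word of 'a' occur in 'b'?
count_words <- function(a, b) {
  wa <- unique(strsplit(a, "[[:space:]]+")[[1]])   # distinct words of a
  wb <- strsplit(b, "[[:space:]]+")[[1]]           # all words of b
  sapply(wa, function(w) sum(wb == w))             # named vector of counts
}

counts <- count_words("red green blue", "green red red yellow")
counts
```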
Re: [R] times series trellis plot
On Fri, Mar 2, 2012 at 5:15 PM, sluedtke slued...@gfz-potsdam.de wrote:

Dear List, I am struggling with the trellis graphic. A similar problem was mentioned here: http://r.789695.n4.nabble.com/R-How-can-you-get-N-replicates-of-a-multi-screen-multivariate-time-series-plot-td811850.html

I have 2 time series data sets. The 2 time series differ by some orders of magnitude. I managed to plot them into 1 graph, but, since the data is that different, one of the data sets appears as a line only, well almost. So I would need to set a second y axis that is scaled to the second data set. I know how to do it with the usual plot.window routine or something similar, but not with the trellis graphic of the lattice package.

xyplot.zoo in the zoo package is a version of lattice's xyplot specialized to work with time series. See the help file and also the examples in it.

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
[R] Correlation of huge matrix saved as binary file
Hi, I have a 900,000,000*9,000 matrix where I need to calculate the correlation between all entries along the smaller dimension, thus creating a 9k*9k correlation matrix. This matrix is too big to be loaded into R, and is saved as a binary file. To access the data in the file I use mmap and some api-functions (to get all values in one row, one column, or one particular value). I'm looking for some advice on how to calculate the correlation matrix. Right now my approach is to do something similar to this (toy code):

    cor.matrix <- matrix(NA_real_, ncol=9000, nrow=9000)
    for (i in 1:8999) {
      for (j in (i+1):9000) {
        # i1 = ... getting the index of item (i) in a second file
        # i2 = ... getting the index of item (j)
        g1 <- api$getCol(i1)
        g2 <- api$getCol(i2)
        cor.matrix[i,j] <- cor(g1, g2)
      }
    }

This will work, but will take forever. Any advice for how this can be done more efficiently? I'm running on a 2.6.18 linux system, with R version R-2.11.1.

Thanks!

-- View this message in context: http://r.789695.n4.nabble.com/Correlation-of-huge-matrix-saved-as-binary-file-tp4440119p4440119.html
Re: [R] Calculation of standard error for a function
On 03/03/12 13:35, David Winsemius wrote:

On Mar 2, 2012, at 7:05 PM, Duncan Murdoch wrote:

On 12-03-02 4:47 PM, Jun Shen wrote:

Dear list, If I know the standard error for k1 and k2, is there anything I can call in R to calculate the standard error of k1/k2? Thanks.

No, because it depends on the joint distribution of k1 and k2. Even if you knew they were independent, that would not be sufficient (though you could use the delta method to get an approximation in that case; look it up).

A nice article with useful information on three approaches to this problem appeared in BMC Medical Research Methodology: "Methods for confidence interval estimation of a ratio parameter with application to location quotients", by Beyene and Moineddin. http://www.biomedcentral.com/1471-2288/5/32

SNIP

Chapter 6 of "Beyond Anova" by Rupert G. Miller, Chapman & Hall, 1997, might also be of interest.

cheers,
Rolf Turner
Re: [R] Cleaning up messy Excel data
Unfortunately they only know how to use Excel and Word. They are not folks who use a computer every day. Many of them run factories or warehouses and asking them to use something like Access would not happen in my lifetime (I have retired twice already). I don't have any problems with them messing up the data that I send them; they are pretty good about making changes within the context of the spreadsheet. The other issue is that I am working with people in twenty different locations spread across the US, so I might be able to get one of them to use Access (there is one I know that uses it), but that leaves 19 other people I would not be able to communicate with.

The other thing is that I use Excel myself to slice/dice data since there are things that are easier in Excel than R (believe it or not). There are a number of tools I keep in my toolkit, and R is probably the most important, but I have not thrown the rest of them away since they still serve a purpose.

So if you can come up with a way to get 20 diverse groups, who are not computer literate, to change over in a couple of days from Excel to Access let me know. BTW, I tried to use Access once and gave it up because it was not as intuitive as some other tools and did not give me any more capability than the ones I was using. So I know I would have a problem in convincing others to make the change just so they could communicate with me, while they still had to use Excel for most of their other interfaces.

This is the real world where you have to learn how to adapt to your environment and make the best of it. So you just have to learn that Excel can be your friend (or at least not your enemy) and can serve a very useful purpose in getting your ideas across to other people.

On Fri, Mar 2, 2012 at 6:41 PM, Greg Snow 538...@gmail.com wrote:

Try sending your clients a data set (data frame, table, etc) as an MS Access data table instead.
They can still view the data as a table, but will have to go to much more effort to mess up the data, more likely they will do proper edits without messing anything up (mixing characters in with numbers, have more sexes than your biology teacher told you about, add extra lines at top or bottom that makes reading back into R more difficult, etc.) I have had a few clients that I talked into using MS Access from the start to enter their data, there was often a bit of resistance at first, but once they tried it and went through the process of designing the database up front they ended up thanking me and believed that the entire data entry process was easier and quicker than had they used Excel as they originally planned. Access is still part of MS Office, so they don't need to learn R or in any way break their chains from being prisoners of Bill, but they will be more productive in more ways than just interfacing with you. Access (databases in general) forces you to plan things out and do the correct thing from the start. It is possible to do the right thing in Excel, but Excel does not encourage (let alone force) you to do the right thing, but makes it easy to do the wrong thing.

On Thu, Mar 1, 2012 at 6:15 AM, jim holtman jholt...@gmail.com wrote:

But there are some important reasons to use Excel. In my work there are a lot of people that I have to send the equivalent of a data.frame to who want to look at the data and possibly slice/dice the data differently and then send back to me updates. These folks do not know how to use R, but do have Microsoft Office installed on their computers and know how to use the different products. I have been very successful in conveying what I am doing for them by communicating via Excel spreadsheets. It is also an important medium in dealing with some international companies who provide data via Excel and expect responses back via Excel.
When dealing with data in a tabular form, Excel does provide a way for a majority of the people I work with to understand the data. Yes, there are problems with some of the ways that people use Excel, and yes I have had to invest time in scrubbing some of the data that I get from them, but if I did not, then I would probably not have a job working for them. I use R exclusively for the analysis that I do, but find it convenient to use Excel to provide a communication mechanism to the majority of the non-R users that I have to deal with. It is a convenient work-around because I would never get them to invest the time to learn R. So in the real world there is a need for Excel and we are not going to cause it to go away; we have to learn how to live with it, and from my standpoint, it has definitely benefited me in being able to communicate with my users and continuing to provide them with results that they are happy with. They refer to letting me work my magic on the data; all they know is they see the result via Excel and in the background R is doing the heavy lifting that they do not have to know
Re: [R] Correlation of huge matrix saved as binary file
I don't think you can speed it up by a whole lot... but you can try a few things, especially if you don't have missing data in the matrix (which you probably don't). The main question is what takes most of the time: the api calls or the cor() call? If it's cor, here's what you can try:

1. Pre-standardize the entire input matrix, i.e. scale each column to mean=0 and sum of squares=1. Save the standardized matrix (or make sure it's available to api). Since your matrix only has 9000 columns, this should not take extremely long.

2. Instead of calculating correlations, calculate simply sum(g1*g2). If g1 and g2 are standardized as above, the correlation equals sum(g1*g2).

3. Instead of calculating the correlations one-by-one, calculate them in small blocks (if you have enough memory and you run a 64-bit R). With 900M rows, you will only be able to put a 900Mx2 matrix into an R object, but if you have two such standardized matrices loaded in g1, g2, you can get their (2x2) correlation matrix by t(g1) %*% g2. You can use this 2x2 matrix to fill the appropriate components of the result matrix.

4. Use one of the multi-threading packages (multicore comes to mind but there are others) to parallelize your code. If you have 8 available cores, you can expect a nearly 8x speedup.

All in all, this will probably still take forever, but should be one or two orders of magnitude faster than your current code :)

HTH,
Peter

On Fri, Mar 2, 2012 at 2:50 PM, Bryo bryne...@gmail.com wrote:

Hi, I have a 900,000,000*9,000 matrix where I need to calculate the correlation between all entries along the smaller dimension, thus creating a 9k*9k correlation matrix. This matrix is too big to be loaded into R, and is saved as a binary file. To access the data in the file I use mmap and some api-functions (to get all values in one row, one column, or one particular value). I'm looking for some advice on how to calculate the correlation matrix. Right now my approach is to do something similar to this (toy code):

    cor.matrix <- matrix(NA_real_, ncol=9000, nrow=9000)
    for (i in 1:8999) {
      for (j in (i+1):9000) {
        # i1 = ... getting the index of item (i) in a second file
        # i2 = ... getting the index of item (j)
        g1 <- api$getCol(i1)
        g2 <- api$getCol(i2)
        cor.matrix[i,j] <- cor(g1, g2)
      }
    }

This will work, but will take forever. Any advice for how this can be done more efficiently? I'm running on a 2.6.18 linux system, with R version R-2.11.1.

Thanks!
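Steps 1 to 3 of Peter's reply can be sketched on a small in-memory matrix. Everything here is a toy stand-in under stated assumptions: get_col plays the role of the poster's api$getCol, the names standardize and block_cor are hypothetical, there are no missing values, and the columns are standardized on the fly rather than saved beforehand (which step 1 recommends for the real data).

```r
# Scale a column to mean 0 and sum of squares 1 (step 1).
standardize <- function(v) {
  v <- v - mean(v)
  v / sqrt(sum(v^2))
}

# Hypothetical block-wise correlation: get_col(k) returns column k,
# p is the number of columns, block is the number of columns per block.
block_cor <- function(get_col, p, block = 2L) {
  out <- diag(1, p)
  starts <- seq(1L, p, by = block)
  load_block <- function(idx) {
    # always returns a matrix, even for a single column
    do.call(cbind, lapply(idx, function(k) standardize(get_col(k))))
  }
  for (a in starts) {
    ia <- a:min(a + block - 1L, p)
    A <- load_block(ia)
    for (b in starts[starts >= a]) {
      ib <- b:min(b + block - 1L, p)
      B <- load_block(ib)
      blk <- crossprod(A, B)      # t(A) %*% B: correlations for this block pair
      out[ia, ib] <- blk
      out[ib, ia] <- t(blk)       # fill the symmetric counterpart
    }
  }
  out
}

set.seed(42)
X <- matrix(rnorm(50 * 5), ncol = 5)
res <- block_cor(function(k) X[, k], p = ncol(X), block = 2L)
```

For the real data the block size would be set by available memory, and the outer loop over block pairs is the natural unit to hand to a parallel backend (step 4).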
Re: [R] pasting several things
Try this:

    x <- structure(list(day = 19,
        C1 = structure(1L, .Label = c("", "C1 "), class = "factor"),
        C2 = structure(2L, .Label = c("", "C2"), class = "factor"),
        C3 = structure(1L, .Label = c("", "C3"), class = "factor"),
        Q1 = structure(2L, .Label = c("", "Q1"), class = "factor"),
        Q2 = structure(2L, .Label = c("", "Q2"), class = "factor"),
        Q3 = structure(1L, .Label = c("", "Q3"), class = "factor")),
        .Names = c("day", "C1", "C2", "C3", "Q1", "Q2", "Q3"),
        row.names = 8, class = "data.frame")

    paste(x[1, 1], do.call(paste, c(x[1, x != ""][, -1], list(sep="_"))), sep=" -")

# Output looks like this

    > paste(x[1, 1], do.call(paste, c(x[1, x != ""][, -1], list(sep="_"))), sep=" -")
    [1] "19 -C2_Q1_Q2"
    > x[1, 2] <- "C1"
    > paste(x[1, 1], do.call(paste, c(x[1, x != ""][, -1], list(sep="_"))), sep=" -")
    [1] "19 -C1_C2_Q1_Q2"

HTH,
Garrett

On Fri, Mar 2, 2012 at 2:39 PM, chuck.01 charliethebrow...@gmail.com wrote:

I have this type of format:

    structure(list(day = 19,
        C1 = structure(1L, .Label = c("", "C1 "), class = "factor"),
        C2 = structure(2L, .Label = c("", "C2"), class = "factor"),
        C3 = structure(1L, .Label = c("", "C3"), class = "factor"),
        Q1 = structure(2L, .Label = c("", "Q1"), class = "factor"),
        Q2 = structure(2L, .Label = c("", "Q2"), class = "factor"),
        Q3 = structure(1L, .Label = c("", "Q3"), class = "factor")),
        .Names = c("day", "C1", "C2", "C3", "Q1", "Q2", "Q3"),
        row.names = 8, class = "data.frame")

and want something like this:

    19 -C2 _Q1_Q2

Any ideas? Obviously I could use paste() and get this. Keep in mind I have many of these, and the presence of C1, C2, ... etc. will vary.

Thanks.

-- View this message in context: http://r.789695.n4.nabble.com/pasting-several-things-tp4439770p4439770.html Sent from the R help mailing list archive at Nabble.com.
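Because the poster has many such rows, a row-wise variant may be easier to reuse. The following is a minimal sketch, not Garrett's method: the two-row data frame is invented for illustration and uses character columns rather than factors.

```r
# Made-up data in the same shape as the question: a day column plus flag
# columns that are either empty or hold their own name.
x <- data.frame(day = c(19, 20),
                C1  = c("",   "C1"),
                C2  = c("C2", ""),
                Q1  = c("Q1", "Q1"),
                stringsAsFactors = FALSE)

# For each row, keep only the non-empty flags and join them with "_".
labels <- apply(x[-1], 1, function(r) paste(r[r != ""], collapse = "_"))

# Prefix the day, using the " -" separator from the thread.
res <- paste(x$day, labels, sep = " -")
res
```

With factor columns, as in the original dput() output, converting each column with as.character() first gives the same behaviour.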
[R] Strategies to deal with unbalanced classification data in randomForest
Hello all,

I have become somewhat confused by the options available for dealing with a highly unbalanced data set (1 in one class, 50 in the other). In summary, I am unsure: a) whether I am performing the two class weighting methods properly, b) whether the data are too unbalanced for this type of analysis to be appropriate, and c) whether there is any interaction between the weighting for class imbalances and the number of trees in a forest.

An example will illustrate this best. Say I have a data set like the following:

    df <- rbind(
      data.frame(var1=runif(1, 10, 50), var2=runif(1, -3, 3), var3=runif(1, 0.1, 0.25), cls=factor("CLASS-1")),
      data.frame(var1=runif(50, 10, 50), var2=runif(50, 2, 7), var3=runif(50, 0.2, 0.35), cls=factor("CLASS-2"))
    )

    ## Where the response vector is highly imbalanced like so:
    summary(df$cls)

    library(randomForest)
    set.seed(17)

    ## This is obviously an extreme case, but I am wondering what the options are to deal with something like this.
    ## The problem with this situation manifests itself when I try to train a random forest
    ## without accounting for this imbalance
    df.rf <- randomForest(cls~var1+var2+var3, data=df, importance=TRUE)

    ## Now one option is to down sample the majority class. However, I can't seem to find exactly
    ## how to do this. Does this seem correct?
    df.rf.downsamp <- randomForest(cls~var1+var2+var3, data=df, sampsize=c(50,50), importance=TRUE)
    ## 50 being the number of observations in the minority class

    ## The other option, over which there seems to be some confusion, is to establish some class weights
    ## to balance the error rate. This approach I've mostly drawn from here:
    ## http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm#balance
    ## This might not be appropriate, however, as of September it looks like Breiman's method wasn't used in R
    df.rf.weights <- randomForest(cls~var1+var2+var3, data=df, classwt=c(1, 600), importance=TRUE)

    ## Nevertheless, what I am concerned about is the effect an unbalanced data set has on my randomForest model
    ## For example:
    par(mfrow=c(1,3))
    plot(df.rf)
    plot(df.rf.downsamp)
    plot(df.rf.weights)

This presents three very different scenarios, and I am having trouble resolving the issues I mentioned above.

I am extremely grateful for all the work that has been done on randomForests in R up to this point. I was hoping that someone with more experience might be able to advise what the best strategy is to deal with this problem. Which of these approaches is best, and am I using them right?

Thanks so much in advance for any help.

Sam

    > sessionInfo()
    R version 2.14.2 (2012-02-29)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8
     [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
    [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    loaded via a namespace (and not attached):
    [1] ggplot2_0.8.9 plyr_1.7.1 tools_2.14.2
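The down-sampling idea in the question can be sketched in base R, without fitting a forest. The class sizes below (1000 and 50) are assumptions chosen so that the example runs; they are not the poster's real counts. The vector samp has one entry per class in factor-level order, which is the shape that randomForest's sampsize argument expects for stratified sampling.

```r
set.seed(17)

# Made-up imbalanced data: 1000 rows of CLASS-1, 50 rows of CLASS-2.
df <- rbind(
  data.frame(var1 = runif(1000, 10, 50), cls = factor("CLASS-1")),
  data.frame(var1 = runif(50,   10, 50), cls = factor("CLASS-2"))
)

# Size of the smallest class, repeated once per class level.
n_min <- min(table(df$cls))
samp  <- rep(n_min, nlevels(df$cls))
names(samp) <- levels(df$cls)

# Manual down-sample: draw n_min row indices from each class.
idx <- unlist(lapply(split(seq_len(nrow(df)), df$cls),
                     function(i) i[sample.int(length(i), n_min)]))
balanced <- df[idx, ]

table(balanced$cls)
```

Passing samp as sampsize (together with strata = df$cls) would ask randomForest to draw a balanced sample for every tree rather than once up front; whether that is preferable to class weights is the open question of the post.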
[R] Grouped barchart confidence intervals in lattice
Hi everyone,

I'm having trouble adding error bars to a grouped barchart in lattice. I know that this topic has been addressed quite a bit, as I've been searching the internet for a while to try to troubleshoot the issue, but I've not been able to find any solution that I could get working on my data. I was hoping someone could look at my code, tell me what I'm doing wrong, and show me how to fix it.

    # Input example data
    growth <- c(6.6,7.2,6.9,8.3,7.9,9.2,8.3,8.7,8.1,8.5,9.1,9.0)
    diet <- as.factor(rep(c("A","B","C"),2,each=2))
    coat <- as.factor(rep(c("light","dark"),each=6))
    growth.means <- aggregate(growth,list(coat,diet),mean)
    library(plotrix)
    growth.errs <- aggregate(growth,list(coat,diet),std.error)

    # Try using the superpose call with panel.groups results in an error
    panel.ci <- function(x, y, subscripts, groups, ...){
      panel.barchart(x, y, groups=groups, subscripts=subscripts, horiz=F, ...)
      panel.segments(x[subscripts], y, x[subscripts], y+growth.errs$x, col = 'black')
    }
    barchart(growth~Group.1, groups=Group.2, data=growth.means,
      panel=panel.superpose, panel.groups=panel.ci )

    # Try using the generic plot.barchart command gives three error bars, all are the appropriate sizes,
    # but all are centered in each group and not on the grouped bars
    barchart(x~Group.1, groups=Group.2, data=growth.means,
      panel=function(x,y,subscripts, groups){
        panel.barchart(x,y,horiz=F,groups=groups, subscripts=subscripts)
        panel.segments(as.numeric(x)[subscripts],y,as.numeric(x)[subscripts],y+growth.errs$x)
      } )

What am I doing wrong?

Thanks,
Nathan Lemoine
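The summary step of the question (means and standard errors per coat and diet) can be done in base R so that both values sit in one data frame. This is a sketch that leaves the lattice panel problem untouched: se is a hypothetical helper that assumes plotrix's std.error is the usual sd/sqrt(n), and growth.sum is a name introduced here.

```r
growth <- c(6.6, 7.2, 6.9, 8.3, 7.9, 9.2, 8.3, 8.7, 8.1, 8.5, 9.1, 9.0)
diet   <- as.factor(rep(c("A", "B", "C"), 2, each = 2))
coat   <- as.factor(rep(c("light", "dark"), each = 6))
dat    <- data.frame(growth, diet, coat)

# Assumed definition of the standard error of the mean.
se <- function(v) sd(v) / sqrt(length(v))

# One row per coat x diet cell, with mean and standard error side by side.
growth.sum <- merge(
  aggregate(growth ~ coat + diet, data = dat, FUN = mean),
  aggregate(growth ~ coat + diet, data = dat, FUN = se),
  by = c("coat", "diet"), suffixes = c(".mean", ".se")
)
growth.sum
```

Keeping the error in the same row as its mean avoids relying on two separately aggregated objects staying in the same order inside a panel function.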
Re: [R] Computing line= for mtext
I would use the regular text function instead of mtext (remembering to set par(xpd=...)), then use the grconvertX and grconvertY functions to find the location to plot at (possibly adding in the results from strwidth or strheight).

On Thu, Mar 1, 2012 at 4:52 PM, Frank Harrell f.harr...@vanderbilt.edu wrote:

Rich's pointers deal with lattice/grid graphics. Does anyone have a solution for base graphics?

Thanks
Frank

Richard M. Heiberger wrote

Frank,

This can be done directly with a variant of the panel.axis function. See function panel.axis.right in the HH package. This was provided for me by David Winsemius in response to my query on this list in October 2011 https://stat.ethz.ch/pipermail/r-help/2011-October/292806.html The email thread also includes comments by Deepayan Sarkar and Paul Murrell.

Rich

On Wed, Feb 29, 2012 at 8:48 AM, Frank Harrell <f.harrell@> wrote:

I want to right-justify a vector of numbers in the right margin of a low-level plot. For this I need to compute the line parameter to give to mtext. Is this the correct scalable calculation?

    par(mar=c(4,3,1,5)); plot(1:20)
    s <- 'abcde'; w=strwidth(s, units='inches')/par('cin')[1]
    mtext(s, side=4, las=1, at=5, adj=1, line=w-.5, cex=1)
    mtext(s, side=4, las=1, at=7, adj=1, line=2*(w-.5), cex=2)

Thanks
Frank

- Frank Harrell Department of Biostatistics, Vanderbilt University
-- View this message in context: http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4431554.html Sent from the R help mailing list archive at Nabble.com.

- Frank Harrell Department of Biostatistics, Vanderbilt University
-- View this message in context: http://r.789695.n4.nabble.com/Computing-line-for-mtext-tp4431554p4436923.html Sent from the R help mailing list archive at Nabble.com.

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
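Greg's suggestion might look roughly like the sketch below. It is only one reading of his advice: anchoring at the right edge of the figure region ("nfc" coordinate 1) is an assumption made here, and x_right is a name introduced for the example.

```r
par(mar = c(4, 3, 1, 5))
plot(1:20)
s <- "abcde"

# Right edge of the figure region, expressed in user (data) coordinates.
x_right <- grconvertX(1, from = "nfc", to = "user")

# adj = c(1, 0.5) right-justifies the string at x_right; xpd = NA lets
# text() draw outside the plot region, i.e. in the margin.
text(x_right, 5, s, adj = c(1, 0.5), xpd = NA)
text(x_right, 7, s, adj = c(1, 0.5), xpd = NA, cex = 2)
```

Because the anchor is a position rather than a margin line, the same call works for any cex without recomputing line=. Subtracting a small amount, for example strwidth("m"), from x_right would leave a gap before the figure edge.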
Re: [R] Memory issue. XXXX
On 02/03/2012 23:36, steven mosher wrote:

1. How much RAM do you have (looks like 2GB)? If you have more than 2GB then you can allocate more memory with memory.size()

Actually, this looks like 32-bit Windows (unstated), so you cannot. See the rw-FAQ for things your sysadmin can do even there.

2. If you have 2GB or less then you have a couple of options:
a) make sure your session is clean of unnecessary objects.
b) Don't read in all the data if you don't need to (see colClasses to control this)
c) use the bigmemory package or ff package
d) buy more RAM

Most importantly, use a 64-bit OS to get a larger real address space. (bigmemory and ff are mainly palliative measures for those whose OS does not provide a good implementation of out-of-memory objects).

On Fri, Mar 2, 2012 at 6:57 AM, Dan Abner dan.abne...@gmail.com wrote:

Hi everyone,

Any ideas on troubleshooting this memory issue:

    > d1 <- read.csv("arrears.csv")
    Error: cannot allocate vector of size 77.3 Mb
    In addition: Warning messages:
    1: In class(data) <- "data.frame" :
      Reached total allocation of 1535Mb: see help(memory.size)
    2: In class(data) <- "data.frame" :
      Reached total allocation of 1535Mb: see help(memory.size)
    3: In class(data) <- "data.frame" :
      Reached total allocation of 1535Mb: see help(memory.size)
    4: In class(data) <- "data.frame" :
      Reached total allocation of 1535Mb: see help(memory.size)

Thanks!
Dan

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
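Option 2b above (reading only the columns that are needed) can be illustrated with colClasses. The file contents and column names below are invented for the example; the layout of the poster's arrears.csv is not known.

```r
# Write a small stand-in CSV to a temporary file.
csv <- tempfile(fileext = ".csv")
writeLines(c("id,amount,notes",
             "1,10.5,a",
             "2,20.0,b"), csv)

# Declaring column types up front avoids type guessing, and "NULL"
# tells read.csv to skip that column entirely.
d1 <- read.csv(csv, colClasses = c("integer", "numeric", "NULL"))
str(d1)
```

Skipping wide text columns this way can reduce the peak memory of the read considerably, though on a 32-bit system the address-space limit discussed above still applies.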
[R] Re: Bayesian Hidden Markov Models
Dear Oscar,

I have used the following code to fit a Bayesian HMM to the exchange rate data. One interesting result is that the model fits a 6-state HMM with a common variance. This is very hard to understand, because the plot clearly shows periods of high and low volatility that differ from one another.

So, could you please help me take a look at this? Attached is the exchange rate data. I am really grateful for your help and time.

Best Regards,
James LAN

    #input exchange rate data
    exrt <- read.table(file="exrt.txt", header=F)
    plot(exrt$V2)
    library(RJaCGH)
    y <- exrt$V2
    Pos <- 1:length(y)
    Chrom <- rep(1, length(y))
    res <- RJaCGH(y=y, Pos=Pos, Chrom=Chrom)
    summary(res)
    Q.NH(summary(res)[[1]]$beta, x=0)

    Summary for ARRAY array1:

    Distribution of the number of hidden states:
    1 2 3 4 5 6
    0 0 0 0 0 1

    Model with 6 states:
    Distribution of the posterior means of hidden states:
                10%    25%    50%    75%    90%
    Loss-1   -0.298 -0.284 -0.284 -0.279 -0.279
    Loss-2   -0.144 -0.142 -0.142 -0.135 -0.135
    Normal-1 -0.045 -0.043 -0.043 -0.040 -0.040
    Normal-2 -0.004 -0.003 -0.003  0.000  0.000
    Normal-3  0.047  0.056  0.056  0.059  0.059
    Gain      0.177  0.197  0.197  0.198  0.198

    Distribution of the posterior variances of hidden states:
               10%   25%   50%   75%   90%
    Loss-1   0.001 0.001 0.001 0.001 0.001
    Loss-2   0.001 0.001 0.001 0.001 0.001
    Normal-1 0.001 0.001 0.001 0.001 0.001
    Normal-2 0.001 0.001 0.001 0.001 0.001
    Normal-3 0.001 0.001 0.001 0.001 0.001
    Gain     0.001 0.001 0.001 0.001 0.001

    Parameters of the transition functions:
             Loss-1 Loss-2 Normal-1 Normal-2 Normal-3  Gain
    Loss-1    0.000  0.217    0.192    1.229    0.185 0.857
    Loss-2    2.104  0.000    0.305    2.190    0.132 1.424
    Normal-1  2.728  1.472    0.000    4.606    0.293 2.423
    Normal-2  5.919  4.746    5.518    0.000    5.067 5.834
    Normal-3  2.295  0.537    0.115    4.329    0.000 2.514
    Gain      1.519  0.247    0.036    1.263    0.132 0.000

    > Q.NH(summary(res)[[1]]$beta, x=0)
                  Loss-1      Loss-2    Normal-1    Normal-2    Normal-3
    Loss-1   0.239381248 0.192598942 0.197535790 0.070058386 0.198853168
    Loss-2   0.039503637 0.323847484 0.238632024 0.036241348 0.283843424
    Normal-1 0.030559504 0.107234801 0.467453369 0.004669696 0.348627295
    Normal-2 0.002624349 0.008474303 0.003915585 0.975979222 0.006151494
    Normal-3 0.037727330 0.218834862 0.333794793 0.004936521 0.374412381
    Gain     0.053064705 0.189481114 0.233947328 0.068592117 0.212423356
                    Gain
    Loss-1   0.101572465
    Loss-2   0.077932083
    Normal-1 0.041455335
    Normal-2 0.002855048
    Normal-3 0.030294113
    Gain     0.242491380

From: Oscar Rueda [via R] ml-node+s789695n4431468...@n4.nabble.com
To: monkeylan lanjin...@yahoo.com.cn
Sent: Wednesday, 29 February 2012, 9:21
Subject: Re: Bayesian Hidden Markov Models

Dear James,

The distances are normalized between zero and 1, so in your case all of them will be zero. You can check that with

    res$Dist.for.model

And do

    Q.NH(summary(res)[[1]]$beta, x=0)

to obtain the common transition matrix.

Cheers,
Oscar

On 29/2/12 03:59, monkeylan [hidden email] wrote:

Dear Oscar,

I am extremely grateful for your help and detailed explanation of the use of the RJaCGH package. But when running the sample code you listed, there is another issue I am a little confused about. After running summary(res), I got the estimate of the random matrix Beta:

    Parameters of the transition functions:
           Normal  Gain
    Normal  0.000 4.258
    Gain    2.001 0.000

However, what matters more for my modeling is the transition probability matrix Q based on the above Beta, and I am not sure how to get the matrix Q. I did try the Q.NH function. However, should I set the distance parameter x to 1 or 0? I am not sure. If 1 (according to my own understanding), the following result does not seem reasonable.

    > tran <- matrix(c(0,2.001,4.528,0),2,2)
    > Q.NH(beta=tran, x=1)
         [,1] [,2]
    [1,]  0.5  0.5
    [2,]  0.5  0.5

Many thanks for your further help and time.

James Allan

--- On Tuesday, 28 February 2012, Oscar Rueda [via R] [hidden email] wrote:

From: Oscar Rueda [via R] [hidden email]
Subject: Re: Bayesian Hidden Markov Models
To: monkeylan [hidden email]
Date: Tuesday, 28 February 2012, 7:02

Dear James,

Basically you just need the values (y) and the positions (in your case it would be the index of the times series). The chromosome argument does not apply to your case so it can be a vector of ones. If the positions are at the same distance between (equally spaced) then the
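The numbers posted in this thread are consistent with each row of Q being proportional to exp(-beta + beta * x). That formula is inferred here from the posted output, not taken from the RJaCGH documentation, and q_nh below is a hypothetical stand-in for Q.NH. A minimal sketch under that assumption:

```r
# Assumed form of the transition probabilities: each row is
# exp(-beta + beta * x), normalised so that it sums to 1.
q_nh <- function(beta, x = 0) {
  w <- exp(-beta + beta * x)
  w / rowSums(w)
}

states <- c("Loss-1", "Loss-2", "Normal-1", "Normal-2", "Normal-3", "Gain")
beta <- matrix(c(0.000, 0.217, 0.192, 1.229, 0.185, 0.857,
                 2.104, 0.000, 0.305, 2.190, 0.132, 1.424,
                 2.728, 1.472, 0.000, 4.606, 0.293, 2.423,
                 5.919, 4.746, 5.518, 0.000, 5.067, 5.834,
                 2.295, 0.537, 0.115, 4.329, 0.000, 2.514,
                 1.519, 0.247, 0.036, 1.263, 0.132, 0.000),
               nrow = 6, byrow = TRUE, dimnames = list(states, states))

# Common transition matrix at distance x = 0, as Oscar suggests.
Q <- q_nh(beta, x = 0)
round(Q, 3)

# At x = 1 every exponent is zero, so all entries are equal.
q_nh(matrix(c(0, 2.001, 4.528, 0), 2, 2), x = 1)
```

Under this reading, x = 1 always gives a uniform matrix, which would explain the 0.5 entries James found puzzling, while x = 0 gives the matrix that depends on the estimated beta.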
[R] interpreting the output of a glm with an ordered categorical predictor.
Greetings.

I'm a Master's student working on an analysis of herbivore damage on plants. I have tried running a glm with one categorical predictor (aphid abundance) and a binomial response (presence/absence of herbivore damage). My predictor has four categories: high, medium, low, and none. I used the ordered function to sort my categories for a glm.

    ah <- read.csv("http://depot.northwestern.edu/class/2012WI_PBC_435-0_AND_BIOL_SCI_313/muller/herbivoryEdit.csv")
    ah1 <- ah[ah$date==110810,]
    ah2 <- ah[ah$date==110904,]
    aphidOrder <- ordered(ah2$aphidLevelMax, levels=c("none", "low", "med", "high"))
    ordAph <- glm(chewholebinom~aphidOrder, family=binomial, data=ah2)

When I ran the summary for the glm (output pasted below), I could not tell which coefficient referred to which factor level. My question is, what do .L, .Q, and .C mean and how can I relate these terms to my original factor levels (none, low, med, high)?

Thank you for your help,
Katherine

    > summary(ordAph)

    Call:
    glm(formula = chewholebinom ~ aphidOrder, family = binomial, data = ah2)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.6512  -0.9817   0.7687   0.7687   1.5353

    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -0.05567    0.25097  -0.222   0.8245
    aphidOrder.L -1.36755    0.49366  -2.770   0.0056 **
    aphidOrder.Q  0.36824    0.50195   0.734   0.4632
    aphidOrder.C -0.09840    0.51011  -0.193   0.8470
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 137.99 on 99 degrees of freedom
    Residual deviance: 124.05 on 96 degrees of freedom
    AIC: 132.05

    Number of Fisher Scoring iterations: 4

-- View this message in context: http://r.789695.n4.nabble.com/interpreting-the-output-of-a-glm-with-an-ordered-categorical-predictor-tp4440383p4440383.html Sent from the R help mailing list archive at Nabble.com.
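For reference, the .L, .Q and .C suffixes come from the orthogonal polynomial contrasts (contr.poly) that R applies to ordered factors by default: linear, quadratic and cubic trends across the levels. The sketch below uses a made-up factor with the same four levels, not the poster's data.

```r
lev <- c("none", "low", "med", "high")
f   <- ordered(c("none", "low", "med", "high", "low"), levels = lev)

# Contrast matrix used for an ordered factor with 4 levels:
# rows are the levels, columns are the linear, quadratic and cubic terms.
cm <- contrasts(f)
cm

# The columns are orthonormal, so crossprod(cm) is the 3x3 identity.
round(crossprod(cm), 10)

# To get one coefficient per level relative to "none" instead, one option
# is to switch this factor to treatment contrasts before fitting.
f2 <- f
contrasts(f2) <- contr.treatment(4)
contrasts(f2)
```

Read this way, the significant aphidOrder.L term in the posted output points to a roughly linear trend in the log-odds across none, low, med, high, with no evidence for quadratic or cubic curvature.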
Re: [R] Noob question - Identity argument within aggregate function?
z is a zoo object created from the following series:

    z = suppressWarnings(zoo(1:8, c(1, 2, 2, 2, 3, 4, 5, 5)))

This is what z is in the aggregate function. So then that brings us to aggregate(z, identity, tail, 1). All I was trying to accomplish was to reproduce an example shown in the zoo FAQ. I've read ?aggregate via terminal and used /identity to search through the documentation for the specific term identity. I'm just trying to understand what identity is used for, because I do not understand the error statement.

-- View this message in context: http://r.789695.n4.nabble.com/Noob-question-Identity-argument-within-aggregate-function-tp4439806p4440413.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Noob question - Identity argument within aggregate function?
I've also searched ?identity in the R shell, and it doesn't seem to be the definition I'm looking for in this particular usage of identity as an argument in the aggregate function. I would simply appreciate a conceptual explanation of what it does here and how it relates to the error.

-- View this message in context: http://r.789695.n4.nabble.com/Noob-question-Identity-argument-within-aggregate-function-tp4439806p4440420.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Noob question - Identity argument within aggregate function?
Sorry, in regard to the previous post where I said aggregate(z, identity, tail, 1), replace it with aggregate(z, identity, mean)

-- View this message in context: http://r.789695.n4.nabble.com/Noob-question-Identity-argument-within-aggregate-function-tp4439806p4440424.html Sent from the R help mailing list archive at Nabble.com.
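As far as I understand ?aggregate.zoo, the second argument (by) may be a function, in which case it is applied to the index of z to form the groups. identity() simply returns its argument, so the groups are the index values themselves and duplicated index entries are collapsed. The base R sketch below mimics that grouping with tapply; it does not call zoo, and the names idx, vals and groups are introduced here.

```r
idx  <- c(1, 2, 2, 2, 3, 4, 5, 5)   # the (duplicated) index from the post
vals <- 1:8                          # the data values

# identity() returns its input unchanged, so grouping by identity(idx)
# means one group per distinct index value.
groups <- identity(idx)

# Analogue of aggregate(z, identity, mean): mean within each index value.
means <- tapply(vals, groups, mean)
means

# Analogue of aggregate(z, identity, tail, 1): last value per index value.
lasts <- tapply(vals, groups, function(v) tail(v, 1))
lasts
```

Either call leaves one observation per distinct index value, which is the purpose of the FAQ example for series with duplicated time stamps.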