Re: [R] Uncorrelated random vectors - Thank you!
Thanks to all for the answers! I solved my problem now by sufficient iteration! Have a nice day! Luba __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Thanks: Mathematical annotation axis in lattice
Hello,

thanks for the two replies. The following code worked as expected:

    pos <- 1:10
    lab <- letters[pos]
    ll <- parse(text = paste(pos, "*phi[", lab, "]", sep = ""))
    xyplot(1:10 ~ 1:10, scales = list(x = list(labels = ll, at = 1:10)))

Best regards,
Albart

-----Original Message-----
From: Coster, Albart
Sent: Tue 7/7/2009 1:27 PM
To: r-help-requ...@r-project.org
Subject: Mathematical annotation axis in lattice

Dear list,

making mathematical expressions in plots is not difficult: expression(phi[1]), for example. At the moment I am stuck creating a vector of expressions:

    pos <- 1:10
    lab <- letters[pos]

Now I would like to create a vector of expressions that I can use for labeling the x-axis of a lattice plot.

    ll <- as.expression(paste(pos, "phi[", lab, "]", sep = ""))
    xyplot(1:10 ~ 11:10, scales = list(x = list(labels = ll, at = 1:10)))

does not work. I read about the function substitute, but that did not solve it. Could you recommend how I should do this?

Thanks in advance,
Albart Coster
Wageningen Universiteit
Netherlands
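The approach reported as working can be written as a small self-contained script. This is only a sketch: the object name `p` and the `requireNamespace()` guard are additions for illustration and are not part of the posted code.

```r
# Build a vector of plotmath expressions: 1*phi[a], 2*phi[b], ..., 10*phi[j]
pos <- 1:10
lab <- letters[pos]

# paste() creates the strings "1*phi[a]", ..., "10*phi[j]";
# parse(text = ...) turns each string into one element of an expression vector.
ll <- parse(text = paste(pos, "*phi[", lab, "]", sep = ""))

# Use the expressions as x-axis labels in a lattice plot.
# The guard keeps the script runnable where lattice is not installed.
if (requireNamespace("lattice", quietly = TRUE)) {
  p <- lattice::xyplot(1:10 ~ 1:10,
                       scales = list(x = list(labels = ll, at = 1:10)))
  print(p)
}
```

The difference from the failing attempt is that as.expression() applied to a character vector keeps the strings as plain text, whereas parse(text = ...) converts them into language objects that the plotting code can render as math.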
[R] Rdsm, a DSM package for parallel R programming
As I mentioned last week, I've been developing a package that I call Rdsm (R distributed shared memory), modeled after a similar package, PerlDSM, that I wrote for Perl some years ago. It is now in alpha form, so I'm not uploading it to CRAN yet, but it is definitely usable, and I am releasing it at

http://heather.cs.ucdavis.edu/~matloff/R/Rdsm

I hope many will try it out and give me some feedback.

Note that the word "distributed" here means that the memory is not really shared, but instead is an abstraction that gives the programmer a shared-memory view even though the program may be running on several separate machines. For C/C++ this is generally accomplished by manipulating the virtual memory hardware. For R, I do this by redefining functions such as "[" and "[<-" for a new class.

Rdsm is intended as an alternative for those who favor the shared-memory view of things. In the parallel processing community, there has always been a debate between advocates of the two main programming paradigms, shared memory and message passing. Shared-memory advocates claim greater clarity of code, while the message-passing people point to that paradigm's greater flexibility. I happen to be of the shared-memory school.

Given the popularity of OpenMP for C/C++/FORTRAN, I believe Rdsm will be of interest to many for R. Indeed, in the next few months, I will be extending Rdsm with functions that give it the look and feel of OpenMP.

Norm Matloff
UC Davis
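To illustrate what "redefining `[` and `[<-` for a new class" can look like in R, here is a toy sketch. It is not Rdsm's implementation: the class name `toyshared`, the constructor `new_shared()`, and the use of an environment as the backing store are all assumptions made for the example. A real DSM package would forward reads and writes to another process or machine instead.

```r
# Toy constructor: the data lives in an environment, so every copy of the
# object refers to the same underlying storage (reference semantics).
new_shared <- function(n) {
  store <- new.env()
  store$data <- numeric(n)
  structure(list(store = store), class = "toyshared")
}

# Read access: v[i] is redirected to the backing store.
`[.toyshared` <- function(x, i) {
  store <- unclass(x)$store
  store$data[i]
}

# Write access: v[i] <- value is redirected to the backing store.
`[<-.toyshared` <- function(x, i, value) {
  store <- unclass(x)$store
  store$data[i] <- value   # environments are modified in place
  x
}

v <- new_shared(5)
v[2] <- 10
w <- v          # an ordinary R copy, but it shares the same store
w[3] <- 7
print(v[2:3])   # the write made through w is visible through v
```

Ordinary indexing syntax is kept on the user's side, while the methods decide where the data actually lives.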
Re: [R] can't get rJava to install on Linux
On 7 July 2009 at 21:28, Mark Kimpel wrote:
| Having difficulties getting rJava to install on my Debian Squeeze box.

Did you try the binary package? A simple

    sudo apt-get install r-cran-rjava

should do; if not, you can at least use its Build-Depends via

    sudo apt-get build-dep r-cran-rjava

Hth, Dirk

--
Three out of two people have difficulties with fractions.
[R] clogit comparison between Stata and R
Hello all,

I'm moving back and forth between Stata and R at the moment - of course, using R whenever possible :-) I'm running conditional logits on some panel data and I get slightly different results and a different N in the two programs.

In R I run

    clogit(trans.dem ~ I(avg.gle_rgdp.500/gle_rgdp) + log(gle_rgdp) +
           timesince.dem + I(timesince.dem^2) + timesince.dict + I(timesince.dict^2) +
           p_polity2 + I(p_polity2^2) + strata(ccodecow) + cluster(ccodecow),
           method="approximate", data=univ)

and I get an n of 3747.

In Stata, I run

    clogit trans_dem avg_gle_rgdp_ratio loggle_rgdp timesince_dem timesince_demsq timesince_dict timesince_dictsq p_polity2 pol2sq, group(ccodecow) vce(cluster ccodecow)

which I hope is the same model. I get a message "29 groups (935 obs) dropped because of all positive or all negative outcomes", and an n of 2812. Also, the coefficients are slightly different.

I understand why Stata is dropping the groups with all outcomes the same... this is inevitable in a conditional logit, right? Is R doing the same? And what might be the cause of the difference in coefficients?

Cheers
David Hugh-Jones
Post-doctoral Researcher
Max Planck Institute of Economics, Jena
http://davidhughjones.googlepages.com
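The gap between the two sample sizes (3747 - 2812 = 935) equals the number of observations Stata reports as dropped, so one thing worth checking on the R side is how many rows sit in strata with no outcome variation. The sketch below does that count in base R on a made-up data frame; `univ_toy` and its values are hypothetical stand-ins for the poster's `univ`, and the code says nothing about how clogit itself reports n.

```r
# Hypothetical stand-in for the real data: one row per country-year
univ_toy <- data.frame(
  ccodecow  = c(1, 1, 1, 2, 2, 3, 3, 3),
  trans.dem = c(0, 1, 0, 0, 0, 1, 1, 1)
)

# TRUE for strata whose outcome is all 0 or all 1
constant <- tapply(univ_toy$trans.dem, univ_toy$ccodecow,
                   function(y) length(unique(y)) == 1)

# Number of such strata, and number of rows they contain
n_groups_constant <- sum(constant)
n_rows_constant   <- sum(univ_toy$ccodecow %in% names(constant)[constant])

print(c(groups = n_groups_constant, rows = n_rows_constant))
```

If the same count on the real data gives 29 groups and 935 rows, the difference in N is a reporting difference rather than a difference in the data used.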
Re: [R] how to read point shp file to R?
You might also find the R wiki useful:

http://wiki.r-project.org/rwiki/doku.php?id=tips:spatial-data
http://wiki.r-project.org/rwiki/doku.php?id=tips:stats-spatial

David Hugh-Jones
Post-doctoral Researcher
Max Planck Institute of Economics, Jena
http://davidhughjones.googlepages.com

2009/7/7 Sunny <sunshineab...@gmail.com>:
> I am new to R and want to do some analysis with a point vector data file. Any help is appreciated.
> Sunny
[R] R 2.9.0 plot still forcing current time zone
The help page for plot.POSIXct says:

    As from R 2.9.0 the date-times for a 'POSIXct' input are interpreted in the timezone given by the 'tzone' attribute if there is one, otherwise the current timezone. (Earlier versions always used the current timezone.)

However, I am using 2.9.0 on Linux and the following still happily produces an x-axis in local (MDT) time:

    x = strptime(paste('09-01-01 00:00:00', sep=''), format='%y-%m-%d %H:%M:%S', tz="GMT") + 60*60*24*(seq(0.5,1.5,.1))
    x
     [1] "2009-01-01 12:00:00 GMT" "2009-01-01 14:24:00 GMT"
     [3] "2009-01-01 16:48:00 GMT" "2009-01-01 19:12:00 GMT"
     [5] "2009-01-01 21:36:00 GMT" "2009-01-02 00:00:00 GMT"
     [7] "2009-01-02 02:24:00 GMT" "2009-01-02 04:48:00 GMT"
     [9] "2009-01-02 07:12:00 GMT" "2009-01-02 09:36:00 GMT"
    [11] "2009-01-02 12:00:00 GMT"
    attributes(x)
    $class
    [1] "POSIXt"  "POSIXct"
    $tzone
    [1] "GMT"
    plot(x, rep(1,11))

Is this a bug, or am I missing something?

Thanks a lot!
Britt

--
Britton B. Stephens
National Center for Atmospheric Research
P.O. Box 3000, 1850 Table Mesa Drive
Boulder, CO 80307-3000
Phone: (303) 497-1018
Fax: (303) 497-1092
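Independently of whether plot.POSIXct honours the 'tzone' attribute in a given R version, the axis labels can be forced into a chosen time zone by drawing the axis by hand. The following is a minimal sketch of that workaround, not an explanation of the reported behaviour; the names `ticks` and `tick_labels` and the label format are choices made for the example.

```r
# Same time points as in the post, built directly as POSIXct in GMT
x <- as.POSIXct("2009-01-01 00:00:00", tz = "GMT") +
  60 * 60 * 24 * seq(0.5, 1.5, 0.1)
y <- rep(1, length(x))

# Plot against the underlying numeric values and suppress the default axis
plot(as.numeric(x), y, xaxt = "n", xlab = "Time (GMT)", ylab = "y")

# Label every second point, formatting the labels explicitly in GMT
ticks <- x[seq(1, length(x), by = 2)]
tick_labels <- format(ticks, format = "%m-%d %H:%M", tz = "GMT")
axis(1, at = as.numeric(ticks), labels = tick_labels)
```

Because format() is given tz = "GMT" explicitly, the labels do not depend on the session's local time zone.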
Re: [R] Dump plots to powerpoint?
On Tue, Jul 7, 2009 at 11:32 PM, Ben Bolker <bol...@ufl.edu> wrote:
> Why not directly generate a large PNG file (which will be much better for line art than JPG anyway)? Or EMF? See http://wiki.r-project.org/rwiki/doku.php?id=tips:graphics-misc:export
> [Of course, this doesn't answer the original question ... to which I suspect the answer is no.]

So image generation is done, now we want to put them all into a presentation (One image per slide? Titles?)

Suggestions:

1. Dump Powerpoint, learn LaTeX and beamer, your audience will be happy. Including a bunch of image files? Trivial.

2. Dump Powerpoint, use OpenOffice - the OO Impress file is a zip file, one file of which is an XML description of the presentation, so you just have to create an XML file a bit like that one which specifies all your images. You could do this in R. It just needs a bit of simple reverse engineering. Create a simple presentation like the one you want, with a few images in it, then save, unzip it, figure it out, write a little template (using R's brew package perhaps), then write a new XML file with all your images specified, zip up, job done. Save it from OpenOffice as a Powerpoint file if you really need to use Powerpoint.

3. Okay, so you really want to use Powerpoint, in which case the latest file format (the one with the 'x' at the end) should be some kind of XML file which you might be able to reverse engineer in a similar way to (2). Good luck with that.

Barry
[R] Help resolving error in quantcut
I am trying to use the quantcut function to create deciles, but I am getting the errors below. I am new to this function and do not know how to use its options properly, or whether some other conversion is necessary.

#initial summary using describe function in Hmisc library

    DegreeBurn4th
          n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
         76     133      16  0.0325  0.0000  0.0000  0.0000  0.0000  0.0225  0.0900  0.1725

                0 0.01 0.02 0.03 0.04 0.05 0.06 0.08 0.09 0.12 0.16 0.17 0.18 0.24 0.36  0.5
    Frequency  48    6    3    4    2    2    1    1    2    1    1    1    1    1    1    1
    %          63    8    4    5    3    3    1    1    3    1    1    1    1    1    1    1

    degree.quant = quantcut(DegreeBurn4th, q=seq(0, 1, 0.1), labels=F, na.rm=TRUE)
    Error in if (sum(flag) == 0) return(cut) else return(min(x[flag], na.rm = na.rm)) :
      missing value where TRUE/FALSE needed

#original data

    print(DegreeBurn4th)
      [1] 0.09 0.00 0.00 NA NA 0.03 NA 0.02 NA 0.00 0.01 0.00 NA NA NA NA NA 0.00 NA 0.05 0.03 0.00 NA 0.02 0.00 NA 0.00 NA NA 0.16 NA NA 0.24
     [34] NA NA 0.00 NA 0.00 0.08 NA NA NA 0.00 0.00 NA NA 0.01 NA 0.09 NA 0.00 0.00 0.00 0.06 NA 0.00 NA NA 0.00 NA NA 0.00 0.01 NA NA 0.00
     [67] NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0.00 0.00 NA NA 0.00 NA 0.05 0.00 NA NA NA 0.00 0.02 0.18 NA NA NA 0.03 NA
    [100] NA 0.00 NA NA NA 0.36 NA NA NA NA 0.00 0.00 0.00 NA 0.00 NA 0.17 NA 0.00 NA NA NA 0.00 NA 0.00 0.00 0.00 NA NA 0.12 0.00 NA 0.01
    [133] 0.00 NA NA NA NA 0.00 0.00 NA 0.01 0.00 0.00 NA NA 0.00 0.04 NA NA NA 0.00 0.00 NA 0.03 NA 0.00 NA 0.00 NA NA 0.01 0.00 0.00 NA NA
    [166] NA NA NA NA NA NA NA NA NA NA NA NA 0.00 NA NA NA NA NA NA NA NA NA NA 0.50 NA NA NA NA NA NA NA NA 0.04
    [199] NA NA NA NA NA NA NA NA NA NA NA

#convert missing to zero

    DegreeBurn4th[is.na(DegreeBurn4th)] <- 0.00
    print(DegreeBurn4th)
      [1] 0.09 0.00 0.00 0.00 0.00 0.03 0.00 0.02 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.03 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.16 0.00 0.00 0.24
     [34] 0.00 0.00 0.00 0.00 0.00 0.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.09 0.00 0.00 0.00 0.00 0.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00
     [67] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.02 0.18 0.00 0.00 0.00 0.03 0.00
    [100] 0.00 0.00 0.00 0.00 0.00 0.36 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.12 0.00 0.00 0.01
    [133] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00
    [166] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04
    [199] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

    degree.quant = quantcut(DegreeBurn4th, q=seq(0, 1, 0.1), labels=F, na.rm=TRUE)
    Error in if (pairs[1, i] == pairs[1, i - 1] && pairs[1, i] == pairs[2, :
      missing value where TRUE/FALSE needed

    degree.quant = quantcut(DegreeBurn4th, q=seq(0, 1, 0.1), labels=F, include.lowest=TRUE)
    Error in cut.default(x[!flag], breaks = newquant, include.lowest = TRUE, :
      formal argument "include.lowest" matched by multiple actual arguments

    degree.quant = quantcut(DegreeBurn4th, q=seq(0, 1, 0.1), labels=F, include.lowest=F)
    Error in cut.default(x[!flag], breaks = newquant, include.lowest = TRUE, :
      formal argument "include.lowest" matched by multiple actual arguments

    degree.quant = quantcut(DegreeBurn4th, q=seq(0, 1, 0.1), labels=F, include.lowest=T)
    Error in cut.default(x[!flag], breaks = newquant, include.lowest = TRUE, :
      formal argument "include.lowest" matched by multiple actual arguments

Chris Anderson
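Because 48 of the 76 observed values are exactly 0, several decile boundaries coincide, which may be what trips quantcut here. A base R alternative, offered only as a sketch and not as a fix for quantcut itself, is to compute the quantiles, drop duplicated break points, and call cut() directly. The small vector below is made-up data with the same flavour (many zeros, a few NAs), not the poster's DegreeBurn4th.

```r
# Hypothetical data: mostly zeros, a few positive values, two missing
x <- c(rep(0, 12), 0.01, 0.02, 0.03, 0.05, 0.09, 0.16, 0.24, 0.50, NA, NA)

# Decile boundaries computed on the non-missing values
q <- quantile(x, probs = seq(0, 1, 0.1), na.rm = TRUE)

# cut() requires unique breaks, so tied boundaries are collapsed
breaks <- unique(q)

# Integer bin codes; include.lowest = TRUE keeps the minimum in the first bin
bins <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

print(table(bins, useNA = "ifany"))
```

The result has fewer than ten bins, which is unavoidable when more than half of the values are tied. Missing values stay NA rather than being recoded to 0, since recoding changes the distribution being binned.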
[R] Odp: error: no such index at level 2
Hi

r-help-boun...@r-project.org wrote on 07.07.2009 19:06:17:

> Hi,
> I am confused about how to select elements from a list. I'm trying to select all rows of a table 'crossRsorted' such that the mean of a related vector is 0. The related vector is accessible as a list element l[[i]] where i is the row index. I thought this would work:
>
>     crossRsorted[mean(q[[ crossRsorted[,1] ]], na.rm = TRUE) 0, ]
>     Error in q[[crossRsorted[, 1]]] : no such index at level 2

Strange, I got a completely different error. Could it be that only ***you*** have crossRsorted?

    crossRsorted[mean(q[[ crossRsorted[,1] ]], na.rm = TRUE) 0, ]
    Error: object 'crossRsorted' not found

What is crossRsorted? Data frame? List? What is q? List? You need to provide at least the output from str(q) and str(crossRsorted) to get some reasonable answers, and far better to provide artificial data to demonstrate the problem.

With 2 data frames

    df1[rowMeans(df2)0,]

selects rows of df1 which correspond to rows with row mean df20.

With a data frame and a list

    df1[sapply(list1,mean)0,]

selects rows of df1 which correspond to list elements with mean 0.

But without knowing the structure of your data? Nobody knows.

Regards
Petr

> How can I express: select only those rows 'r_i' from crossRsorted where mean(q[[r_i[1]]]) 0?
>
> Thanks,
> - Godmar
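The sapply() suggestion can be made concrete with artificial data. In this sketch the comparison operator has been lost from the archived messages, so `> 0` is an assumption, as are the toy contents of `crossRsorted` and `q` and the column name `id`.

```r
# Artificial data: the first column of the table indexes into the list q
crossRsorted <- data.frame(id = 1:4, value = c(10, 20, 30, 40))
q <- list(c(1, 2, NA), c(-3, -1), c(0.5, NA, 1.5), c(-2, 1))

# q[[c(1, 2, 3, 4)]] would index recursively (element 1, then element 2
# inside it, ...), which is where "no such index at level 2" comes from.
# Single brackets select several list elements; sapply() then takes the
# mean of each one.
elem_means <- sapply(q[crossRsorted$id], mean, na.rm = TRUE)

# Logical row selection (assumed condition: mean greater than 0)
keep <- elem_means > 0
selected <- crossRsorted[keep, ]
print(selected)
```

The key point is that mean() must be applied once per list element, producing one logical value per row, instead of once to a single recursive extraction.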
Re: [R] bigglm() results different from glm()+Another question
Hi Greg,

Many thanks for your precious time. Here is workable code:

    set.seed(1)
    xx = data.frame(x1=runif(10000,0,10), x2=runif(10000,0,10), x3=runif(10000,0,10))
    xx$y = 3 + xx$x1 + 2*xx$x2 + 3*xx$x3 + rnorm(10000)
    chunksize = 500
    fit = biglm(y~x1+x2+x3, data=xx[1:chunksize,])
    for(i in seq(chunksize,10000,chunksize)) fit=update(fit, moredata=xx[(i+1):(i+chunksize),])
    AIC(fit)
    [1] 28956.91

And the AIC for other chunksizes:

    chunksize   AIC
    500         28956.91
    1000        27956.91
    2000        25956.91
    2500        24956.91
    5000        19956.91
    10000       19956.91

Also, I noted that the estimated coefficients do not depend on chunksize, and that AIC is exactly a linear function of chunksize. So I guess there is some problem with the calculation of AIC, maybe in some degrees of freedom or in adding some constant somewhere.

My comments are below.

Regards
Utkarsh

Greg Snow wrote:
> How many rows does xx have? Let's look at your example for chunksize 10000: you initially fit the first 10000 observations, then the seq results in just the value 10000, which means that you do the update based on values 10001 through 20000. If xx only has 10000 rows, then this should give at least one error. If xx has 20000 or more rows, then only chunksize 10000 will ever see the 20000th value; the other chunksizes will use less of the data.

I understand your point and apologize that you had to spend time going into the logic inside the for loop. I definitely thought of that, but my actual problem was the variation in AICs (which I was sure about), so to set this loop problem aside (temporarily), I deliberately chose the chunksizes such that the number of rows is a multiple of chunksize. I knew there is still one extra iteration happening, and I checked that it was not causing any problem: the moredata in the last iteration will be all NA's, and update does nothing in such a case. For example, let's say chunksize=5000; even though xx has only 10000 rows, fit2 and fit3 below are exactly the same.

    fit1 = biglm(y~x1+x2+x3, data=xx[1:5000,])
    fit2 = update(fit1, moredata=xx[5001:10000,])
    fit3 = update(fit2, moredata=xx[10001:15000,])
    AIC(fit1); AIC(fit2); AIC(fit3)
    [1] 5018.282
    [1] 19956.91
    [1] 19956.91

(The AIC matches the table above, and there are no warnings at all.)

I checked all these things before sending my first mail and dropped the idea of refining the for loop, as this saves me a few lines of code and the loop looks good and is easy to understand. Moreover, it neither takes any extra run time nor produces any warnings or errors.

> Also looking at the help for update.biglm, the 2nd argument is moredata, not data, so if the code below is the code that you actually ran, then the new data chunks are going into the ... argument (and being ignored, as that is there for future expansion and does nothing yet) and the moredata argument is left empty, which should also be giving an error. For the code below, the model is only being fit to the initial chunk and never updated, so with different chunk sizes there are different amounts of data per model. You can check this by doing summary(fit) and looking at the sample size in the 2nd line.

My fault in writing the mail. In the actual code, I gave update(fit, xx[(i+1):(i+chunksize),]), i.e., I just passed the new chunk as the 2nd argument without mentioning the argument name, which is correct, but while writing the mail I added the argument name as data without checking what it is.

> It is easier for us to help you if you provide code that can be run by copying and pasting (we don't have xx, so we can't just run the code below; you could include a line to randomly generate an xx, or a link to where a copy of xx can be downloaded from). It also helps if you mention any errors or warnings that you receive in the process of running your code.
>
> Hope this helps,
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111

From: utkarshsinghal [mailto:utkarsh.sing...@global-analytics.com]
Sent: Tuesday, July 07, 2009 12:10 AM
To: Greg Snow
Cc: Thomas Lumley; r help
Subject: Re: [R] bigglm() results different from glm()+Another question

Trust me, it is the same total data I am using; even the chunksizes are all equal. I also crosschecked by manually creating the chunks and updating as in the example given on the biglm help page.

    ?biglm

Regards
Utkarsh

Greg Snow wrote:
> Are you sure that you are fitting all the models on the same total data? A first glance looks like you may be including more data in some of the chunk sizes, or be producing an error that update does not know how to deal with.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111

From: utkarshsinghal
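The loop-bounds issue discussed in this thread can be avoided by computing the chunk boundaries up front, so that no update ever indexes past the last row. The sketch below shows one way to do that; `chunk_bounds()` is a hypothetical helper written for this example, and the biglm part is guarded because the package may not be installed. It does not address the AIC discrepancy itself.

```r
# Hypothetical helper: start and end row of each chunk, with the last chunk
# truncated at n so that no index runs past the end of the data
chunk_bounds <- function(n, chunksize) {
  starts <- seq(1, n, by = chunksize)
  data.frame(start = starts, end = pmin(starts + chunksize - 1, n))
}

set.seed(1)
n <- 10000
xx <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 10), x3 = runif(n, 0, 10))
xx$y <- 3 + xx$x1 + 2 * xx$x2 + 3 * xx$x3 + rnorm(n)

bounds <- chunk_bounds(n, chunksize = 2500)

if (requireNamespace("biglm", quietly = TRUE)) {
  # Fit on the first chunk, then update with each remaining chunk exactly once
  fit <- biglm::biglm(y ~ x1 + x2 + x3,
                      data = xx[bounds$start[1]:bounds$end[1], ])
  for (k in seq_len(nrow(bounds))[-1]) {
    fit <- update(fit, moredata = xx[bounds$start[k]:bounds$end[k], ])
  }
  print(coef(fit))
}
```

With the boundaries made explicit, every chunk size processes exactly rows 1 to n, which removes one possible source of differences when comparing fits across chunk sizes.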
[R] Plotting the PDF and the Cumulative Probability
Hello,

I have to fit my data with a beta-prime distribution. I have found the parameters; now I need to plot the cumulative probability and the probability density of my fitted data. With gamma, for example, it is easy:

PDF:

    plot(x, dgamma(x, shape, rate))

Cumulative probability:

    plot(x, pgamma(x, shape, rate))

How can I do this with beta-prime? Can I use the pbeta and dbeta functions defined for the beta distribution even though my distribution is a beta-prime?

Thanks a lot!
Ale
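One way to answer this uses the standard relation between the two distributions: if X follows Beta(a, b), then Y = X / (1 - X) follows a beta-prime distribution with the same shape parameters. Substituting x = y / (1 + y) gives the CDF directly from pbeta, and the density from dbeta times the Jacobian 1 / (1 + y)^2. The function names `dbetaprime` and `pbetaprime` below are made up for this sketch (base R has no beta-prime functions), the shape values are placeholders, and the code assumes non-negative arguments.

```r
# Density of the beta-prime distribution for y >= 0:
# f(y) = dbeta(y / (1 + y), a, b) / (1 + y)^2
dbetaprime <- function(y, shape1, shape2) {
  dbeta(y / (1 + y), shape1, shape2) / (1 + y)^2
}

# CDF of the beta-prime distribution for q >= 0:
# F(q) = pbeta(q / (1 + q), a, b)
pbetaprime <- function(q, shape1, shape2) {
  pbeta(q / (1 + q), shape1, shape2)
}

# Placeholder parameters; replace with the fitted values
shape1 <- 2
shape2 <- 3
y <- seq(0.01, 5, length.out = 200)

plot(y, dbetaprime(y, shape1, shape2), type = "l",
     xlab = "y", ylab = "density", main = "Beta-prime PDF")
plot(y, pbetaprime(y, shape1, shape2), type = "l",
     xlab = "y", ylab = "cumulative probability", main = "Beta-prime CDF")
```

So pbeta and dbeta can be reused, but only after transforming the argument; calling them on the untransformed data would describe a different distribution.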
Re: [R] Dump plots to powerpoint?
Another suggestion: Create your presentation with an OpenDocument Presentation format compatible application (e.g. OpenOffice Impress), create the plots with Sweave chunks, and process the file with odfWeave() (package odfWeave). If necessary, you can export to other formats such as PowerPoint.

Best wishes
Thomas Zumbrunn
Re: [R] How to separate the string?
Hi everyone,

Thanks a lot. It works with the help of you all.

Regards,
Hema

On Tue, Jul 7, 2009 at 5:00 PM, Petr PIKAL <petr.pi...@precheza.cz> wrote:
> Hi
>
> If you have a data frame like this
>
>     test = data.frame(x = c("abcd", "abc", "abcde"))
>
> then
>
>     strsplit(as.matrix(test), "")
>
> makes a list with split character vectors. If you want them in a data frame you would need to combine vectors of unequal length. However, I would try reading your text file with read.fwf(file, 1)
>
> Regards
> Petr
>
> Hemavathi Ramulu <hema.ram...@gmail.com> wrote on 07.07.2009 10:36:40:
>> Hi Petr,
>> The data is in a text file, not in csv format. By "separate" I mean splitting the string into its individual letters, with each letter in a different column.
>> Thanks a lot.
>> Regards, Hema.
>>
>> On Tue, Jul 7, 2009 at 4:12 PM, Petr PIKAL <petr.pi...@precheza.cz> wrote:
>>> Hi
>>>
>>> r-help-boun...@r-project.org wrote on 07.07.2009 09:54:30:
>>>> Hi everyone,
>>>> I want to separate the string (column1), for example
>>>
>>> Well, how did you get the data into R? Are they in separate columns of a data.frame? What do you mean by separate?
>>>
>>>>     column1  column2  column3  column4  column5  column6
>>>>     bear     b        e        a        r
>>>>     cat      c        a        t
>>>>     tiger    t        i        g        e        r
>>>>
>>>> I know how to do this in Excel using the MID function.
>>>
>>> As Microsoft is more user friendly and uses translated functions in language-specific versions of Excel, I do not have the function MID. I suspect it takes values from the middle of a string set by some identifiers. If that is the case, see ?substr. However, I would start with ?read.table and the related read.* functions to get the data into R in an appropriate shape.
>>>
>>> Regards
>>> Petr
>>>
>>>> Now I want to solve it using R. The list of strings is in a text file. I looked up the help but did not find it. Can someone help me here? Thank you very much.
>>>> Regards, Hema

--
Hemavathi Ramulu
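Combining the strsplit() suggestion with padding gives the one-letter-per-column layout asked for in this thread. The following is a sketch in base R; the names `words`, `chars`, `padded`, and `result` are invented for the example, and the strings are typed in directly where the real data would come from the text file (for instance via readLines()).

```r
# Example strings; in practice these would be read from the text file
words <- c("bear", "cat", "tiger")

# Split each string into single characters (one character vector per word)
chars <- strsplit(words, "")

# Pad every vector with NA up to the length of the longest word,
# then bind the padded vectors into a matrix with one row per word
width <- max(lengths(chars))
padded <- t(sapply(chars, function(ch) {
  c(ch, rep(NA_character_, width - length(ch)))
}))

# Put the original string next to its letters
result <- data.frame(column1 = words, padded, stringsAsFactors = FALSE)
print(result)
```

Padding with NA is what makes vectors of unequal length fit into a rectangular data frame.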
[R] Fitting a trend-line
Hi all, I am new to R. How does one go about fitting a trend-line to a scatter plot? Any help is appreciated. Thanks and regards, Anupam
[R] Odp: Fitting a trend-line
Hi

see ?lm and ?abline

Regards
Petr

r-help-boun...@r-project.org wrote on 08.07.2009 11:31:19:
> Hi all, I am new to R. How does one go about fitting a trend-line to a scatter plot? Any help is appreciated.
> Thanks and regards, Anupam
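The two help pages mentioned fit together as follows: lm() estimates a straight-line fit, and abline() draws it on an existing scatter plot. This is a minimal sketch with simulated data; the variable names and the simulated relationship are made up for illustration.

```r
# Simulated data: a linear relationship plus noise
set.seed(42)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)

# Least-squares fit of y on x
fit <- lm(y ~ x)

# Scatter plot with the fitted line drawn on top
plot(x, y, main = "Scatter plot with trend line")
abline(fit, col = "red")

# Estimated intercept and slope
print(coef(fit))
```

abline() accepts the fitted model object directly and reads the intercept and slope from it.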
Re: [R] Fitting a trend-line
Why don't you do a linear regression?

Date: Wed, 8 Jul 2009 15:01:19 +0530
From: anupam.cont...@gmail.com
To: r-help@r-project.org
Subject: [R] Fitting a trend-line

> Hi all, I am new to R. How does one go about fitting a trend-line to a scatter plot? Any help is appreciated.
> Thanks and regards, Anupam
[R] system() how to make a program run a specific file
I'd like to know how to call a program to run or open a specific file. Something like this:

    system('C:\\Program Files (x86)\\IrfanView\\i_view32.exe','-A:\\ teste.jpg')

is not working. Any help will be appreciated.

Paulo E. Cardoso
Re: [R] Dump plots to powerpoint?
Check out the R2PPT package on CRAN.

On Tue, Jul 7, 2009 at 4:38 PM, Thomas <aikto...@yahoo.com> wrote:
> Hi, Is it possible to dump a series of plots directly into a powerpoint presentation (as is possible in Splus)?
> Thank you, Thomas
[R] RODBC and sqlSave issue
Hello,

I am contacting you after having unsuccessfully asked my question on the R mailing list.

I use the package RODBC to connect to an MS-SQL server. I am able to query the database. I am now looking at using sqlSave to write some data into the database. Unfortunately, I have run into issues with the format of the data that arrives in the database.

I have three columns. The first one should be in the MS-SQL format datetime, the second one in the MS-SQL format varchar(50), and the third one in the MS-SQL format numeric(20,8).

I use the following command line:

    sqlSave(channel, DF, tablename="essai_global", rownames=FALSE, oldstyle=FALSE)

The data is indeed sent to the database, but the types are wrong (varchar(255) for all three columns).

I then tried to use the varTypes argument, but I do not manage to use it. If I use the following command lines:

    varTypes=c("datetime","varchar(50)","numeric(20,8)")
    sqlSave(channel, DF, tablename="essai_global", rownames=FALSE, oldstyle=FALSE)

I get the following in return:

    Warning message:
    In sqlSave(channel, DF, tablename = "essai_global", rownames = FALSE, :
      argument 'varTypes' has no names and will be ignored

and the types are still wrong. How can I use varTypes? I have read the documentation, but I did not manage to find out.

Thank you very much
Wapita
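The warning text itself points at the likely remedy: varTypes has to be a named character vector, with names matching the columns of the data frame, and it has to be passed to sqlSave() as the varTypes argument. The sketch below shows that shape. The column names (`timestamp`, `label`, `value`) and the sample row are hypothetical, since the post does not give the real ones, and the actual sqlSave() call is guarded so that the snippet runs without a database connection.

```r
# Hypothetical data frame with the three columns described in the post
DF <- data.frame(
  timestamp = as.POSIXct("2009-07-08 10:00:00", tz = "GMT"),
  label     = "example",
  value     = 1.5,
  stringsAsFactors = FALSE
)

# Named vector: names are DF column names, values are the DBMS types
var_types <- c(timestamp = "datetime",
               label     = "varchar(50)",
               value     = "numeric(20,8)")

# Sanity check before talking to the database
stopifnot(all(names(var_types) %in% names(DF)))

# Only attempted when RODBC is installed and an open connection named
# 'channel' exists in the session (both are assumptions of this sketch)
if (requireNamespace("RODBC", quietly = TRUE) && exists("channel")) {
  RODBC::sqlSave(channel, DF, tablename = "essai_global",
                 rownames = FALSE, varTypes = var_types)
}
```

Note that varTypes only applies when sqlSave() creates the table; if a table created by an earlier attempt still exists, it may need to be dropped first.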
Re: [R] Dump plots to powerpoint?
Hi,

On Windows, you can use a COM client (with packages like rcom or RDCOMClient) to control PowerPoint from R and insert the generated image using PowerPoint's object model. You can either use the clipboard or an intermediate image file saved to disk. It is not hard to do, but this seems to be already implemented in the package R2PPT, recently released to CRAN, so have a look at it:

http://stat.ethz.ch/CRAN/web/packages/R2PPT/index.html

About the image format: using Windows metafiles allows you to double-click the image in PowerPoint, ungroup it, and then edit each of its components (text, lines, etc.)

Regards,
Enrique

Date: Tue, 7 Jul 2009 13:38:48 -0700 (PDT)
From: Thomas <aikto...@yahoo.com>
Subject: [R] Dump plots to powerpoint?
To: r-help@r-project.org

> Hi, Is it possible to dump a series of plots directly into a powerpoint presentation (as is possible in Splus)?
> Thank you, Thomas
Re: [R] system() how to make a program run a specific file
After all it's very easy:

    system(paste('C:\\Program Files (x86)\\IrfanView\\i_view32.exe','A:\\test.jpg'))

Paulo E. Cardoso

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Paulo E. Cardoso
Sent: Wednesday, 8 July 2009 10:59
To: r-help@r-project.org
Cc: r-h...@stat.math.ethz.ch
Subject: [R] system() how to make a program run a specific file

> I'd like to know how to call a program to run or open a specific file. Something like this:
>
>     system('C:\\Program Files (x86)\\IrfanView\\i_view32.exe','-A:\\ teste.jpg')
>
> is not working. Any help will be appreciated.
>
> Paulo E. Cardoso
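The key point of the solution is that system() takes the whole command line as a single string, so the program and the file have to be pasted together. A slightly more defensive variant, shown here only as a sketch, quotes each path with shQuote() so that spaces in directory names cannot split the command. The paths are the ones from the post; the call itself is guarded so that the snippet runs harmlessly on machines without IrfanView.

```r
# Program and file from the post (backslashes doubled inside R strings)
exe <- "C:\\Program Files (x86)\\IrfanView\\i_view32.exe"
img <- "A:\\test.jpg"

# One command string, with each path wrapped in double quotes
cmd <- paste(shQuote(exe, type = "cmd"), shQuote(img, type = "cmd"))
print(cmd)

# Only launch when running on Windows and the program actually exists
if (.Platform$OS.type == "windows" && file.exists(exe)) {
  system(cmd, wait = FALSE)
}
```

The first attempt in the thread failed because the file name was passed as system()'s second argument, which is not treated as part of the command line.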
Re: [R] Fitting a trend-line
anupam sinha wrote:
> Hi all, I am new to R. How does one go about fitting a trend-line to a scatter plot? Any help is appreciated.

Hi Anupam,
Have a look at the help page for the abline function in the graphics package.

Jim
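
As a minimal sketch of that suggestion: fit a straight line with lm() and hand the fit to abline(). The data below are made up for illustration and lie exactly on y = 2 + 3x; the null PDF device is only there so the example also runs non-interactively.

```r
# Made-up data lying exactly on the line y = 2 + 3x.
x <- 1:10
y <- 2 + 3 * x

fit <- lm(y ~ x)          # least-squares straight line

pdf(NULL)                 # null device; drop this line in an interactive session
plot(x, y)                # scatter plot
abline(fit, col = "red")  # add the fitted trend line
invisible(dev.off())

coef(fit)                 # intercept and slope of the trend line
```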
[R] stats::decompose - Problem finding seasonal component without trend
Hi R-help,

I'd like to extract the seasonal component of a short time series, and was hoping to use stats::decompose. I don't want to decompose the 'trend' component, so I thought I should call decompose(x, filter=0). I think I've either misunderstood the filter argument or come upon a bug/feature in decompose.

# EXAMPLE
x <- ts(c(2:12, rep(1,12), 1:12), start=c(2009,2), frequency=12); x   # Starts in Feb
#      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 2009       2   3   4   5   6   7   8   9  10  11  12
# 2010   1   1   1   1   1   1   1   1   1   1   1   1
# 2011   1   2   3   4   5   6   7   8   9  10  11  12
decompose(x)            # ok, got some answer for seasonal component, but I don't want to split the residual into trend and random.
decompose(x, filter=0)  # this seems broken, ignoring some of the data in seasonal calculation, and losing some points in the random component
# END EXAMPLE

I've debug-stepped through decompose and, as far as I can understand the manipulation, it appears to ignore the first and last period, so only the middle 12 points (all 1 in my example) are used in the calculation of the seasonal averages. Unrelated, but it also seems to duplicate one value during the calculation, and to throw a warning due to a seemingly unnecessary 'end' argument to window.

I can probably get away with using some function like sweep or scale instead, but please let me know if I'm just misusing decompose. If it's a bug, I hope the above helps.

Regards,
Mike

P.S. I see this comment in the R 2.8.0 release notes:

  o HoltWinters() and decompose() use a (statistically) more efficient computation for seasonal fits (they used to waste one period).

I'm on R 2.8.0:

platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          8.0
year           2008
month          10
day            20
svn rev        46754
language       R
version.string R version 2.8.0 (2008-10-20)
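
One way to get seasonal figures without any trend step is to average each calendar month directly. This is a minimal sketch of that workaround, not what decompose() does internally; the names month_means and seasonal are made up for illustration.

```r
x <- ts(c(2:12, rep(1, 12), 1:12), start = c(2009, 2), frequency = 12)

# Average every calendar month over all the years in which it was observed.
# cycle(x) gives the position in the cycle (1 = Jan, ..., 12 = Dec).
month_means <- tapply(as.numeric(x), as.integer(cycle(x)), mean)

# Centre the monthly averages so that the seasonal figures sum to zero.
seasonal <- month_means - mean(month_means)
round(seasonal, 3)
```

Unlike the filter-based route, this uses all 35 observations, including the incomplete first year.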
[R] Import xlsx file in Ubuntu 9.04
Hi list,

For the last two weeks I have been looking for a way to import xlsx files directly into R on Linux (Ubuntu 9.04). I have already read the R Data Import/Export guide, and I know how to use gdata to import xls files and read.table to import .csv files. My problem is that all the data I receive is in xlsx format, so I have to convert every file to xls first. When I was using Windows Vista, RODBC did the trick with the odbcConnectExcel2007 function (which I know is not available in the Linux RODBC package, probably due to driver issues). Is there a way to import these xlsx files directly into R without any prior conversion (to .csv or .xls)?

Thank you for your attention. Someone has probably asked this already; I even remember seeing it somewhere, but without a definitive answer.

Rodrigo.
Re: [R] R 2.9.0 plot still forcing current time zone
Try this: set the time zone to what you want before plotting:

tzsave <- Sys.getenv("TZ")   # save current
Sys.setenv(TZ = "GMT")       # set to whatever
plot(x, rep(1, 11))          # plot
Sys.setenv(TZ = tzsave)      # restore
plot(x, rep(1, 11))          # plot in original time zone

On Wed, Jul 8, 2009 at 2:21 AM, Britton Stephens steph...@ucar.edu wrote:
> the help page for plot.POSIXct says
>
> "As from R 2.9.0 the date-times for a 'POSIXct' input are interpreted in the timezone given by the 'tzone' attribute if there is one, otherwise the current timezone. (Earlier versions always used the current timezone.)"
>
> however I am using 2.9.0 on linux and the following still happily produces an x-axis in local (MDT) time
>
> x=strptime(paste('09-01-01 00:00:00',sep=''),format='%y-%m-%d %H:%M:%S',tz="GMT")+60*60*24*(seq(0.5,1.5,.1))
> x
>  [1] "2009-01-01 12:00:00 GMT" "2009-01-01 14:24:00 GMT"
>  [3] "2009-01-01 16:48:00 GMT" "2009-01-01 19:12:00 GMT"
>  [5] "2009-01-01 21:36:00 GMT" "2009-01-02 00:00:00 GMT"
>  [7] "2009-01-02 02:24:00 GMT" "2009-01-02 04:48:00 GMT"
>  [9] "2009-01-02 07:12:00 GMT" "2009-01-02 09:36:00 GMT"
> [11] "2009-01-02 12:00:00 GMT"
> attributes(x)
> $class
> [1] "POSIXt"  "POSIXct"
> $tzone
> [1] "GMT"
> plot(x,rep(1,11))
>
> Is this a bug, or am I missing something?
>
> Thanks a lot!
> Britt
>
> --
> Britton B. Stephens
> National Center for Atmospheric Research
> P.O. Box 3000, 1850 Table Mesa Drive
> Boulder, CO 80307-3000
> Phone: (303) 497-1018  Fax: (303) 497-1092

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
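
The save/set/restore pattern above can be wrapped in a small helper so the original setting is restored even if the plotting code fails. This is only a sketch: with_tz is a hypothetical name, not a base R function, and it also covers the case where TZ was not set at all.

```r
# Hypothetical helper: evaluate `expr` with the TZ environment variable
# temporarily set to `tz`, then put things back the way they were.
with_tz <- function(tz, expr) {
  old <- Sys.getenv("TZ", unset = NA)     # NA means TZ was not set at all
  Sys.setenv(TZ = tz)
  on.exit(if (is.na(old)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old))
  expr                                    # lazily evaluated here, under the new TZ
}

before <- Sys.getenv("TZ", unset = NA)
inside <- with_tz("GMT", Sys.getenv("TZ"))
after  <- Sys.getenv("TZ", unset = NA)

print(inside)
```

Usage would then look like with_tz("GMT", plot(x, rep(1, 11))), with x as in the message above.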
[R] linear regression and testing the slope
Dear All,

First of all I would like to say that I do not have much knowledge about this subject, so most of you will probably find this really easy. I am doing a linear regression and I want to test if the slope of the curve is 0. R gives the summary statistics:

Call:
lm(formula = x ~ s)

Residuals:
      Min        1Q    Median        3Q       Max
-0.025096 -0.020316 -0.001203  0.011658  0.044970

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.005567   0.016950   0.328    0.750
s           -0.001599   0.002499  -0.640    0.538

Residual standard error: 0.02621 on 9 degrees of freedom
Multiple R-squared: 0.04352, Adjusted R-squared: -0.06276
F-statistic: 0.4095 on 1 and 9 DF, p-value: 0.5382

What is this t-value for? The explanation in the help file was unfortunately not clear to me. How can I test my hypothesis that the slope is 0?

Thank you in advance,
regards,
Evrim
[R] transform multi skew-t to uniform distribution
> Hi R-users,
> I have data from a multivariate skew-t distribution and would like to transform each of the data to uniform data. I tried using 'pmst' but only got one output:
>
> rr1 <- as.vector(r1); rr1
>  [1]  0.7207582  5.2250906  1.7422237  0.5677233  0.7473555 -0.6020626 -2.1947872 -1.1128313 -0.6587316 -1.1409261
>
> pmst(rr1, xi=rep(0,10), Omega=diag(10), alpha=rep(1,10), df=5)
> [1] 3.676525e-09

You are computing a 10-dimensional distribution function at a 10-dimensional point, so you get a single number out; this is as expected. I presume that you actually want to compute a 1-dimensional distribution function at 10 different points, which is achieved by

pst(rr1, dp=c(0,1,1,5))
[1] 0.564580 0.996707 0.867177 0.497123 0.575915 0.085922 0.004127 0.030807
[9] 0.076839 0.029117

Best regards,
Adelchi Azzalini

--
Adelchi Azzalini azzal...@stat.unipd.it
Dipart.Scienze Statistiche, Università di Padova, Italia
tel. +39 049 8274147, http://azzalini.stat.unipd.it/
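
The idea of applying a univariate distribution function point by point can be illustrated without the sn package. The sketch below swaps in base R's symmetric Student t CDF, pt(), as a stand-in; it is not the skew-t of the message above, so the numbers differ from the pst() output, but the shape of the result (one probability per input value) is the same.

```r
rr1 <- c(0.7207582, 5.2250906, 1.7422237, 0.5677233, 0.7473555,
         -0.6020626, -2.1947872, -1.1128313, -0.6587316, -1.1409261)

# Stand-in CDF: symmetric Student t with 5 degrees of freedom
# (NOT the skew-t distribution from the sn package).
u <- pt(rr1, df = 5)

length(u)   # one probability per input value
range(u)    # every value lies strictly between 0 and 1
```

If the data really followed the distribution whose CDF is applied, the transformed values would be uniform on (0, 1).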
Re: [R] ReShape to create Time from Observations?
On Tue, Jul 7, 2009 at 4:22 PM, jim holtman jholt...@gmail.com wrote:
> Does something like this work for you; it uses the reshape package:
>
> X <- data.frame(A=1:10, B=0, C=1, Ob1=1:10, Ob2=2:11, Ob3=3:12,
>                 Ob4=4:13, Ob5=3:12, Ob6=2:11)
> Y <- data.frame(A=1:20, B=0, C=1, D=5, Ob1=1:10, Ob2=2:11, Ob3=3:12,
>                 Ob4=4:13, Ob5=3:12, Ob6=2:11, Ob7=5:9)
> Z <- data.frame(A=1:30, B=0, C=1, D=6, E=1:2, Ob1=1:10, Ob2=2:11,
>                 Ob3=3:12, Ob4=4:13, Ob5=3:12, Ob6=2:11, Ob7=1:10, Ob8=3:12)
> f.melt <- function(df)
> {
>     # get the starting column number of Ob1, then extend for rest of columns
>     require(reshape)
>     melt(df, measure=seq(match("Ob1", names(df)), ncol(df)))
> }
> x.m <- f.melt(X)
> y.m <- f.melt(Y)
> z.m <- f.melt(Z)
> # sample data
> head(x.m, 20)
>     A B C variable value
> 1   1 0 1      Ob1     1
> 2   2 0 1      Ob1     2
> 3   3 0 1      Ob1     3
> 4   4 0 1      Ob1     4
> 5   5 0 1      Ob1     5
> 6   6 0 1      Ob1     6
> 7   7 0 1      Ob1     7
> 8   8 0 1      Ob1     8
> 9   9 0 1      Ob1     9
> 10 10 0 1      Ob1    10
> 11  1 0 1      Ob2     2
> 12  2 0 1      Ob2     3
> 13  3 0 1      Ob2     4
> 14  4 0 1      Ob2     5
> 15  5 0 1      Ob2     6
> 16  6 0 1      Ob2     7
> 17  7 0 1      Ob2     8
> 18  8 0 1      Ob2     9
> 19  9 0 1      Ob2    10
> 20 10 0 1      Ob2    11

SNIP

Jim,

It wasn't exactly what I was looking for, but I think the ideas, plus a bit of off-list help from another member, helped me get much closer. The idea of using match is very helpful in my case because I can leverage the fact that in my data files everything to the right of the first observation is also an observation, so it is easy to create a list running to the end of the row.

Try the following:

X <- data.frame(A=1:10, B=0, C=1, Ob1=1:10, Ob2=2:11, Ob3=3:12, Ob4=4:13, Ob5=3:12, Ob6=2:11)
BrkPnt <- match("Ob1", names(X))
Ob_Group <- list(names(X)[BrkPnt:ncol(X)])

# Give to reshape to turn ObX into time
answerX1 <- reshape(X, varying=Ob_Group, direction='long')

and at this point I can subset based on id or some other variable:

subset(answerX1, A==1)
    A B C time Ob1 id
1.1 1 0 1    1   1  1
1.2 1 0 1    2   2  1
1.3 1 0 1    3   3  1
1.4 1 0 1    4   4  1
1.5 1 0 1    5   3  1
1.6 1 0 1    6   2  1

I *think* this is data that I can send to matplot/qplot and get the charts that I'm interested in.
I'll work on that today to verify, but it looks about right to me using this simple case:

PlotData <- subset(answerX1, A==1)
matplot(PlotData$time, PlotData$Ob1)

I really like the match idea. The first observation should generally be in about the first 20 columns of my files, which can potentially be thousands of columns wide. There's no reason in my case to match every other column to the right, as I already know they will match. I can get a list of all the observations with BrkPnt:ncol(X), or all the independent variables using 1:(BrkPnt-1). I could also, if I chose, extract a specific group of observations by matching Ob20 and Ob40, for instance to find observations taken in a certain time period every day. Nice!

I'll put it back in a function as you did for use in my actual code.

Cheers,
Mark

> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> What is the problem that you are trying to solve?
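
The match-then-reshape idea discussed above can be sketched with base R only. The names first_ob, ob_cols, id_cols and long are made up for illustration, and the column called "value" is a choice made here rather than something from the thread. Note that 1:BrkPnt-1 without parentheses would evaluate as (1:BrkPnt)-1 and start at 0, which is why seq_len() is used for the id columns.

```r
X <- data.frame(A = 1:10, B = 0, C = 1,
                Ob1 = 1:10, Ob2 = 2:11, Ob3 = 3:12,
                Ob4 = 4:13, Ob5 = 3:12, Ob6 = 2:11)

first_ob <- match("Ob1", names(X))            # position of the first observation column
ob_cols  <- names(X)[first_ob:ncol(X)]        # everything to the right is an observation
id_cols  <- names(X)[seq_len(first_ob - 1)]   # the independent variables

# Stack the observation columns into one "value" column indexed by "time".
long <- reshape(X, varying = list(ob_cols), v.names = "value",
                timevar = "time", idvar = "id", direction = "long")

subset(long, A == 1)[, c("A", "time", "value")]
```

The long data frame has one row per (row of X, observation column) pair, which is the layout matplot and similar plotting functions can work from after subsetting.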
[R] RDCOMClient: how to close Excel process?
Hi,

I’m using the R package RDCOMClient (http://www.omegahat.org/RDCOMClient/) to retrieve data from an MS Excel workbook. I’m using the code below to count the number of sheets in the workbook and then read the data from each sheet into a list.

# R code ###
library(gdata)
library(RDCOMClient)
xl <- COMCreate("Excel.Application")
sh <- xl$Workbooks()$Open(normalizePath("sample_file.xls"))$Sheets()$Count()
DF.list <- list()
for (i in 1:sh) {
  DF.list[[i]] <- read.xls("sample_file.xls", sheet=i, stringsAsFactors = FALSE)
}
##

COMCreate opens an Excel process, which can be seen in Windows Task Manager. When I try to open sample_file.xls in Excel, it just flashes on the screen and shuts down. When I kill (via Task Manager) the Excel process COMCreate started, sample_file.xls opens normally. The question is, how can I close the Excel process COMCreate started? xl$Close() doesn’t seem to work. The same problem has been presented in this post to R-help: http://tolstoy.newcastle.edu.au/R/help/06/04/25990.html

-L
[R] R regular expression to extract words with the query string.
Hi,

Is there a way in R to get the string which matches an expression, where the expression is a substring of the parent string? Let's say I have

i <- "transcript:ENST112334 pid:ENSP12345"

What I need is the string "pid:ENSP12345" from i, using the query "ENSP".

Appreciate your comments.

Praveen Surendran
School of Medicine and Medical Sciences
University College Dublin
Belfield, Dublin 4
Ireland.
[R] system() how to make a program run a specific file - RUN and Output directory issues
I have a particular case where the program I'm calling needs additional instructions: clicking a RUN button and setting an output directory. Could these options be controlled with the system() function?

Paulo E. Cardoso

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Paulo E. Cardoso
Sent: Wednesday, 8 July 2009 12:08
To: r-help@r-project.org
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] system() how to make a program run a specific file

> After all it's very easy:
>
> system(paste('C:\\Program Files (x86)\\IrfanView\\i_view32.exe','A:\\test.jpg'))
>
> Paulo E. Cardoso
>
> [...]
Re: [R] RDCOMClient: how to close Excel process?
Try this:

xl$Quit()

On Wed, Jul 8, 2009 at 10:06 AM, Lauri Nikkinen lauri.nikki...@iki.fi wrote:
> [...]
> The question is, how can I close the Excel process COMCreate started. xl$Close() doesn't seem to work.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] linear regression and testing the slope
On 08-Jul-09 12:29:40, evrim akar wrote:
> [...] I am doing a linear regression and I want to test if the slope of the curve is 0. [...]
> what is this t-value for? The explanation in the help file was unfortunately not clear to me. How can I test my hypotheses that if the slope is 0?

The quantity 't' is the estimated value (-0.001599 for the slope 's') divided by its estimated standard error (0.002499). Taking the values as reported by the summary:

t = -0.001599/0.002499 = -0.639856

which R has reported (to 3 significant figures) as -0.640.

The Pr(>|t|) is the probability, assuming the null hypothesis that the slope (coefficient of 's') is zero, that data arising at random would give a t-value which, in absolute value, exceeds the absolute value |t| = |-0.639856| = 0.639856 which you got from your data.

The relevance of this for testing the hypothesis that the slope is 0 is that, if the slope really is 0, then large values (either way) of the coefficient of 's' (reported by R as "Estimate") are unlikely. So if you got a value of Pr(>|t|) which was small (conventionally less than 0.05, or 0.01, etc.) then you would have an estimate so large that getting a value at least as large as this, if the hypothesis were true, would be unlikely. Therefore it would be more plausible that the null hypothesis was false.
In your case, the P-value Pr(>|t|) = 0.538, so you would be more likely than not to get an estimate at least as deviant from 0 as the one you did get, if the null hypothesis were true. Hence the data do not provide grounds for rejecting the null hypothesis. Note that not having grounds for rejection does not mean that you must accept it: a non-significant t-value is not proof that the null hypothesis is true.

There is a good basic outline of the t-test in the Wikipedia article "Student's t-test": http://en.wikipedia.org/wiki/Student%27s_t-test

Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jul-09 Time: 14:17:52
-- XFMail --
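
The arithmetic in this reply can be reproduced in a few lines. This is a minimal sketch using only the three numbers printed in the summary (slope estimate, standard error, residual degrees of freedom); the variable names are made up for illustration.

```r
estimate  <- -0.001599   # slope estimate from the summary
std_error <-  0.002499   # its estimated standard error
df_resid  <-  9          # residual degrees of freedom

t_value <- estimate / std_error

# Two-sided p-value: probability that |T| exceeds |t_value| when the
# true slope is zero and T follows a t distribution with df_resid df.
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

round(c(t = t_value, p = p_value), 3)
```

Because the inputs are the rounded values from the printout, the result can differ from R's own table in the last digit.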
Re: [R] R regular expression to extract words with the query string.
Try this:

sapply(strsplit(i, ' '), grep, pattern='ENSP', value = T)

On Wed, Jul 8, 2009 at 10:04 AM, Praveen Surendran praveen.surend...@ucd.ie wrote:
> [...]
> What I need is the string pid:ENSP12345 from $i using the query ENSP.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
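
A minimal sketch of two ways to pull out the token, assuming i holds the example string from the question. The first follows the split-and-grep idea above; the second, using regmatches() with a single regular expression, is an alternative that was not proposed in the thread.

```r
i <- "transcript:ENST112334 pid:ENSP12345"

# Split on spaces, then keep the tokens that contain the query.
tokens    <- strsplit(i, " ", fixed = TRUE)[[1]]
hit_split <- grep("ENSP", tokens, value = TRUE, fixed = TRUE)

# Alternative: one regular expression that grabs the whole run of
# non-space characters surrounding the query.
hit_regex <- regmatches(i, regexpr("[^[:space:]]*ENSP[^[:space:]]*", i))

print(hit_split)
print(hit_regex)
```

The split version returns every matching token; the regexpr version returns only the first match (gregexpr would return all of them).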
Re: [R] RDCOMClient: how to close Excel process?
Then, you can try this:

xl <- COMCreate("Excel.Application")
wk <- xl$Workbooks()
sh <- wk$Open(normalizePath("sample_file.xls"))$Sheets()$Count()
wk$Close()
xl$Quit()

On Wed, Jul 8, 2009 at 10:19 AM, Lauri Nikkinen lauri.nikki...@iki.fi wrote:
> Thanks but that did not work. xl$Quit() does not kill the Excel process and sample_file.xls will not open. I'm using Windows XP SP2 and R 2.8.1
> -L
>
> [...]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] RDCOMClient: how to close Excel process?
Thanks but that did not work. xl$Quit() does not kill the Excel process and sample_file.xls will not open. I'm using Windows XP SP2 and R 2.8.1

-L

2009/7/8 Henrique Dallazuanna www...@gmail.com:
> Try this:
>
> xl$Quit()
>
> [...]
[R] R regular expression to extract words with the query string.
Thanks Henrique. This is indeed short and quite simple compared to what I was using, which goes like...

unlist(strsplit(i, split=" "))[grep("ENSP", unlist(strsplit(i, split=" ")))]

:)

Cheers,
Praveen.

From: Henrique Dallazuanna [mailto:www...@gmail.com]
Sent: 08 July 2009 14:18
To: praveen.surend...@ucd.ie
Cc: r-help@r-project.org
Subject: Re: [R] R regular expression to extract words with the query string.

> Try this:
>
> sapply(strsplit(i, ' '), grep, pattern='ENSP', value = T)
>
> [...]
Re: [R] error: no such index at level 2
On Wed, Jul 8, 2009 at 4:22 AM, Petr PIKAL petr.pi...@precheza.cz wrote:
> Hi
>
> r-help-boun...@r-project.org wrote on 07.07.2009 19:06:17:
>
>> Hi,
>> I am confused about how to select elements from a list. I'm trying to select all rows of a table 'crossRsorted' such that the mean of a related vector is > 0. The related vector is accessible as a list element l[[i]] where i is the row index. I thought this would work:
>>
>> crossRsorted[mean(q[[ crossRsorted[,1] ]], na.rm = TRUE) > 0, ]
>> Error in q[[crossRsorted[, 1]]] : no such index at level 2
>
> Strange, I got completely different error. Couldn't be that only ***you*** have crossRsorted?

Ok, fair enough. I'm still thinking of a language in which the meaning of operators is apparent from their syntactical structure - probably need to read more of The R Inferno.

Here's an example that reproduces the problem, I think (though the error message is slightly different):

q <- list()
q[[105]] <- as.numeric(c(0,0,1))
q[[104]] <- as.numeric(c(1,1,1))
q[[10]] <- as.integer(c(3,3,1))
crossRsorted <- data.frame(i = c(105, 104, 10))
q[[ crossRsorted[,1] ]]
Error in q[[crossRsorted[, 1]]] : recursive indexing failed at level 2

Even though the list 'q' has components 105, 104, and 10, the expression q[[ crossRsorted[,1] ]] causes an error. Why? And why does this work:

q[[c(105)]]
[1] 0 0 1

but not this:

q[[c(105,104)]]
Error in q[[c(105, 104)]] : subscript out of bounds
q[[c(105,104,10)]]
Error in q[[c(105, 104, 10)]] : recursive indexing failed at level 2

even though q[[105]], q[[104]], and q[[10]] are perfectly legitimate items?

Coming back to my question: how do I express "select all i in a vector for which q[[i]] meets some predicate", where q is a list?

Thank you for the tip about 'str' - that's the typeof function I've been craving. (I thought 'attributes' or 'summary' was all there was...)

In my original problem, the output of str is:

str(crossRsorted)
'data.frame': 15750 obs. of 5 variables:
 $ i     : num 105 104 9 8 10 9 98 97 10 8 ...
 $ j     : num 104 105 8 9 9 10 97 98 8 10 ...
 $ r     : num -0.973 -0.973 0.764 0.764 0.744 ...
 $ n     : num 135 135 138 138 138 138 136 136 138 138 ...
 $ pvalue: num 2.90e-86 2.90e-86 0.00 0.00 0.00 ...

and

str(q)
List of 165
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ :'data.frame': 138 obs. of 1 variable:
  ..$ howdidyouhear: chr [1:138] "0" "3" "3" "3" ...
 $ :'data.frame': 138 obs. of 1 variable:
  ..$ approximatelywhendidyoustart: int [1:138] 0 0 5 1 5 5 1 2 6 0 ...
 [ main body deleted ]
 $ :'data.frame': 138 obs. of 1 variable:
  ..$ revisiontestpage: num [1:138] 0 0 0 0 0 0 0 0 0 0 ...

basically - a heterogeneous sparse list of NULL and data.frames of types character, num, and int. However - by construction - the q[[i]] for i in crossRsorted[,1] are all non-NULL, as in my small reproducible example above.

> with data frame and list
>
> df1[sapply(list1,mean)>0,]
>
> selects rows of df1 which correspond to list elements with mean > 0

I can't run 'sapply' over my list because sapply will also iterate over the NULLs. I want to access only those components in list1 that occur in df1[,1].

- Godmar
Re: [R] error: no such index at level 2
Its because '[[' accept only element, so you need use '[': q[crossRsorted[,1]] On Wed, Jul 8, 2009 at 10:28 AM, Godmar Back god...@gmail.com wrote: On Wed, Jul 8, 2009 at 4:22 AM, Petr PIKAL petr.pi...@precheza.cz wrote: Hi r-help-boun...@r-project.org napsal dne 07.07.2009 19:06:17: Hi, I am confused about how to select elements from a list. I'm trying to select all rows of a table 'crossRsorted' such that the mean of a related vector is 0. The related vector is accessible as a list element l[[i]] where i is the row index. I thought this would work: crossRsorted[mean(q[[ crossRsorted[,1] ]], na.rm = TRUE) 0, ] Error in q[[crossRsorted[, 1]]] : no such index at level 2 Strange, I got completely different error. Couldn't be that only ***you*** have crossRsorted? Ok, fair enough. I'm still thinking of a language in which the meaning of operators is apparent from their syntactical structure - probably need to read more of The R Inferno. Here's an example that reproduces the problem, I think (though the error message is slightly different): q-list() q[[105]] - as.numeric(c(0,0,1)) q[[104]] - as.numeric(c(1,1,1)) q[[10]] - as.integer(c(3,3,1)) crossRsorted - data.frame(i = c(105, 104,10)) q[[ crossRsorted[,1] ]] Error in q[[crossRsorted[, 1]]] : recursive indexing failed at level 2 Even though the list 'q' has component 105, 104, and 10, the expression q[[ crossRsorted[,1] ]] causes an error. Why? And why does this work: q[[c(105)]] [1] 0 0 1 but not this: q[[c(105,104)]] Error in q[[c(105, 104)]] : subscript out of bounds q[[c(105,104,10)]] Error in q[[c(105, 104, 10)]] : recursive indexing failed at level 2 even though q[[105]], q[[104], and q[[10]] are perfectly legitimate items? Coming back to my question, how to I express select all i in a vector for which q[[i]] meets some predicate, where q is a list? Thank you for the tip about 'str' - that's the typeof function I've been craving. (I thought 'attributes' or 'summary' was all there was...) 
The output for str in the original problem:

In my original problem, the output is:

> str(crossRsorted)
'data.frame': 15750 obs. of 5 variables:
 $ i     : num 105 104 9 8 10 9 98 97 10 8 ...
 $ j     : num 104 105 8 9 9 10 97 98 8 10 ...
 $ r     : num -0.973 -0.973 0.764 0.764 0.744 ...
 $ n     : num 135 135 138 138 138 138 136 136 138 138 ...
 $ pvalue: num 2.90e-86 2.90e-86 0.00 0.00 0.00 ...

and

> str(q)
List of 165
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ :'data.frame': 138 obs. of 1 variable:
  ..$ howdidyouhear: chr [1:138] 0 3 3 3 3 ...
 $ :'data.frame': 138 obs. of 1 variable:
  ..$ approximatelywhendidyoustart: int [1:138] 0 0 5 1 5 5 1 2 6 0 ...
 [ main body deleted ]
 $ :'data.frame': 138 obs. of 1 variable:
  ..$ revisiontestpage: num [1:138] 0 0 0 0 0 0 0 0 0 0 ...

basically - a heterogeneous sparse list of NULL and data.frames of types character, num, and int. However - by construction - the q[[i]] for i in crossRsorted[,1] are all non-NULL, as in my small reproducible example above.

with data frame and list

df1[sapply(list1,mean) > 0,]

selects rows of df1 which correspond to list elements with mean > 0

I can't run 'sapply' over my list because sapply will also iterate over the NULLs. I want to access only those components in list1 that occur in df1[1,].

- Godmar

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] functions to calculate t-stats, etc. for lm.fit objects?
I'm running a huge number of regressions in a loop, so I tried lm.fit for a speedup. However, I would like to be able to calculate the t-stats for the coefficients. Does anyone have some functions for calculating the regression summary stats of an lm.fit object?

Thanks,
Whit
Re: [R] error: no such index at level 2
On Wed, Jul 8, 2009 at 9:40 AM, Henrique Dallazuanna www...@gmail.com wrote:

Its because '[[' accept only element, so you need use '[': q[crossRsorted[,1]]

This appears to be doing something different. For instance, my 'q' has 165 components, but what you suggest has 15750:

> length(q)
[1] 165
> length(q[ crossRsorted[,1] ])
[1] 15750

hardly what I want. Meanwhile, it looks as though [[ ]] does not vectorize its arguments, it curries them! Note that:

> q[[c(105,104)]]
Error in q[[c(105, 104)]] : subscript out of bounds

gives the same error as:

> q[[105]][[104]]
Error in q[[105]][[104]] : subscript out of bounds

Very mysterious, though, in all fairness, explained in help("[[") where it says:

'[[' can be applied recursively to lists, so that if the single index 'i' is a vector of length 'p', 'alist[[i]]' is equivalent to 'alist[[i1]]...[[ip]]' providing all but the final indexing results in a list.

which leads to square one: how to express "select all r[i] where q[[i]] fulfills some predicate"?

- Godmar
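The recursive behaviour of '[[' quoted from the help page can be checked on a cut-down version of the list from this thread. This is only a sketch: the two-component list and the variable names a, b, e1, e2 and parts are made up for illustration, and lapply() is just one way of fetching several components one index at a time.

```r
# Cut-down version of the sparse list used in the thread
q <- list()
q[[105]] <- c(0, 0, 1)
q[[104]] <- c(1, 1, 1)

# A length-2 index is applied one level at a time
a <- q[[c(105, 3)]]
b <- q[[105]][[3]]
identical(a, b)

# Both forms fail when the second index is out of range
e1 <- tryCatch(q[[c(105, 104)]], error = function(e) "error")
e2 <- tryCatch(q[[105]][[104]],  error = function(e) "error")

# To fetch several components, apply '[[' once per index
parts <- lapply(c(105, 104), function(i) q[[i]])
```

The last line gives a list with one entry per requested index, which is usually what is wanted when '[[' is handed a vector by mistake.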
Re: [R] Import xlsx file in Ubuntu 9.04
On Jul 8, 2009, at 6:56 AM, Rodrigo Aluizio wrote:

Hi list, For the entire last 2 weeks I have been looking for a way to directly import xlsx files to R in a Linux OS (Ubuntu 9.04). I already read the R Import/Export guide, and I know how to use gdata to import xls files and read.table to import .csv. My problem is that all data that I receive is in the xlsx format, and I have to convert all the files to xls. Well, when I was using Windows Vista OS, RODBC did the trick with the odbcConnectExcel2007 function (which I know is not present in the Linux RODBC package, probably due to a driver issue). Isn't there a way to import these xlsx files directly to R without any previous conversion (.csv or .xls)? Thank you for the attention, it's probable that someone already asked it. I even remember seeing that somewhere, but without a definitive answer.

Rodrigo.

Your best bet on Linux would be to open the Excel 2007 files using OpenOffice's Calc and save them to CSV files. The latest versions of OpenOffice will open Office 2007 files.

An alternative of course would be to see if it is reasonable for the providers of the files to save them in the older XLS format instead, or to see if they have other file formats that they can send you rather than using Excel at all.

There is a very preliminary Perl module in progress, that should eventually provide for a more efficient path:

http://search.cpan.org/dist/Spreadsheet-XLSX/

But from what I have seen, there are enough problems with it (including data integrity issues), that I would not use it in production work.

Unfortunately, I don't believe that you have a lot of options on Linux at the moment.

HTH,
Marc Schwartz
Re: [R] Reading from Google Docs
I have previously read R Installation and Administration. I read it again. It does not help me. The relevant paragraph is below. But I need lower level instructions. Where can I find them?

R CMD INSTALL works in Windows to install source packages if you have the source-code package files (option "Source Package Installation Files" in the installer) and toolset (see The Windows toolset, file:///C:/Program%20Files/R/R-2.9.1/doc/manual/R-admin.html#The-Windows-toolset) installed. Installation of binary packages must be done by install.packages. R CMD INSTALL --help will tell you the current options under Windows (which differ from those on a Unix-alike): in particular there is a choice of the types of documentation to be installed.

Farrel Buchinsky
Google Voice Tel: (412) 567-7870

2009/6/19 Uwe Ligges lig...@statistik.tu-dortmund.de

See the manual R Installation and Administration for information on how to install source packages on Windows.

Uwe Ligges

Farrel Buchinsky wrote:

After issuing

tar xvfz RgoogleDocs_0.2.2-src.tar.gz

I am getting an error message: 'tar' is not recognized as an internal or external command, operable program or batch file. Should I use my 7-zip to open up the archive? Where should I be doing this? For instance can I do it all in my download directory, or should I do it in C:\Program Files\R\R-2.9.0\library, or should I manually create C:\Program Files\R\R-2.9.0\library\RGoogleDocs and do it all there, or will the Rcmd INSTALL RGoogleDocs_0.2-2.tar.gz command do that for me? Yes, you assumed correctly. I am using Windows XP.

Farrel Buchinsky
Google Voice Tel: (412) 567-7870

On Thu, Jun 18, 2009 at 20:17, Gabor Grothendieck ggrothendi...@gmail.com wrote:

I haven't been following this thread but:

1. if RGoogleDocs_0.2-2.tar.gz is a source distribution (as opposed to built source) then the first line renames it so that it's not the same name as the built file about to be created. The second line detars it into the RGoogleDocs directory. The third builds the built source file, RGoogleDocs_0.2-2.tar.gz. The fourth installs the built source file into R. I've assumed Windows. If you are on Linux replace rename with mv.

rename RGoogleDocs_0.2-2.tar.gz RgoogleDocs_0.2.2-src.tar.gz
tar xvfz RgoogleDocs_0.2.2-src.tar.gz
Rcmd build RGoogleDocs
Rcmd INSTALL RGoogleDocs_0.2-2.tar.gz

or

2. if RGoogleDocs_0.2-2.tar.gz is already a built source file then you can just issue the last of the above lines and don't need the others.

On Thu, Jun 18, 2009 at 7:52 PM, Farrel Buchinsky fjb...@gmail.com wrote:

What do you mean by

cd the.directory.containing.RGoogleDocs

Do you mean the directory where I downloaded the RGoogleDocs_0.2-2.tar.gz to? Or do you mean that I must create a directory called RGoogleDocs under Library and then change to that directory?

Farrel Buchinsky
Google Voice Tel: (412) 567-7870

On Mon, Mar 2, 2009 at 22:16, Gabor Grothendieck ggrothendi...@gmail.com wrote:

Finally enter into the Windows console:

cd the.directory.containing.RGoogleDocs
Rcmd build RGoogleDocs
Rcmd INSTALL RGoogleDocs_1.0.0.tar.gz

except replace RGoogleDocs_1.0.0.tar.gz with the filename created by the build.
Re: [R] R regular expression to extract words with the query string.
Dear Praveen,

Try also:

strsplit(i, ' ')[[1]][2]
# [1] "pid:ENSP12345"

HTH,
Jorge

On Wed, Jul 8, 2009 at 9:04 AM, Praveen Surendran praveen.surend...@ucd.ie wrote:

Hi, Is there a way in R to get the string which matches the expression, where the expression is a substring of the parent string? Let's say I have

$i <- "transcript:ENST112334 pid:ENSP12345"

What I need is the string "pid:ENSP12345" from $i using the query "ENSP". Appreciate your comments.

Praveen Surendran
School of Medicine and Medical Sciences
University College Dublin
Belfield, Dublin 4
Ireland.
Re: [R] R regular expression to extract words with the query string.
Try this:

library(gsubfn)
i <- "transcript:ENST112334 pid:ENSP12345"
strapply(i, paste("\\w*", "ENSP", "\\w*", sep = ""), c, simplify = unlist)

This says to match any number (possibly zero) of word characters followed by ENSP followed by more word characters. c just returns the match without further processing and unlist unlists the result giving a character vector (which otherwise would be a list). See http://gsubfn.googlecode.com for more info.

On Wed, Jul 8, 2009 at 9:04 AM, Praveen Surendran praveen.surend...@ucd.ie wrote:

Hi, Is there a way in R to get the string which matches the expression, where the expression is a substring of the parent string? Let's say I have

$i <- "transcript:ENST112334 pid:ENSP12345"

What I need is the string "pid:ENSP12345" from $i using the query "ENSP". Appreciate your comments.

Praveen Surendran
School of Medicine and Medical Sciences
University College Dublin
Belfield, Dublin 4
Ireland.
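For comparison, the same extraction can be sketched in base R without extra packages. This is not taken from either reply: regexpr() and regmatches() are standard functions, but the pattern below is an assumption about what counts as a "word" here, namely any run of non-whitespace characters around the query, so that the "pid:" prefix is kept.

```r
i <- "transcript:ENST112334 pid:ENSP12345"

# Match the whitespace-delimited token that contains the query string
pattern <- "[^[:space:]]*ENSP[^[:space:]]*"
m <- regmatches(i, regexpr(pattern, i))
m
```

If the query can occur in several tokens, gregexpr() in place of regexpr() returns all of them.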
Re: [R] Reading from Google Docs
On 08/07/2009 10:02 AM, Farrel Buchinsky wrote:

I have previously read R Installation and Administration. I read it again. It does not help me. The relevant paragraph is below. But I need lower level instructions. Where can I find them?

Follow the link. If Windows can't find tar, your toolset is installed incorrectly.

Duncan Murdoch
Re: [R] Reading from Google Docs
Forgive my naivety, but how do I make Windows find tar? In other words, from where do I issue the command and what is the command?

Farrel Buchinsky
Google Voice Tel: (412) 567-7870

On Wed, Jul 8, 2009 at 10:09, Duncan Murdoch murd...@stats.uwo.ca wrote:

Follow the link. If Windows can't find tar, your toolset is installed incorrectly.

Duncan Murdoch
Re: [R] Fitting a trend-line
Thanks a lot for all your suggestions.

Regards,
Anupam
[R] truncated regression out-of-sample predictions
Dear all,

I am trying to implement Simar and Wilson's (2007) second algorithm and have the following question: If I use a truncated regression on the m < n observations, how do I get fitted values for all n observations, instead of only for the m observations, which is what the command fitted returns? I would need these to construct the left-truncation needed to draw n random deviates.

Thanks for your help,
Fleur

Fleur Wouterse, Ph.D.
Post-Doctoral Fellow
IFPRI-Dakar
Immeuble Ousseynou Thiam Gueye
Rue de Thies
Point E, BP 15702
CP 12524 Dakar Fann
Senegal
Phone: +221 33 869 3986
Email: f.woute...@cgiar.org
[R] recoding strings containing colons
Curious to know if recode can work with strings containing colons. I haven't gotten it to work yet, but perhaps there is a way?

Donald Braman
http://www.culturalcognition.com/braman/
Re: [R] error: no such index at level 2
On Wed, Jul 8, 2009 at 9:40 AM, Henrique Dallazuanna www...@gmail.com wrote:

Its because '[[' accept only element, so you need use '[': q[crossRsorted[,1]]

Henrique,

I figured out what q[crossRsorted[,1]] does - it produces q[i] for all i in crossRsorted[,1]. Ok. Since a given index 'k' of q[[k]] can occur in multiple rows in crossRsorted[,1], this is not what I want.

Meanwhile, I was able to express what I do want like so:

crossRsorted[Filter(function (idx) mean(q[[idx]], na.rm = TRUE), unique(crossRsorted[,1])), ]

but, I'm afraid, that's not really R style. Or is it? But perhaps the only way?

I think I'm starting to see the allure of R: every indexing task ends up a challenging puzzle. Which prevents Alzheimer's [1].

- Godmar

[1] http://www.timesonline.co.uk/tol/life_and_style/article508785.ece
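One way the row selection could be written, offered only as a sketch and not as the idiom the list settled on: evaluate the predicate once per distinct index with vapply(), then match rows with %in%. The toy data, the "> 0" threshold and the names ids, ok and rows are assumptions made for this illustration.

```r
# Toy stand-ins for the list and the table from the thread
q <- list()
q[[105]] <- c(0, 0, 1)
q[[104]] <- c(-1, -1, -1)
q[[10]]  <- c(3, 3, 1)
crossRsorted <- data.frame(i = c(105, 104, 10, 105),
                           j = c(104, 105, 9, 10))

# Evaluate the predicate once per distinct index ...
ids <- unique(crossRsorted$i)
ok  <- vapply(ids, function(k) mean(q[[k]], na.rm = TRUE) > 0, logical(1))

# ... then keep every row whose index passed
rows <- crossRsorted[crossRsorted$i %in% ids[ok], ]
rows
```

Because only the indices that occur in the table are visited, the NULL components of a sparse list are never touched.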
[R] please remove me from this list
Re: [R] functions to calculate t-stats, etc. for lm.fit objects?
On Jul 8, 2009, at 8:51 AM, Whit Armstrong wrote:

I'm running a huge number of regressions in a loop, so I tried lm.fit for a speedup. However, I would like to be able to calculate the t-stats for the coefficients. Does anyone have some functions for calculating the regression summary stats of an lm.fit object? Thanks, Whit

Whit, depending upon just how much time savings you are realizing by using lm.fit() and not lm(), the approach to your question may vary.

Do you need all of the models, or only a subset? If the latter, then I would narrow down your model set and re-run them with lm() so that you can use summary.lm() directly. That would entail less custom coding, which may otherwise offset any time savings from using lm.fit().

If the former, then there are two choices as I see them. The first would be to restructure the object resulting from lm.fit() by adding the elements required to run summary.lm(). However, I would think that this overhead would bring you back to a point where just using lm() would be a better approach from a time standpoint.

The second would be to cook up a function that only provides the subset of results that you need from summary.lm() and then use that on the results of lm.fit(). Here again, there remains the question of just how much time are you saving using lm.fit() versus the additional overhead of calculating even a subset of the output.

Here is a very simple approach to a function that would get you a subset of the output that you would get using, for example, coef(summary(lm.object)). This is using a selective approach of copying and slightly editing code from summary.lm(). Note that there is other code in summary.lm() to handle weights and such, if your models are more complex. You would need to add that in if that is the case. If you need much more summary output than this on each model, then I think you would be better off just using lm() and summary.lm().
# Use at your own risk...untested on more complex models :-)
# 'x' is an lm.fit object
calc.lm.t <- function(x) {
  Qr <- x$qr
  r <- x$residuals
  p <- x$rank
  p1 <- 1L:p
  rss <- sum(r^2)
  n <- NROW(Qr$qr)
  rdf <- n - p
  resvar <- rss/rdf
  R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
  se <- sqrt(diag(R) * resvar)
  est <- x$coefficients[Qr$pivot[p1]]
  tval <- est/se
  res <- cbind(est = est, se = se, tval = tval)
  res
}

Here is some simple example data:

set.seed(1)
y <- rnorm(100)
x <- rnorm(100)

# Get the default coefficient output using summary.lm()
> coef(summary(lm(y ~ x)))
                 Estimate Std. Error     t value  Pr(>|t|)
(Intercept)  0.1088521158 0.09034800  1.20480938 0.2311784
x           -0.0009323697 0.09472155 -0.00984327 0.9921663

# Now use calc.lm.t
> lmf <- lm.fit(model.matrix(y ~ x), y)
> calc.lm.t(lmf)
                      est         se        tval
(Intercept)  0.1088521158 0.09034800  1.20480938
x           -0.0009323697 0.09472155 -0.00984327

I'll leave it to you to see whether this approach may or may not be helpful from a time perspective.

HTH,
Marc Schwartz
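If two-sided p-values are wanted as well, the same quantities can be pushed through pt(). The following is a sketch along the lines of the function above, not part of the original reply: it assumes an unweighted full-rank fit, uses the df.residual component that lm.fit() returns, and the names X, fit, XtXinv and pval are introduced here for illustration.

```r
set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
X <- model.matrix(y ~ x)
fit <- lm.fit(X, y)

p      <- fit$rank
rdf    <- fit$df.residual
resvar <- sum(fit$residuals^2) / rdf

# (X'X)^{-1} recovered from the R factor of the QR decomposition
XtXinv <- chol2inv(fit$qr$qr[seq_len(p), seq_len(p), drop = FALSE])

se   <- sqrt(diag(XtXinv) * resvar)
est  <- fit$coefficients[fit$qr$pivot[seq_len(p)]]
tval <- est / se

# Two-sided p-value from the t distribution with rdf degrees of freedom
pval <- 2 * pt(abs(tval), df = rdf, lower.tail = FALSE)
cbind(est = est, se = se, tval = tval, pval = pval)
```

On this example the four columns should agree with coef(summary(lm(y ~ x))), which is an easy sanity check before using the shortcut inside a loop.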
[R] Passing arguments to with()
Hi,

I've been wondering how to write a function that will produce results from multiple tests (e.g. paired t-tests) for all or several variables in some data frame. I'd like it to do a t-test for each variable ('x') in 'data' by 'y'. I'm stuck in here:

function(data,y) {
  for (x in names(data)) {
    with(data, t.test(x~y))
  }
}

How to tell 'with' that 'x' and 'y' are names of columns in 'data'? Or pass similar arguments? I probably understand the logic why this is not working, but still don't know how to make it work.

Thanks in advance for any help!

Timo
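A possible way around the problem, sketched under assumptions rather than given as the list's answer: build each formula from the column names with reformulate() and pass the data frame through the data argument of t.test(), so with() is not needed. The function name run_t_tests and the toy data frame d with grouping column g are made up for this example, and the tests shown are unpaired Welch tests, not paired ones.

```r
# Hypothetical helper: one t-test per column of 'data', grouped by column 'y'
run_t_tests <- function(data, y) {
  vars <- setdiff(names(data), y)
  tests <- lapply(vars, function(x) {
    # reformulate() turns the two column names into the formula x ~ y
    t.test(reformulate(y, response = x), data = data)
  })
  names(tests) <- vars
  tests
}

set.seed(1)
d <- data.frame(a = rnorm(20), b = rnorm(20),
                g = rep(c("u", "v"), each = 10))
res <- run_t_tests(d, "g")
sapply(res, function(t) t$p.value)
```

Returning the list of test objects, rather than printing inside the loop, keeps every statistic available for later extraction.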
[R] Formatting a Table
I've created a short program to print a table of learning curve factors. However, I cannot figure out how to format the table to:

1) Get rid of the [1]s in the first column and replace them with the values of N.
2) Line up the first row with the factors (decimal fractions).

Thanks for any help. The complete program and output is as follows:

> Lc <- seq(0.70,0.95,0.05) #Specify learning curves
> T <- function(N,Lc) #Create a function to calc.time for Nth unit
+ {
+ N^(log(Lc,10)/log(2,10)) #Function
+ }
> for (N in seq(2,10,2))
+ {if (N==2){print(T(N,Lc)*100)}else{print(T(N,Lc),digits=3)}}
[1] 70 75 80 85 90 95
[1] 0.490 0.562 0.640 0.722 0.810 0.902
[1] 0.398 0.475 0.562 0.657 0.762 0.876
[1] 0.343 0.422 0.512 0.614 0.729 0.857
[1] 0.306 0.385 0.477 0.583 0.705 0.843
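One way to get labelled rows and columns, shown as a sketch rather than the answer given on the list, is to build the whole table as a matrix with outer() and attach dimnames. The names unit_time and tab are introduced here; the function is renamed from T because T is also R's shorthand for TRUE.

```r
Lc <- seq(0.70, 0.95, 0.05)   # learning-curve rates
N  <- seq(2, 10, 2)           # unit numbers

# Same formula as in the post: N^(log(Lc)/log(2))
unit_time <- function(N, Lc) N^(log(Lc) / log(2))

# One row per N, one column per learning-curve rate
tab <- outer(N, Lc, unit_time)
dimnames(tab) <- list(N = N, Lc = sprintf("%.0f%%", 100 * Lc))
round(tab, 3)
```

Printing a matrix with dimnames replaces the [1] prefixes with the values of N and lines the columns up under their learning-curve labels.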
[R] Two-way ANOVA gives different results using anova(lm()) than doing it by hand
Hey!

Could you please take a quick look at what I have done? Somehow I get wrong results using the anova(lm()) combination compared to doing a two-way ANOVA by hand.

Running:

Data <- read.table("Data.txt");
g <- lm(ExM~S1*S2,Data);
anova(g);

Gives:

Analysis of Variance Table

Response: ExM
           Df Sum Sq Mean Sq F value    Pr(>F)
S1          1 4.3679  4.3679 167.045 < 2.2e-16 ***
S2          1 0.9427  0.9427  36.053 8.236e-09 ***
S1:S2       1 0.3231  0.3231  12.357 0.0005371 ***
Residuals 212 5.5434  0.0261

I compared it to the work done by hand, i.e. calculated all the different sums of squares using sum() and tapply(). So I know that anova(lm()) gets the degrees of freedom equal to 1, 1, 1 and 212 when they should be 5, 5, 25 and 180. Also, the sums of squares are quite different ... I get 4.xx, 4.xx, 1.xx, 0.xx ... as you see, what anova(lm()) gets is different.

The data: S1 has 6 levels, so has S2. On average, each cell has 6 values; most cells have actually 6 values, and there are two of each: 5, 7, 4, 8 - so average 6.

Could you please help me, why it does not work with anova(lm())? I tried quite a few things found with Google, but it all gave me the same result as anova(lm()) ...

Thanks a lot!
Lars

S1 S2 ExO ExM
1.000 0.000 0.000 0.819 0.830
2.000 0.000 0.000 0.835 0.846
3.000 0.000 0.000 0.891 0.902
4.000 0.000 0.000 0.905 0.916
5.000 0.000 0.000 0.839 0.850
6.000 2.500 0.000 0.863 0.874
7.000 2.500 0.000 0.898 0.909
8.000 2.500 0.000 0.887 0.898
9.000 2.500 0.000 0.909 0.920
10.000 2.500 0.000 0.892 0.903
11.000 2.500 0.000 0.886 0.897
12.000 5.000 0.000 0.841 0.852
13.000 5.000 0.000 0.881 0.892
14.000 5.000 0.000 0.874 0.885
15.000 5.000 0.000 0.873 0.884
16.000 5.000 0.000 0.886 0.897
17.000 5.000 0.000 0.858 0.869
18.000 10.000 0.000 0.709 0.720
19.000 10.000 0.000 0.702 0.713
20.000 10.000 0.000 0.727 0.738
21.000 10.000 0.000 0.737 0.748
22.000 10.000 0.000 0.762 0.773
23.000 10.000 0.000 0.716 0.727
24.000 20.000 0.000 0.381 0.392
25.000 20.000 0.000 0.437 0.448
26.000 20.000 0.000 0.443 0.454
27.000 20.000 0.000 0.412 0.423
28.000 20.000 0.000 0.414 0.425
29.000 20.000 0.000 0.362 0.373
30.000 40.000 0.000 0.034 0.045
31.000 40.000 0.000 0.030 0.041
32.000 40.000 0.000 0.036 0.047
33.000 40.000 0.000 0.062 0.073
34.000 40.000 0.000 0.063 0.074
35.000 40.000 0.000 0.085 0.096
36.000 0.000 0.039 0.573 0.584
37.000 0.000 0.039 0.337 0.348
38.000 0.000 0.039 0.557 0.568
39.000 0.000 0.039 0.422 0.433
40.000 0.000 0.039 0.542 0.553
41.000 0.000 0.039 0.428 0.439
42.000 0.000 0.078 0.293 0.304
43.000 0.000 0.078 0.346 0.357
44.000 0.000 0.078 0.241 0.252
45.000 0.000 0.078 0.261 0.272
46.000 0.000 0.078 0.298 0.309
47.000 0.000 0.156 0.223 0.234
48.000 0.000 0.156 0.215 0.226
49.000 0.000 0.156 0.196 0.207
50.000 0.000 0.156 0.238 0.249
51.000 0.000 0.156 0.276 0.287
52.000 0.000 0.156 0.294 0.305
53.000 0.000 0.156 0.291 0.302
54.000 0.000 0.313 0.194 0.205
55.000 0.000 0.313 0.186 0.197
56.000 0.000 0.313 0.204 0.215
57.000 0.000 0.313 0.336 0.347
58.000 0.000 0.313 0.315 0.326
59.000 0.000 0.313 0.251 0.262
60.000 0.000 0.625 0.211 0.222
61.000 0.000 0.625 0.203 0.214
62.000 0.000 0.625 0.182 0.193
63.000 0.000 0.625 0.336 0.347
64.000 0.000 0.625 0.383 0.394
65.000 0.000 0.625 0.364 0.375
66.000 0.000 0.625 0.255 0.266
67.000 2.500 0.039 0.519 0.530
68.000 2.500 0.039 0.503 0.514
69.000 2.500 0.039 0.491 0.502
70.000 2.500 0.039 0.490 0.501
71.000 2.500 0.039 0.509 0.520
72.000 2.500 0.039 0.546 0.557
73.000 5.000 0.039 0.483 0.494
74.000 5.000 0.039 0.462 0.473
75.000 5.000 0.039 0.449 0.460
76.000 5.000 0.039 0.422 0.433
77.000 5.000 0.039 0.418 0.429
78.000 5.000 0.039 0.428 0.439
79.000 10.000 0.039 0.321 0.332
80.000 10.000 0.039 0.296 0.307
81.000 10.000 0.039 0.273 0.284
82.000 10.000 0.039 0.275 0.286
83.000 10.000 0.039 0.308 0.319
84.000 10.000 0.039 0.325 0.336
85.000 20.000 0.039 0.146 0.157
86.000 20.000 0.039 0.129 0.140
87.000 20.000 0.039 0.122 0.133
88.000 20.000 0.039 0.096 0.107
89.000 20.000 0.039 0.113 0.124
90.000 20.000 0.039 0.119 0.130
91.000 40.000 0.039 0.031 0.042
92.000 40.000 0.039 0.035 0.046
93.000 40.000 0.039 0.034 0.045
94.000 40.000 0.039 0.035 0.046
95.000 40.000 0.039 0.072
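A likely explanation for the 1, 1, 1 and 212 degrees of freedom, offered as a sketch and not as a diagnosis of the poster's file: S1 and S2 are stored as numbers, so lm() fits a slope for each instead of one effect per level. Wrapping them in factor() gives the classical two-way layout. The simulated data frame d, the random response and the balanced 6 x 6 x 6 design below are assumptions made so the example runs on its own; only the level values are taken from the listing above.

```r
set.seed(42)
# Balanced stand-in for the posted design: 6 x 6 cells, 6 replicates each
d <- expand.grid(S1  = c(0, 2.5, 5, 10, 20, 40),
                 S2  = c(0, 0.039, 0.078, 0.156, 0.313, 0.625),
                 rep = 1:6)
d$ExM <- rnorm(nrow(d))

# Numeric predictors: one slope each, so 1 degree of freedom per term
a_num <- anova(lm(ExM ~ S1 * S2, data = d))

# Factors: one effect per level, as in the hand calculation
a_fac <- anova(lm(ExM ~ factor(S1) * factor(S2), data = d))

a_num$Df
a_fac$Df
```

With unbalanced cells, as in the posted data, the sequential sums of squares from anova() also depend on the order of the terms, which may account for part of any remaining difference from a hand calculation.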
Re: [R] functions to calculate t-stats, etc. for lm.fit objects?
Marc,

Thanks very much for your detailed reply. I'll give your code a try and post back the time difference.

Cheers,
Whit

On Wed, Jul 8, 2009 at 10:50 AM, Marc Schwartz <marc_schwa...@me.com> wrote:

On Jul 8, 2009, at 8:51 AM, Whit Armstrong wrote:

I'm running a huge number of regressions in a loop, so I tried lm.fit for a speedup. However, I would like to be able to calculate the t-stats for the coefficients. Does anyone have some functions for calculating the regression summary stats of an lm.fit object?

Thanks,
Whit

Whit, depending upon just how much time savings you are realizing by using lm.fit() and not lm(), the approach to your question may vary.

Do you need all of the models, or only a subset? If the latter, then I would narrow down your model set and re-run them with lm() so that you can use summary.lm() directly. That would entail less custom coding, which may otherwise offset any time savings from using lm.fit().

If the former, then there are two choices as I see them.

The first would be to restructure the object resulting from lm.fit() by adding the elements required to run summary.lm(). However, I would think that this overhead would bring you back to a point where just using lm() would be a better approach from a time standpoint.

The second would be to cook up a function that only provides the subset of results that you need from summary.lm() and then use that on the results of lm.fit(). Here again, there remains the question of just how much time you are saving using lm.fit() versus the additional overhead of calculating even a subset of the output.

Here is a very simple approach to a function that would get you a subset of the output that you would get using, for example, coef(summary(lm.object)). This is using a selective approach of copying and slightly editing code from summary.lm(). Note that there is other code in summary.lm() to handle weights and such, if your models are more complex. You would need to add that in if that is the case. If you need much more summary output than this on each model, then I think you would be better off just using lm() and summary.lm().

    # Use at your own risk...untested on more complex models :-)
    # 'x' is an lm.fit object
    calc.lm.t <- function(x) {
      Qr <- x$qr
      r <- x$residuals
      p <- x$rank
      p1 <- 1L:p
      rss <- sum(r^2)
      n <- NROW(Qr$qr)
      rdf <- n - p
      resvar <- rss/rdf
      R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
      se <- sqrt(diag(R) * resvar)
      est <- x$coefficients[Qr$pivot[p1]]
      tval <- est/se
      res <- cbind(est = est, se = se, tval = tval)
      res
    }

Here is some simple example data:

    set.seed(1)
    y <- rnorm(100)
    x <- rnorm(100)

    # Get the default coefficient output using summary.lm()
    > coef(summary(lm(y ~ x)))
                     Estimate Std. Error     t value  Pr(>|t|)
    (Intercept)  0.1088521158 0.09034800  1.20480938 0.2311784
    x           -0.0009323697 0.09472155 -0.00984327 0.9921663

    # Now use calc.lm.t
    > lmf <- lm.fit(model.matrix(y ~ x), y)
    > calc.lm.t(lmf)
                          est         se        tval
    (Intercept)  0.1088521158 0.09034800  1.20480938
    x           -0.0009323697 0.09472155 -0.00984327

I'll leave it to you to see whether this approach may or may not be helpful from a time perspective.

HTH,
Marc Schwartz
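A minimal sketch of how the same idea could be extended to include two-sided p-values, so that the result can be checked against coef(summary(lm())). The helper name lmfit_coef_table is hypothetical and not from the thread, and the sketch assumes an unweighted model with no aliased coefficients:

```r
# Hypothetical helper (not from the thread): coefficient table from lm.fit()
# output, assuming an unweighted model with no aliased coefficients.
lmfit_coef_table <- function(fit) {
  p   <- fit$rank
  p1  <- seq_len(p)
  rdf <- length(fit$residuals) - p              # residual degrees of freedom
  resvar <- sum(fit$residuals^2) / rdf          # residual variance
  # (X'X)^-1 recovered from the R factor of the QR decomposition
  XtXinv <- chol2inv(fit$qr$qr[p1, p1, drop = FALSE])
  est  <- fit$coefficients[fit$qr$pivot[p1]]
  se   <- sqrt(diag(XtXinv) * resvar)
  tval <- est / se
  pval <- 2 * pt(abs(tval), df = rdf, lower.tail = FALSE)
  cbind(Estimate = est, `Std. Error` = se, `t value` = tval, `Pr(>|t|)` = pval)
}

set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
tab_fit <- lmfit_coef_table(lm.fit(model.matrix(y ~ x), y))
tab_lm  <- coef(summary(lm(y ~ x)))
print(all.equal(tab_fit, tab_lm))
```

Whether this is faster than calling lm() and summary() in a large loop would still need to be timed on the actual models, for example with system.time().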
[R] #INCLUDE
What is R's equivalent to a C-like #include to incorporate external files? I have a 2k-line function that is generated, and I need to include it at runtime but not manage it as a package (as it changes hourly). Any ideas?
Re: [R] Reading from Google Docs
On 08/07/2009 10:13 AM, Farrel Buchinsky wrote:

Forgive my naivete, but how do I make Windows find tar? In other words, from where do I issue the command and what is the command?

You need to install the toolset, and let the installer set your path.

Duncan Murdoch

Farrel Buchinsky
Google Voice Tel: (412) 567-7870

On Wed, Jul 8, 2009 at 10:09, Duncan Murdoch <murd...@stats.uwo.ca> wrote:

On 08/07/2009 10:02 AM, Farrel Buchinsky wrote:

I have previously read R Installation and Administration. I read it again. It does not help me. The relevant paragraph is below. But I need lower level instructions. Where can I find them?

Follow the link. If Windows can't find tar, your toolset is installed incorrectly.

Duncan Murdoch

R CMD INSTALL works in Windows to install source packages if you have the source-code package files (option “Source Package Installation Files” in the installer) and toolset (see The Windows toolset <file:///C:/Program%20Files/R/R-2.9.1/doc/manual/R-admin.html#The-Windows-toolset>) installed. Installation of binary packages must be done by install.packages. R CMD INSTALL --help will tell you the current options under Windows (which differ from those on a Unix-alike): in particular there is a choice of the types of documentation to be installed.

Farrel Buchinsky
Google Voice Tel: (412) 567-7870

2009/6/19 Uwe Ligges <lig...@statistik.tu-dortmund.de>:

See the manual R Installation and Administration for information on how to install source packages on Windows.

Uwe Ligges

Farrel Buchinsky wrote:

After issuing

    tar xvfz RgoogleDocs_0.2.2-src.tar.gz

I am getting an error message: 'tar' is not recognized as an internal or external command, operable program or batch file.

Should I use my 7-zip to open up the archive? Where should I be doing this? For instance, can I do it all in my download directory, or should I do it in C:\Program Files\R\R-2.9.0\library, or should I manually create C:\Program Files\R\R-2.9.0\library\RGoogleDocs and do it all there, or will the Rcmd INSTALL RGoogleDocs_0.2-2.tar.gz command do that for me?

Yes, you assumed correctly. I am using Windows XP.

Farrel Buchinsky
Google Voice Tel: (412) 567-7870

On Thu, Jun 18, 2009 at 20:17, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:

I haven't been following this thread but:

1. if RGoogleDocs_0.2-2.tar.gz is a source distribution (as opposed to built source) then the first line renames it so that it's not the same name as the built file about to be created. The second line detars it into the RGoogleDocs directory. The third builds the built source file, RGoogleDocs_0.2-2.tar.gz. The fourth installs the built source file into R. I've assumed Windows. If you are on Linux replace rename with mv.

    rename RGoogleDocs_0.2-2.tar.gz RgoogleDocs_0.2.2-src.tar.gz
    tar xvfz RgoogleDocs_0.2.2-src.tar.gz
    Rcmd build RGoogleDocs
    Rcmd INSTALL RGoogleDocs_0.2-2.tar.gz

or

2. if RGoogleDocs_0.2-2.tar.gz is already a built source file then you can just issue the last of the above lines and don't need the others.

On Thu, Jun 18, 2009 at 7:52 PM, Farrel Buchinsky <fjb...@gmail.com> wrote:

What do you mean by

    cd the.directory.containing.RGoogleDocs

Do you mean the directory where I downloaded the RGoogleDocs_0.2-2.tar.gz to? Or do you mean that I must create a directory called RGoogleDocs under Library and then change to that directory?

Farrel Buchinsky
Google Voice Tel: (412) 567-7870

On Mon, Mar 2, 2009 at 22:16, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:

Finally enter into the Windows console:

    cd the.directory.containing.RGoogleDocs
    Rcmd build RGoogleDocs
    Rcmd INSTALL RGoogleDocs_1.0.0.tar.gz

except replace RGoogleDocs_1.0.0.tar.gz with the filename created by the build.
Re: [R] #INCLUDE
?source ?

On Wed, Jul 8, 2009 at 11:16 AM, Idgarad <idga...@gmail.com> wrote:

What is R's equivalent to a C-like #include to incorporate external files. I have a 2k line function that is generated and need to include it at runtime but not manage it as a package (as it changes hourly.) Any ideas?
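A minimal sketch of how source() could cover this use case. The file path and the function name generated_fn are hypothetical stand-ins for the hourly generated file; loading into a separate environment is one possible design choice, not something the thread prescribes:

```r
# Stand-in for the hourly generated file; the path and the function name
# 'generated_fn' are hypothetical.
gen_file <- file.path(tempdir(), "generated_fn.R")
writeLines("generated_fn <- function(x) x * 2", gen_file)

# Evaluate the file in its own environment so that a reload cannot clobber
# unrelated objects in the global workspace.
gen_env <- new.env()
source(gen_file, local = gen_env)
result <- gen_env$generated_fn(21)

# When the generator rewrites the file, source() it again to pick up changes.
writeLines("generated_fn <- function(x) x * 3", gen_file)
source(gen_file, local = gen_env)
result_reloaded <- gen_env$generated_fn(21)
```

Calling source(gen_file) without the local argument would instead define the function directly in the calling workspace, which is closer to the behaviour of a C #include.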
[R] Comparing GAMMs
Greetings!

I am looking for advice regarding the best way to compare GAMMs. I know other model outputs return enough information for R's AIC, ANOVA, etc. commands to function, but this is not the case with GAMM unless one specifies the gam or lme portion. I know these parts of the gamm contain items that will facilitate comparisons between GAMMs. Is it correct to simply use these values for this purpose?

For example, the lme portion of the gamm returns a log-likelihood value that could be used to calculate information criteria. However, I am wondering whether entire GAMMs can be compared using this, or only the lme part. Maybe my thinking about the lme and gam portions of GAMMs is incorrect? If this appears to be the case, let me know!

In general, if someone could clarify my understanding in any way it would be much appreciated. Thank you very much!

Sincerely,
Paul Simonin
Re: [R] Passing arguments to with()
On 08/07/2009 10:01 AM, Tymek Wołodźko wrote:

Hi, I've been wondering how to write a function that will produce results from multiple tests (e.g. paired t-tests) for all or several variables in some data frame. I'd like it to do a t-test for each variable ('x') in 'data' by 'y'. I'm stuck here:

    function(data,y) { for (x in names(data)) { with(data, t.test(x~y)) }}

How to tell 'with' that 'x' and 'y' are names of columns in 'data'? Or pass similar arguments?

Don't use with. Use t.test(data[[x]] ~ data[[y]]).

Duncan Murdoch

I probably understand the logic why this is not working, but still don't know how to make it work. Thanks in advance for any help!

Timo
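A minimal sketch of the data[[x]] ~ data[[y]] suggestion wrapped in a function. The helper name t_by_group and the example data are hypothetical; the sketch runs an unpaired two-sample test per column and collects only the p-values:

```r
# Hypothetical helper: run t.test(value ~ group) for every column except the
# grouping column 'y', and collect the p-values in a named vector.
t_by_group <- function(data, y) {
  vars <- setdiff(names(data), y)
  sapply(vars, function(x) t.test(data[[x]] ~ data[[y]])$p.value)
}

set.seed(42)
d <- data.frame(
  a = rnorm(20),                                   # no group difference
  b = rnorm(20, mean = rep(c(0, 5), each = 10)),   # large group difference
  g = rep(c("ctl", "trt"), each = 10)
)
pvals <- t_by_group(d, "g")
print(pvals)
```

The double-bracket form works here because x and y hold column names as character strings, which is exactly the case that with() cannot resolve on its own.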
Re: [R] Formatting a Table
You could use 'cat(sprintf())', C-style:

    > for (N in seq(2,10,2))
    + {if (N==2){cat(sprintf("%5d", T(N,Lc)*100),"\n")}else{cat(sprintf("%5.3f", T(N,Lc)), "\n")}}
       70    75    80    85    89    95
    0.490 0.562 0.640 0.722 0.810 0.902
    0.398 0.475 0.562 0.657 0.762 0.876
    0.343 0.422 0.512 0.614 0.729 0.857
    0.306 0.385 0.477 0.583 0.705 0.843

On Wed, Jul 8, 2009 at 9:20 AM, cvandy <cvand...@gmail.com> wrote:

I've created a short program to print a table of learning curve factors. However, I cannot figure out how to format the table to:

1) Get rid of the [1]s in the first column and replace them with the values of N.
2) Line up the first row with the factors (decimal fractions).

Thanks for any help. The complete program and output is as follows:

    > Lc<-seq(0.70,0.95,0.05) #Specify learning curves
    > T<-function(N,Lc) #Create a function to calc.time for Nth unit
    + {
    + N^(log(Lc,10)/log(2,10)) #Function
    + }
    > for (N in seq(2,10,2))
    + {if (N==2){print(T(N,Lc)*100)}else{print(T(N,Lc),digits=3)}}
    [1] 70 75 80 85 90 95
    [1] 0.490 0.562 0.640 0.722 0.810 0.902
    [1] 0.398 0.475 0.562 0.657 0.762 0.876
    [1] 0.343 0.422 0.512 0.614 0.729 0.857
    [1] 0.306 0.385 0.477 0.583 0.705 0.843

--
View this message in context: http://www.nabble.com/Formatting-a-Table-tp24391433p24391433.html
Sent from the R help mailing list archive at Nabble.com.
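The 89 in the first output row above, where print() showed 90, suggests that the computed value was slightly below 90 and was truncated by the integer format. A minimal sketch of one way around that, using a rounding format ("%5.0f") for the first row and adding N as a leading column; the names unit_time and fmt_row are hypothetical, and unit_time is used instead of T only to avoid masking R's shorthand for TRUE:

```r
Lc <- seq(0.70, 0.95, 0.05)                        # learning-curve rates
# Same formula as T() in the thread, under a different (hypothetical) name
unit_time <- function(N, Lc) N^(log(Lc, 10) / log(2, 10))

# Format one table row: N first, then six fixed-width cells
fmt_row <- function(N) {
  v <- unit_time(N, Lc)
  cells <- if (N == 2) sprintf("%5.0f", v * 100) else sprintf("%5.3f", v)
  paste(sprintf("%2d", as.integer(N)), paste(cells, collapse = " "))
}

rows <- vapply(seq(2, 10, 2), fmt_row, character(1))
cat(rows, sep = "\n")
```

Because every cell is five characters wide, the percentage row and the decimal rows line up in the same columns.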
Re: [R] Formatting a Table
Cvandy, is this close to what you need:

    > printT <- function ( .seq = seq ( 2 , 10 , 2 ) ) {
    +   x <- t ( sapply ( .seq , T , Lc ) )
    +   x <- cbind (
    +     .seq
    +     , rbind (
    +       format ( x [ 1 , ] * 100 )
    +       , format ( x [ -1 , ] , digits = 3 )
    +     )
    +   )
    +   dimnames ( x ) [[2]] <- NULL
    +   print ( x , quote = FALSE )
    + }
    > printT ( )
         [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
    [1,] 2    70    75    80    85    90    95
    [2,] 4    0.490 0.562 0.640 0.722 0.810 0.902
    [3,] 6    0.398 0.475 0.562 0.657 0.762 0.876
    [4,] 8    0.343 0.422 0.512 0.614 0.729 0.857
    [5,] 10   0.306 0.385 0.477 0.583 0.705 0.843

I'm not really sure what you mean by "Line up the first row with the factors (decimal fractions)."

--
David

David Huffer, Ph.D.
Senior Statistician
CSOSA/Washington, DC
david.huf...@csosa.gov

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of cvandy
Sent: Wednesday, July 08, 2009 9:21 AM
To: r-help@r-project.org
Subject: [R] Formatting a Table

I've created a short program to print a table of learning curve factors. However, I cannot figure out how to format the table to:

1) Get rid of the [1]s in the first column and replace them with the values of N.
2) Line up the first row with the factors (decimal fractions).

Thanks for any help. The complete program and output is as follows:

    > Lc<-seq(0.70,0.95,0.05) #Specify learning curves
    > T<-function(N,Lc) #Create a function to calc.time for Nth unit
    + {
    + N^(log(Lc,10)/log(2,10)) #Function
    + }
    > for (N in seq(2,10,2))
    + {if (N==2){print(T(N,Lc)*100)}else{print(T(N,Lc),digits=3)}}
    [1] 70 75 80 85 90 95
    [1] 0.490 0.562 0.640 0.722 0.810 0.902
    [1] 0.398 0.475 0.562 0.657 0.762 0.876
    [1] 0.343 0.422 0.512 0.614 0.729 0.857
    [1] 0.306 0.385 0.477 0.583 0.705 0.843

--
View this message in context: http://www.nabble.com/Formatting-a-Table-tp24391433p24391433.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Comparing GAMMs
On Wed, 2009-07-08 at 11:24 -0400, Paul Simonin wrote:

Greetings! I am looking for advice regarding the best way to compare GAMMs. I know other model outputs return enough information for R's AIC, ANOVA, etc. commands to function, but this is not the case with GAMM unless one specifies the gam or lme portion. I know these parts of the gamm contain items that will facilitate comparisons between GAMMs. Is it correct to simply use these values for this purpose? For example, the lme portion of the gamm returns a log-likelihood value that could be used to calculate information criteria. However, I am wondering whether entire GAMMs can be compared using this, or only the lme part. Maybe my thinking about the lme and gam portions of GAMMs is incorrect? If this appears to be the case, let me know! In general, if someone could clarify my understanding in any way it would be much appreciated. Thank you very much!

Sincerely,
Paul Simonin

Hi Paul,

Are your GAMMs Gaussian (i.e. AMM) or non-Gaussian?

If they are Gaussian, then anova(mod1$lme, mod2$lme) gives an approximate LRT for the two models. That will also yield AIC and BIC, which might also be used for inference. Your AMM in this case is just a linear mixed model and these usual forms of inference apply, with the caveat that the hypothesis testing is approximate.

You end up using both the $lme and the $gam components for various aspects of model inspection, interrogation etc., but for hypothesis testing, the lme bit is sufficient. You can also use things like intervals(mod1$lme) to look at confidence intervals on the smoothing parameters. See Simon Wood's book [1], section 6.7, for more details, and the preceding sections on how the smoothers can be formulated as a mixed model.

If your GAMMs are generalised then I'm not sure what the best approach for comparison or hypothesis testing might be, especially as this is an ongoing research topic for GLMMs, and also because of the method by which GAMMs are fitted in mgcv. Simon Wood says as much in his 2006 monograph [1, page 318, section 6.6.2]. The non-Gaussian case uses glmmPQL from package MASS, and this doesn't return a likelihood and hence no AIC (in the same way that quasi families in glm() fits don't return likelihoods).

So having said that, if you do have a likelihood, then you must be fitting an AMM via gamm(), and the first half of my reply would seem most appropriate.

[1] Wood, S.N. (2006) Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC.

HTH

G

--
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
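A minimal sketch of the Gaussian comparison described in the reply, on simulated data. The data, the variable names and the random-intercept structure are assumptions made for illustration; only gamm() from mgcv and anova() on the $lme components come from the thread. The models are fitted by maximum likelihood (set explicitly here), since they differ in their smooth terms:

```r
library(mgcv)   # gamm(); attaches nlme, which supplies anova() for lme fits

set.seed(1)
n <- 200
d <- data.frame(x = runif(n), z = runif(n), g = factor(rep(1:10, each = 20)))
group_effect <- rnorm(10)[as.integer(d$g)]          # random intercept per group
d$y <- sin(2 * pi * d$x) + 4 * (d$z - 0.5)^2 + group_effect + rnorm(n, sd = 0.3)

# Two Gaussian additive mixed models with a random intercept for g
m1 <- gamm(y ~ s(x),        random = list(g = ~1), data = d, method = "ML")
m2 <- gamm(y ~ s(x) + s(z), random = list(g = ~1), data = d, method = "ML")

cmp <- anova(m1$lme, m2$lme)   # approximate LRT, with AIC and BIC columns
print(cmp)
```

As the reply notes, the test is approximate, and this route only applies when gamm() returns a genuine likelihood, that is, in the Gaussian case.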
[R] Randomizing a dataframe
Hi R-helpers,

I have a dataframe (called data) with trees in rows (n=100) and insect species (n=10) in columns. My tree IDs are in a column called TREE and each species has a column labeled SPEC1, SPEC2, SPEC3, etc...

I wish to randomize the values in my dataframe such that row and column totals are held constant, i.e. in my randomized data each tree will have the same number of individual insects as in the real data (constant row totals) and each species will have the same number of individuals as in the real data (constant column totals).

I will eventually want to do this many times, but I would appreciate help getting started with the randomization.

Thank you,
Mark Na
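One possible starting point, sketched under the assumption that the cells are non-negative integer counts of individuals: r2dtable() in the stats package draws random two-way tables with fixed row and column totals. The simulated data below is a hypothetical stand-in for the real data frame, and whether this null model suits the ecological question is a separate decision:

```r
set.seed(123)
# Stand-in for the real data: 100 trees by 10 species, Poisson counts
counts <- matrix(rpois(100 * 10, lambda = 3), nrow = 100,
                 dimnames = list(NULL, paste0("SPEC", 1:10)))
data <- data.frame(TREE = seq_len(100), counts)

obs <- as.matrix(data[, -1])                 # drop the TREE id column

# r2dtable(n, r, c) returns a list of n random tables with the given
# row totals r and column totals c
sims <- r2dtable(5, rowSums(obs), colSums(obs))

# Rebuild a data frame with the original layout from the first draw
rand1 <- data.frame(TREE = data$TREE, sims[[1]])
names(rand1) <- names(data)
```

Increasing the first argument of r2dtable() gives as many randomized tables as needed, each preserving both sets of totals.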
Re: [R] error: no such index at level 2
Godmar, I don't follow...

    > q <- list ( )
    > q [[ 105 ]] <- as.numeric ( c ( 0 , 0 , 1 ) )
    > q [[ 104 ]] <- as.numeric ( c ( 1 , 1 , 1 ) )
    > q [[ 10 ]] <- as.integer ( c ( 3 , 3 , 1 ) )
    > crossRsorted <- data.frame ( i = c ( 105 , 104 , 10 ) )
    > q [ crossRsorted [ , 1 ] ]
    [[1]]
    [1] 0 0 1

    [[2]]
    [1] 1 1 1

    [[3]]
    [1] 3 3 1

    > length ( q [ crossRsorted [ , 1 ] ] )
    [1] 3

How'd you come up with

    > length(q)
    [1] 165
    > length(q[ crossRsorted[,1] ])
    [1] 15750

I must be missing something.

--
David

David Huffer, Ph.D.
Senior Statistician
CSOSA/Washington, DC
david.huf...@csosa.gov

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Godmar Back
Sent: Wednesday, July 08, 2009 9:58 AM
To: Henrique Dallazuanna
Cc: r-help@r-project.org; Petr PIKAL
Subject: Re: [R] error: no such index at level 2

On Wed, Jul 8, 2009 at 9:40 AM, Henrique Dallazuanna <www...@gmail.com> wrote:

It's because '[[' accepts only one element, so you need to use '[':

    q[crossRsorted[,1]]

This appears to be doing something different. For instance, my 'q' has 165 components, but what you suggest has 15750:

    > length(q)
    [1] 165
    > length(q[ crossRsorted[,1] ])
    [1] 15750

hardly what I want. Meanwhile, it looks as though [[ ]] does not vectorize its arguments, it curries them! Note that:

    > q[[c(105,104)]]
    Error in q[[c(105, 104)]] : subscript out of bounds

gives the same error as:

    > q[[105]][[104]]
    Error in q[[105]][[104]] : subscript out of bounds

Very mysterious, though, in all fairness, explained in help("[[") where it says:

'[[' can be applied recursively to lists, so that if the single index 'i' is a vector of length 'p', 'alist[[i]]' is equivalent to 'alist[[i1]]...[[ip]]' providing all but the final indexing results in a list.

which leads to square one: how to express "select all r[i] where q[[i]] fulfills some predicate"?

- Godmar
Re: [R] error: no such index at level 2
Sorry, I mixed up the toy example I used to recreate the problem with the actual data set. The 'crossRsorted' in the toy example and in the actual data are different. See my latest posting in this thread.

- Godmar

On Wed, Jul 8, 2009 at 11:55 AM, David Huffer <david.huf...@csosa.gov> wrote:

Godmar, I don't follow...

    > q <- list ( )
    > q [[ 105 ]] <- as.numeric ( c ( 0 , 0 , 1 ) )
    > q [[ 104 ]] <- as.numeric ( c ( 1 , 1 , 1 ) )
    > q [[ 10 ]] <- as.integer ( c ( 3 , 3 , 1 ) )
    > crossRsorted <- data.frame ( i = c ( 105 , 104 , 10 ) )
    > q [ crossRsorted [ , 1 ] ]
    [[1]]
    [1] 0 0 1

    [[2]]
    [1] 1 1 1

    [[3]]
    [1] 3 3 1

    > length ( q [ crossRsorted [ , 1 ] ] )
    [1] 3

How'd you come up with

    > length(q)
    [1] 165
    > length(q[ crossRsorted[,1] ])
    [1] 15750

I must be missing something.

--
David

David Huffer, Ph.D.
Senior Statistician
CSOSA/Washington, DC
david.huf...@csosa.gov

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Godmar Back
Sent: Wednesday, July 08, 2009 9:58 AM
To: Henrique Dallazuanna
Cc: r-help@r-project.org; Petr PIKAL
Subject: Re: [R] error: no such index at level 2

On Wed, Jul 8, 2009 at 9:40 AM, Henrique Dallazuanna <www...@gmail.com> wrote:

It's because '[[' accepts only one element, so you need to use '[':

    q[crossRsorted[,1]]

This appears to be doing something different. For instance, my 'q' has 165 components, but what you suggest has 15750:

    > length(q)
    [1] 165
    > length(q[ crossRsorted[,1] ])
    [1] 15750

hardly what I want. Meanwhile, it looks as though [[ ]] does not vectorize its arguments, it curries them! Note that:

    > q[[c(105,104)]]
    Error in q[[c(105, 104)]] : subscript out of bounds

gives the same error as:

    > q[[105]][[104]]
    Error in q[[105]][[104]] : subscript out of bounds

Very mysterious, though, in all fairness, explained in help("[[") where it says:

'[[' can be applied recursively to lists, so that if the single index 'i' is a vector of length 'p', 'alist[[i]]' is equivalent to 'alist[[i1]]...[[ip]]' providing all but the final indexing results in a list.

which leads to square one: how to express "select all r[i] where q[[i]] fulfills some predicate"?

- Godmar
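A minimal sketch of the distinction discussed in this thread, and of one way to express "select all r[i] where q[[i]] fulfills some predicate". The small lists and the predicate are hypothetical; the names q and r are reused only to mirror the thread:

```r
q <- list(c(0, 0, 1), c(1, 1, 1), c(3, 3, 1))
r <- c("a", "b", "c")

# '[' takes a vector of indices and returns a sub-list ...
sub <- q[c(3, 1)]
# ... while '[[' with a vector indexes recursively:
# q[[c(3, 1)]] is the same as q[[3]][[1]]
nested <- q[[c(3, 1)]]

# Hypothetical predicate: does the element sum to more than 2?
# vapply() applies it to each q[[i]] and returns one logical per element.
keep <- vapply(q, function(el) sum(el) > 2, logical(1))
selected <- r[keep]
print(selected)
```

The logical vector keep has the same length as q, so it can index r directly, provided r and q are aligned element by element.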
Re: [R] Reading from Google Docs
It's safer just to temporarily add it to your path. Unfortunately Rtools has a find command that conflicts with the find command in Windows, so if you add the Rtools bin directory to your path permanently you could find that other programs stop working. That actually happened to me once, and it took the longest time until I discovered that Rtools was the culprit. If you follow the advice I gave, you normally won't have that problem.

On Wed, Jul 8, 2009 at 11:21 AM, Duncan Murdoch <murd...@stats.uwo.ca> wrote:

On 08/07/2009 10:13 AM, Farrel Buchinsky wrote:

Forgive my naivete, but how do I make Windows find tar? In other words, from where do I issue the command and what is the command?

You need to install the toolset, and let the installer set your path.

Duncan Murdoch

Farrel Buchinsky
Google Voice Tel: (412) 567-7870

On Wed, Jul 8, 2009 at 10:09, Duncan Murdoch <murd...@stats.uwo.ca> wrote:

On 08/07/2009 10:02 AM, Farrel Buchinsky wrote:

I have previously read R Installation and Administration. I read it again. It does not help me. The relevant paragraph is below. But I need lower level instructions. Where can I find them?

Follow the link. If Windows can't find tar, your toolset is installed incorrectly.

Duncan Murdoch

R CMD INSTALL works in Windows to install source packages if you have the source-code package files (option “Source Package Installation Files” in the installer) and toolset (see The Windows toolset <file:///C:/Program%20Files/R/R-2.9.1/doc/manual/R-admin.html#The-Windows-toolset>) installed. Installation of binary packages must be done by install.packages. R CMD INSTALL --help will tell you the current options under Windows (which differ from those on a Unix-alike): in particular there is a choice of the types of documentation to be installed.

Farrel Buchinsky
Google Voice Tel: (412) 567-7870

2009/6/19 Uwe Ligges <lig...@statistik.tu-dortmund.de>:

See the manual R Installation and Administration for information on how to install source packages on Windows.
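A minimal sketch of what "temporarily add it to your path" could look like from inside an R session, so that the change disappears once it is restored or the session ends. The Rtools directory shown is a hypothetical location, and the install.packages() call is left commented out because it needs the package file to be present:

```r
# Hypothetical Rtools location; adjust to the actual install directory.
rtools_bin <- "C:/Rtools/bin"

old_path <- Sys.getenv("PATH")
# Prepend the tools directory for this session only
Sys.setenv(PATH = paste(rtools_bin, old_path, sep = .Platform$path.sep))

tar_found <- nzchar(Sys.which("tar"))   # is a tar executable visible now?

# With the tools visible, a source package could be installed from R, e.g.:
# install.packages("RGoogleDocs_0.2-2.tar.gz", repos = NULL, type = "source")

Sys.setenv(PATH = old_path)             # restore the original search path
```

Because the system-wide PATH is never edited, the conflict with the Windows find command described above should not arise outside this session.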
Re: [R] Uncorrelated random vectors
The mvrnorm function in the MASS package has an argument to force the generated data to have the exact mean/variance structure as specified which when used with a diagonal variance matrix will generate data that has a 0 (within round off error) correlation in the data. No post processing by Gramm-Schmidt or other methods needed. The author(s) of the function cleverly hid this feature by placing the information on the help page for the function. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Moshe Olshansky Sent: Tuesday, July 07, 2009 9:10 PM To: r-help@r-project.org; Luba (AIM SE)Stein Subject: Re: [R] Uncorrelated random vectors As mentioned by somebody before, there is no problem for the normal case - use mvrnorm function from MASS package with any mu and make Sigma be any diagonal matrix (with strictly positive diagonal). Note that even though all the correlations are 0, the SAMPLE correlations won't be 0. If you want to create a set of vectors whose SAMPLE correlations are 0 you will have to use a variant of Gramm-Schmidt. I do not know whether a variant of mvrnorm exists for logistic distribution (my guess is that it does not). --- On Tue, 7/7/09, Stein, Luba (AIM SE) luba.st...@allianz.com wrote: From: Stein, Luba (AIM SE) luba.st...@allianz.com Subject: [R] Uncorrelated random vectors To: r-help@r-project.org r-help@r-project.org Received: Tuesday, 7 July, 2009, 11:45 PM Hello, is it possible to create two uncorrelated random vectors for a given distribution. In fact, I would like to have something like the function rnorm or rlogis with the extra property that they are uncorrelated. 
Thanks for your help, Luba __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
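A minimal sketch of the mvrnorm approach described in this thread, assuming the MASS package is available. The sample size, means and variances are arbitrary illustrative choices, not values from the original posts. With empirical = TRUE, mu and Sigma describe the sample (rather than the population) mean and covariance, so a diagonal Sigma should give a sample correlation of zero up to rounding error.

```r
library(MASS)  # provides mvrnorm()

set.seed(42)                # arbitrary seed, for reproducibility only
n     <- 100                # illustrative sample size
mu    <- c(0, 0)            # illustrative means
Sigma <- diag(c(1, 4))      # diagonal covariance matrix: zero correlation

# empirical = TRUE forces the SAMPLE mean and covariance to equal mu and Sigma
xy <- mvrnorm(n, mu = mu, Sigma = Sigma, empirical = TRUE)

cor(xy[, 1], xy[, 2])       # zero up to rounding error
```

This covers only the normal case; as noted in the thread, it says nothing about a logistic analogue.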
Re: [R] Import xlsx file in Ubuntu 9.04
I did some preliminary work on xslx (and docx and pptx) files some time ago and will hopefully finish things off by the end of summer. We can read these with a combination of the Rcompression and XML package. I have put versions of two packages (ROOXML and RExcelXML) at http://www.omegahat.org/Prerelease/ (ROOXML_0.1-0.tar.gz and RExcelXML_0.1-0.tar.gz) There are no guarantees about how they work at this point, but the basic structures are there. I'd be happy to hear about any problems and to try to add functionality. Given the framework, it should be relatively easy to add support for additional cell types, etc. D. Marc Schwartz wrote: On Jul 8, 2009, at 6:56 AM, Rodrigo Aluizio wrote: Hi list, By the entire last 2 weeks I was looking for a way to directly import xlsx files to R in a Linux OS (Ubuntu 9.04). I already read the R Import/Export guide, and I know how to use gdata to import xls files and read.table to import .csv. My problem is that all data that I receive is in the xlsx format, and I have to convert all the files to xls. Well, when I was using Windows Vista OS, RODBC did the trick with the odbcConnectExcel2007 function (which I know is not present in the Linux RODBC package, probably due to drivers issue). Isn't there a way to import this xlsx files directly to R without any previous conversion (.csv or .xls)? Thank you for the attention, it's probable that some one already asked it. I even remember seen that somewhere, but without a definitive answer. Rodrigo. Your best bet on Linux would be to open the Excel 2007 files using OpenOffice's Calc and save them to CSV files. The latest versions of OpenOffice will open Office 2007 files. An alternative of course would be to see if it is reasonable for the providers of the files to save them in the older XLS format instead, or to see if they have other file formats that they can send you rather than using Excel at all. 
There is a very preliminary Perl module in progress, that should eventually provide for a more efficient path: http://search.cpan.org/dist/Spreadsheet-XLSX/ But from what I have seen, there are enough problems with it (including data integrity issues), that I would not use it in production work. Unfortunately, I don't believe that you have a lot of options on Linux at the moment. HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Randomizing a dataframe
On Wed, Jul 8, 2009 at 8:54 AM, Mark Namtb...@gmail.com wrote: Hi R-helpers, I have a dataframe (called data) with trees in rows (n=100) and insect species (n=10) in columns. My tree IDs are in a column called TREE and each species has a column labeled SPEC1, SPEC2, SPEC3, etc... I wish to randomize the values in my dataframe such that row and column totals are held constant, i.e. in my randomized data each tree will have the same number of individual insects as in the real data (constant row totals) and each species will have the same number of individuals as in the real data (constant column totals). I will eventually want to do this many times, but I would appreciate help getting started with the randomization. Thank you, Mark Na [[alternative HTML version deleted]] Sounds like maybe you're looking for some form of Monte Carlo experiments in R which is on my list of to-do for the next month. I need to do something like rearrange the dates in one database as in Monte Carlo but then rearrange all my other databases so that dates still match up. It's just not bubbled to the top of the list yet. I took a quick look in Google and found MCMCpack pretty quickly. There's some documentation out there which is easy to find if it's of interest. Good luck and I'll be following the thread. cheers, Mark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
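One possible starting point, offered as a sketch rather than as the method anyone in this thread proposed: base R's r2dtable() draws random two-way tables whose row and column totals are fixed. The data frame below is simulated; the column names TREE and SPEC1 to SPEC10 follow the description in the question, and the counts are made up. Whether this sampling scheme is the right null model for the ecological question is a separate decision.

```r
set.seed(1)
# Simulated stand-in for the real data: 100 trees, 10 species (made-up counts)
counts <- matrix(rpois(100 * 10, lambda = 3), nrow = 100,
                 dimnames = list(NULL, paste0("SPEC", 1:10)))
data <- data.frame(TREE = 1:100, counts)

obs <- as.matrix(data[, -1])    # drop the TREE id column

# Draw 5 random tables with the same row and column totals as obs
sims <- r2dtable(5, rowSums(obs), colSums(obs))

# Put the first randomization back into a data frame shaped like the original
rand1 <- data.frame(TREE = data$TREE, sims[[1]])
names(rand1) <- names(data)
```

Increasing the first argument of r2dtable() gives as many randomized tables as needed, returned as a list of matrices.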
Re: [R] Reading from Google Docs
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gabor Grothendieck Sent: Wednesday, July 08, 2009 9:04 AM To: Duncan Murdoch Cc: R; Uwe Ligges; Farrel Buchinsky Subject: Re: [R] Reading from Google Docs

It's safer just to temporarily add it to your path.

I recommend that also. Here is the SETPATH.BAT file that I put into my Rtools directory that sets up PATH so it can be used for building R and R packages. I run it from within the cmd window I will use for building packages. Note that it totally replaces the current value of PATH with a new one; it does not append or prepend entries to the existing one. You will have to adjust the entries for your own machine. It is safe to add other entries (like e:\cygwin\bin) to the end of this PATH, but you might run into trouble putting entries at the front of PATH. (I have a similar script to run before building packages for S+, whose package building system uses the Microsoft compilers and ActiveState perl but no cygwin tools.)

    E:\>type e:\Rtools\SETPATH.BAT
    set RTOOLS=E:\Rtools
    REM RHOME is for use in this script, R_HOME will be set by R itself.
    set RHOME=E:\R-svn\r-devel
    set PATH=C:\WINDOWS\system32;C:\WINDOWS
    set PATH=%RTOOLS%\bin;%RTOOLS%\perl\bin;%RTOOLS%\MinGW\bin;%PATH%
    set PATH=%RHOME%\bin;%PATH%
    set PATH=%PATH%;E:\Program Files\MiKTeX 2.7\miktex\bin
    set PATH=%PATH%;E:\Program Files\Inno Setup 5
    set PATH=%PATH%;C:\Program Files\HTML Help Workshop
    set PATH=%PATH%;E:\Program Files\CollabNet Subversion Server

Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap tibco.com

Unfortunately Rtools has a find command that conflicts with the find command in Windows, so if you add the Rtools bin directory to your path permanently then you could find that other programs stop working. That actually happened to me once and it took the longest time until I discovered that Rtools was the culprit. If you follow the advice I gave, you normally won't have that problem.
On Wed, Jul 8, 2009 at 11:21 AM, Duncan Murdochmurd...@stats.uwo.ca wrote: On 08/07/2009 10:13 AM, Farrel Buchinsky wrote: Forgive my naivte, but how do I make windows find tar. In other words from where do I issue the command and what is the command. You need to install the toolset, and let the installer set your path. Duncan Murdoch Farrel Buchinsky Google Voice Tel: (412) 567-7870 On Wed, Jul 8, 2009 at 10:09, Duncan Murdoch murd...@stats.uwo.ca wrote: On 08/07/2009 10:02 AM, Farrel Buchinsky wrote: I have previously read R Installation and Administration. I read it again. It does not help me The relevant paragraph is below. But I need lower level instructions. Where can I find them. Follow the link. If Windows can't find tar, your toolset is installed incorrectly. Duncan Murdoch R CMD INSTALL works in Windows to install source packages if you have the source-code package files (option Source Package Installation Files in the installer) and toolset (see The Windows toolsetfile:///C:/Program%20Files/R/R-2.9.1/doc/manual/R-admi n.html#The-Windows-toolset) installed. Installation of binary packages must be done by install.packages . R CMD INSTALL --help will tell you the current options under Windows (which differ from those on a Unix-alike): in particular there is a choice of the types of documentation to be installed. Farrel Buchinsky Google Voice Tel: (412) 567-7870 2009/6/19 Uwe Ligges lig...@statistik.tu-dortmund.de See the manual R Installation and Administration for information on how to install source packages on Windows. Uwe Ligges Farrel Buchinsky wrote: After issuing tar xvfz RgoogleDocs_0.2.2-src.tar.gzI am getting an error message 'tar' is not recongnized as an internal or external command, operable program or batch file. Should I use my 7-zip to open up the archive? Where should I be doing this? 
For instance can I do it all in my download directory or should I do it in C:\Program Files\R\R-2.9.0\library or should I manually create C:\Program Files\R\R-2.9.0\library\RGoogleDocs and do it all there or will the Rcmd INSTALL RGoogleDocs_0.2-2.tar.gz command do that for me. Yes, you assumed correctly. I am using Windows XP. Farrel Buchinsky Google Voice Tel: (412) 567-7870 On Thu, Jun 18, 2009 at 20:17, Gabor Grothendieck ggrothendi...@gmail.comwrote: I have haven't neen following this thread but: 1. if RGoogleDocs_0.2-2.tar.gz is a source distribution (as opposed to built source) then the first line renames it so that its not the same name as the built file about to be created. The second line detars it into the RGoogleDocs directory. The third builds the built source file, RGoogleDocs_0.2-2.tar.gz. The fourth installs the built source file into R. I've assumed Windows.
Re: [R] bigglm() results different from glm()+Another question
OK, it appears that the problem is the df.resid component of the biglm object. Everything else is being updated by the update function except the df.resid piece, so it is based solely on the initial fit and the chunksize used there. The df.resid piece is then used in the computation of the AIC, hence the differences that you see. There could also be a difference in the p-values and confidence intervals, but with numbers that large, the differences are smaller than can be seen at the level of rounding done. This appears to be a bug/overlooked piece to me; Thomas is cc'd on this, so he should be able to fix it. A workaround in the meantime is to do something like:

    fit$df.resid <- 10000-4

Then compute the AIC. Also, as an aside, if you change your seq to:

    seq(chunksize, 10000-chunksize, chunksize)

then you won't get the error messages. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

From: utkarshsinghal [mailto:utkarsh.sing...@global-analytics.com] Sent: Wednesday, July 08, 2009 2:24 AM To: Greg Snow Cc: Thomas Lumley; r help Subject: Re: [R] bigglm() results different from glm()+Another question

Hi Greg, Many thanks for your precious time. Here is a workable code:

    set.seed(1)
    xx = data.frame(x1=runif(10000,0,10), x2=runif(10000,0,10), x3=runif(10000,0,10))
    xx$y = 3 + xx$x1 + 2*xx$x2 + 3*xx$x3 + rnorm(10000)
    chunksize = 500
    fit = biglm(y~x1+x2+x3, data=xx[1:chunksize,])
    for(i in seq(chunksize,10000,chunksize)) fit=update(fit, moredata=xx[(i+1):(i+chunksize),])
    AIC(fit)
    [1] 28956.91

And the AIC for other chunksizes:

    chunksize   AIC
    500         28956.91
    1000        27956.91
    2000        25956.91
    2500        24956.91
    5000        19956.91
    10000        9956.91

Also I noted that the estimated coefficients are not dependent on chunksize and AIC is exactly a linear function of chunksize. So I guess it is some problem with the calculation of AIC, maybe in some degree of freedom or adding some constant somewhere. And my comments below.

Regards Utkarsh

Greg Snow wrote: How many rows does xx have? Let's look at your example for chunksize 10000: you initially fit the first 10000 observations, then the seq results in just the value 10000, which means that you do the update based on values 10001 through 20000. If xx only has 10000 rows, then this should give at least one error. If xx has 20000 or more rows, then only chunksize 10000 will ever see the 20000th value; the other chunksizes will use less of the data.

Understood your point, and I apologize that you had to spend time going into the logic inside the for loop. I definitely thought of that, but my actual problem was the variation in AICs (which I was sure about), so to ignore this loop problem (temporarily), I deliberately chose the chunksizes such that the number of rows is a multiple of chunksize. I knew there is still one extra iteration happening and I checked that it was not causing any problem: the moredata in the last iteration will be all NA's and update does nothing in such a case. For example, let's say chunksize=5000. Even though xx has only 10000 rows, fit2 and fit3 below are exactly the same.

    fit1 = biglm(y~x1+x2+x3, data=xx[1:5000,])
    fit2 = update(fit1, moredata=xx[5001:10000,])
    fit3 = update(fit2, moredata=xx[10001:15000,])
    AIC(fit1); AIC(fit2); AIC(fit3)
    [1] 5018.282
    [1] 19956.91
    [1] 19956.91

(The AIC matches the table above and there are no warnings at all.) I checked all these things before sending my first mail and dropped the idea of refining the for loop, as this saves me a few lines of code and the loop also looks good and easy to understand. Moreover, it is neither taking any extra run time nor producing any warnings or errors.

Also, looking at the help for update.biglm, the 2nd argument is moredata, not data, so if the code below is the code that you actually ran, then the new data chunks are going into the ... argument (and being ignored, as that is there for future expansion and does nothing yet) and the moredata argument is left empty, which should also be giving an error.
For the code below, the model is only being fit to the initial chunk and never updated, so with different chunk sizes, there is different amounts of data per model. You can check this by doing summary(fit) and looking at the sample size in the 2nd line. My fault in writing the mail. In the actual code, I gave update(fit, xx[(i+1):(i+chunksize),]) ,i.e., I just passed the new chunk as the 2nd argument without mentioning the argument name, which is correct, but while writing the mail I added the argument name as data without checking what it is. It is easier for us to help you if you provide code that can be run by copying and pasting (we don't have xx, so we can't just run the code below, you could include a line to randomly generate an xx, or a link to where a copy of xx can be downloaded from). It also helps if you mention
[R] matching each row
I have two dataframes, the first column of each dataframe is a unique id number (the rest of the columns are data variables). I would like to figure out how many times each id number appears in each dataframe. So far I can use: length( match (dataframeA$unique.id[1], dataframeB$unique.id) ) but this only works on each row of dataframe A one-at-a-time. I would like to do this for all of the rows in dataframe A, and then put the results in a new variable: dataframeA$count I'm new to R, so please be patient with me! Sorry if this question has already been answered, my search of the archives only brought up one relevant post, and I didn't understand the answer to it http://www.nabble.com/match-to20799206.html#a20799206 thx -- View this message in context: http://www.nabble.com/matching-each-row-tp24393051p24393051.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Extracting a column name in loop?
Hi, I am writing a script that will address columns using syntax like: data_set[,1] to extract the data from the first column of my data set, for example. This code will be placed in a loop (where the column reference will be placed by a variable). What I also need to do is extract the column NAME for a given column being processed in the loop. The dataframe has been set so that R knows that the top line refers to column headers. Can anyone help me understand how to do this? Thanks. -- View this message in context: http://www.nabble.com/Extracting-a-column-name-in-loop--tp24393160p24393160.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extracting a column name in loop?
On Wed, Jul 8, 2009 at 8:41 AM, mister_bluesmanmister_blues...@hotmail.com wrote: Hi, I am writing a script that will address columns using syntax like: data_set[,1] to extract the data from the first column of my data set, for example. This code will be placed in a loop (where the column reference will be placed by a variable). What I also need to do is extract the column NAME for a given column being processed in the loop. The dataframe has been set so that R knows that the top line refers to column headers. Can anyone help me understand how to do this? Thanks. Possibly something like names(data_set)[i] ? HTH, Mark __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
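The names(data_set)[i] suggestion above can be sketched inside a loop as follows. The data frame and its column names are hypothetical stand-ins; any data frame read with a header row should behave the same way.

```r
# Hypothetical example data; column names are made up for illustration
data_set <- data.frame(height = c(1.2, 3.4), weight = c(5.6, 7.8))

for (i in seq_along(data_set)) {
  col_name   <- names(data_set)[i]   # name of the i-th column
  col_values <- data_set[, i]        # data in the i-th column
  cat(col_name, ": mean =", mean(col_values), "\n")
}
```

seq_along(data_set) runs over the column positions, so the same index i retrieves both the name and the data.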
Re: [R] Reading from Google Docs
On 08/07/2009 12:04 PM, Gabor Grothendieck wrote: Its safer just to temporarily add it to your path. Unfortunately Rtools has a find command that conflicts with the find command in Windows so if you add the Rtools bin directory to your path permanently then you could find other programs stop working. That actually happened to me once and it took the longest time until I discovered that Rtools was the culprit. That's true, but there is a workaround: you can manually rename the find.exe in Rtools, and adjust the entry in one of the R makefiles (MkRules), and it will use the new name instead of find. The reason you might not want to do this is you might expect find to act the way it does on Unix: the Rtools basically try to make Windows look a little bit like Unix. Duncan Murdoch If you follow the advice I gave you normally won't have that problem. On Wed, Jul 8, 2009 at 11:21 AM, Duncan Murdochmurd...@stats.uwo.ca wrote: On 08/07/2009 10:13 AM, Farrel Buchinsky wrote: Forgive my naivte, but how do I make windows find tar. In other words from where do I issue the command and what is the command. You need to install the toolset, and let the installer set your path. Duncan Murdoch Farrel Buchinsky Google Voice Tel: (412) 567-7870 On Wed, Jul 8, 2009 at 10:09, Duncan Murdoch murd...@stats.uwo.ca wrote: On 08/07/2009 10:02 AM, Farrel Buchinsky wrote: I have previously read R Installation and Administration. I read it again. It does not help me The relevant paragraph is below. But I need lower level instructions. Where can I find them. Follow the link. If Windows can't find tar, your toolset is installed incorrectly. Duncan Murdoch R CMD INSTALL works in Windows to install source packages if you have the source-code package files (option “Source Package Installation Files” in the installer) and toolset (see The Windows toolsetfile:///C:/Program%20Files/R/R-2.9.1/doc/manual/R-admin.html#The-Windows-toolset) installed. 
Installation of binary packages must be done by install.packages . R CMD INSTALL --help will tell you the current options under Windows (which differ from those on a Unix-alike): in particular there is a choice of the types of documentation to be installed. Farrel Buchinsky Google Voice Tel: (412) 567-7870 2009/6/19 Uwe Ligges lig...@statistik.tu-dortmund.de See the manual R Installation and Administration for information on how to install source packages on Windows. Uwe Ligges Farrel Buchinsky wrote: After issuing tar xvfz RgoogleDocs_0.2.2-src.tar.gzI am getting an error message 'tar' is not recongnized as an internal or external command, operable program or batch file. Should I use my 7-zip to open up the archive? Where should I be doing this? For instance can I do it all in my download directory or should I do it in C:\Program Files\R\R-2.9.0\library or should I manually create C:\Program Files\R\R-2.9.0\library\RGoogleDocs and do it all there or will the Rcmd INSTALL RGoogleDocs_0.2-2.tar.gz command do that for me. Yes, you assumed correctly. I am using Windows XP. Farrel Buchinsky Google Voice Tel: (412) 567-7870 On Thu, Jun 18, 2009 at 20:17, Gabor Grothendieck ggrothendi...@gmail.comwrote: I have haven't neen following this thread but: 1. if RGoogleDocs_0.2-2.tar.gz is a source distribution (as opposed to built source) then the first line renames it so that its not the same name as the built file about to be created. The second line detars it into the RGoogleDocs directory. The third builds the built source file, RGoogleDocs_0.2-2.tar.gz. The fourth installs the built source file into R. I've assumed Windows. If you are on Linux replace rename with mv. rename RGoogleDocs_0.2-2.tar.gz RgoogleDocs_0.2.2-src.tar.gz tar xvfz RgoogleDocs_0.2.2-src.tar.gz Rcmd build RGoogleDocs Rcmd INSTALL RGoogleDocs_0.2-2.tar.gz or 2. if RGoogleDocs_0.2-2.tar.gz is already a built source file then you can just issue the last of the above lines and don't need the others. 
On Thu, Jun 18, 2009 at 7:52 PM, Farrel Buchinskyfjb...@gmail.com wrote: What do you mean by cd the.directory.containing.RGoogleDocs Do you mean the directory where I downloaded the RGoogleDocs_0.2-2.tar.gz to? Or do you mean that I must create a directory called RGoogleDocs under Library and then change to that directory? Farrel Buchinsky Google Voice Tel: (412) 567-7870 On Mon, Mar 2, 2009 at 22:16, Gabor Grothendieck ggrothendi...@gmail.com wrote: Finally enter into the Windows console: cd the.directory.containing.RGoogleDocs Rcmd build RGoogleDocs Rcmd INSTALL RGoogleDocs_1.0.0.tar.gz except replace RGoogleDocs_1.0.0.tar.gz with the filename created by the build. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained,
[R] Simple monovariate classification?
I'm looking for an R function that simply recodes a quantitative variable into a number of classes according to specified break-points. Obviously I can do this using nested ifelse() commands, but I want to write it into a function where I can't pre-specify the number of classes. Is there an obvious way to do this? An example to clarify: how to convert c(0,10,5,1,9,6) to c(1,3,2,1,3,2) by specifying breaks=c(2.5,7.5) - or something like that. Thanks, Richard Gunton. INRA-Dijon, France __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
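A short sketch of two base R functions that appear to do what is asked here, using the numbers from the example in the question. findInterval() counts how many break-points lie at or below each value, and cut() with labels = FALSE returns integer class codes; neither needs the number of classes to be known in advance.

```r
x      <- c(0, 10, 5, 1, 9, 6)
breaks <- c(2.5, 7.5)

# findInterval() gives 0 for values below the first break, so add 1
# to obtain classes numbered from 1
classes <- findInterval(x, breaks) + 1
classes   # 1 3 2 1 3 2

# cut() gives the same codes here; pad the breaks so every value falls in a class
classes_cut <- cut(x, breaks = c(-Inf, breaks, Inf), labels = FALSE)
```

The two differ only for values lying exactly on a break-point: findInterval() puts such a value in the upper class, while cut() with its default right = TRUE puts it in the lower one.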
Re: [R] Reading from Google Docs
Does changing the path in Windows work in real time or does one need to restart the computer for the changes to take effect. Farrel Buchinsky Google Voice Tel: (412) 567-7870 On Wed, Jul 8, 2009 at 12:04, Gabor Grothendieck ggrothendi...@gmail.comwrote: Its safer just to temporarily add it to your path. Unfortunately Rtools has a find command that conflicts with the find command in Windows so if you add the Rtools bin directory to your path permanently then you could find other programs stop working. That actually happened to me once and it took the longest time until I discovered that Rtools was the culprit. If you follow the advice I gave you normally won't have that problem. On Wed, Jul 8, 2009 at 11:21 AM, Duncan Murdochmurd...@stats.uwo.ca wrote: On 08/07/2009 10:13 AM, Farrel Buchinsky wrote: Forgive my naivte, but how do I make windows find tar. In other words from where do I issue the command and what is the command. You need to install the toolset, and let the installer set your path. Duncan Murdoch Farrel Buchinsky Google Voice Tel: (412) 567-7870 On Wed, Jul 8, 2009 at 10:09, Duncan Murdoch murd...@stats.uwo.ca wrote: On 08/07/2009 10:02 AM, Farrel Buchinsky wrote: I have previously read R Installation and Administration. I read it again. It does not help me The relevant paragraph is below. But I need lower level instructions. Where can I find them. Follow the link. If Windows can't find tar, your toolset is installed incorrectly. Duncan Murdoch R CMD INSTALL works in Windows to install source packages if you have the source-code package files (option Source Package Installation Files in the installer) and toolset (see The Windows toolsetfile:///C:/Program%20Files/R/R-2.9.1/doc/manual/R-admin.html#The-Windows-toolset) installed. Installation of binary packages must be done by install.packages . 
R CMD INSTALL --help will tell you the current options under Windows (which differ from those on a Unix-alike): in particular there is a choice of the types of documentation to be installed. Farrel Buchinsky Google Voice Tel: (412) 567-7870 2009/6/19 Uwe Ligges lig...@statistik.tu-dortmund.de See the manual R Installation and Administration for information on how to install source packages on Windows. Uwe Ligges Farrel Buchinsky wrote: After issuing tar xvfz RgoogleDocs_0.2.2-src.tar.gzI am getting an error message 'tar' is not recongnized as an internal or external command, operable program or batch file. Should I use my 7-zip to open up the archive? Where should I be doing this? For instance can I do it all in my download directory or should I do it in C:\Program Files\R\R-2.9.0\library or should I manually create C:\Program Files\R\R-2.9.0\library\RGoogleDocs and do it all there or will the Rcmd INSTALL RGoogleDocs_0.2-2.tar.gz command do that for me. Yes, you assumed correctly. I am using Windows XP. Farrel Buchinsky Google Voice Tel: (412) 567-7870 On Thu, Jun 18, 2009 at 20:17, Gabor Grothendieck ggrothendi...@gmail.comwrote: I have haven't neen following this thread but: 1. if RGoogleDocs_0.2-2.tar.gz is a source distribution (as opposed to built source) then the first line renames it so that its not the same name as the built file about to be created. The second line detars it into the RGoogleDocs directory. The third builds the built source file, RGoogleDocs_0.2-2.tar.gz. The fourth installs the built source file into R. I've assumed Windows. If you are on Linux replace rename with mv. rename RGoogleDocs_0.2-2.tar.gz RgoogleDocs_0.2.2-src.tar.gz tar xvfz RgoogleDocs_0.2.2-src.tar.gz Rcmd build RGoogleDocs Rcmd INSTALL RGoogleDocs_0.2-2.tar.gz or 2. if RGoogleDocs_0.2-2.tar.gz is already a built source file then you can just issue the last of the above lines and don't need the others. 
On Thu, Jun 18, 2009 at 7:52 PM, Farrel Buchinskyfjb...@gmail.com wrote: What do you mean by cd the.directory.containing.RGoogleDocs Do you mean the directory where I downloaded the RGoogleDocs_0.2-2.tar.gz to? Or do you mean that I must create a directory called RGoogleDocs under Library and then change to that directory? Farrel Buchinsky Google Voice Tel: (412) 567-7870 On Mon, Mar 2, 2009 at 22:16, Gabor Grothendieck ggrothendi...@gmail.com wrote: Finally enter into the Windows console: cd the.directory.containing.RGoogleDocs Rcmd build RGoogleDocs Rcmd INSTALL RGoogleDocs_1.0.0.tar.gz except replace RGoogleDocs_1.0.0.tar.gz with the filename created by the build. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
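For the "temporarily add it to your path" advice, a hedged sketch of how this could be done from inside an R session rather than through the Windows settings. The Rtools location below is a hypothetical path and must be adjusted. A change made with Sys.setenv() applies only to the running R process and to programs it starts, so nothing needs restarting and nothing is changed permanently.

```r
# Hypothetical Rtools location; adjust to wherever the toolset is installed
rtools_bin <- "C:\\Rtools\\bin"

old_path <- Sys.getenv("PATH")

# Prepend the directory for this R session (and processes it launches) only
Sys.setenv(PATH = paste(rtools_bin, old_path, sep = .Platform$path.sep))

Sys.which("tar")              # full path to tar if it is found, "" otherwise

Sys.setenv(PATH = old_path)   # restore the original PATH when finished
```

Because the original PATH is restored at the end, the conflict with the Windows find command mentioned above should not affect other programs.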
Re: [R] Two-way ANOVA gives different results using anova(lm()) than doing it by hand
Well, since we don't have Data.txt it is kind of hard for us to replicate what you have done. Here goes a guess as to what the problem may be. Have you told R anywhere that S1 and S2 are factors with 6 levels rather than numeric vectors? Or are you just hoping that the computer can read your mind to find out this information? (Reading minds is one of the things that R and computers in general are not very good at yet. I have made a note to my future self to use the TimeTravel package to send a copy of the ESP package back to my past self, but I have not received it yet.) -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lars Bergemann Sent: Wednesday, July 08, 2009 8:35 AM To: r-help@r-project.org Subject: [R] Two-way ANOVA gives different results using anova(lm()) than doing it by hand

Hey! Could you please take a quick look at what I have done? Somehow I get wrong results using the anova(lm()) combination compared to doing a two-way ANOVA by hand. Running:

    Data <- read.table("Data.txt");
    g <- lm(ExM~S1*S2,Data);
    anova(g);

Gives:

    Analysis of Variance Table
    Response: ExM
               Df Sum Sq Mean Sq F value    Pr(>F)
    S1          1 4.3679  4.3679 167.045 < 2.2e-16 ***
    S2          1 0.9427  0.9427  36.053 8.236e-09 ***
    S1:S2       1 0.3231  0.3231  12.357 0.0005371 ***
    Residuals 212 5.5434  0.0261

I compared it to the work done by hand, i.e. I calculated all the different sums of squares using sum() and tapply(). So I know that anova(lm()) gets the degrees of freedom equal to 1, 1, 1 and 212 when they should be 5, 5, 25 and 180. Also, the sums of squares are quite different ... I get 4.xx, 4.xx, 1.xx, 0.xx ... as you see, what anova(lm()) gets is different. The data: S1 has 6 levels, and so has S2. On average, each cell has 6 values; most cells actually have 6 values, and there are two of each of: 5, 7, 4, 8 - so the average is 6.
Could you please help me understand why it does not work with anova(lm())? I tried quite a few things found with Google, but they all gave me the same result as anova(lm()) ... Thanks a lot! Lars __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
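The guess about factors can be illustrated with simulated data, since Data.txt is not available. The sketch below assumes a balanced 6 x 6 design with 6 values per cell and made-up response values; only the column names ExM, S1 and S2 come from the original post. Left as numeric codes, each term uses a single degree of freedom; converted to factors, the degrees of freedom become 5, 5, 25 and 180.

```r
set.seed(123)
# Simulated balanced stand-in for Data.txt: 6 x 6 cells, 6 values per cell
Data <- expand.grid(S1 = 1:6, S2 = 1:6, rep = 1:6)
Data$ExM <- rnorm(nrow(Data))          # made-up response values

# Numeric codes: each term is a single slope, so Df = 1, 1, 1, 212
tab_numeric <- anova(lm(ExM ~ S1 * S2, data = Data))
tab_numeric

# Factors: one contrast per extra level, so Df = 5, 5, 25, 180
Data$S1 <- factor(Data$S1)
Data$S2 <- factor(Data$S2)
tab_factor <- anova(lm(ExM ~ S1 * S2, data = Data))
tab_factor
```

Checking str(Data) after read.table() shows whether S1 and S2 were read as integers or as factors.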
Re: [R] #INCLUDE
?source perhaps? --- On Wed, 7/8/09, Idgarad idga...@gmail.com wrote: From: Idgarad idga...@gmail.com Subject: [R] #INCLUDE To: r-help@r-project.org Received: Wednesday, July 8, 2009, 11:16 AM What is R's equivalent to a C-like #include to incorporate external files? I have a 2k line function that is generated and need to include it at runtime but not manage it as a package (as it changes hourly). Any ideas?
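A small sketch of how source() could stand in for #include here. The file name and the function big_fn are hypothetical; in practice the file would be whatever the hourly generator writes.

```r
# Hypothetical generated file; written here only so the example is self-contained
gen_file <- file.path(tempdir(), "generated_fn.R")
writeLines("big_fn <- function(x) x * 2", gen_file)

# Re-read the definitions at run time, much like a run-time #include
source(gen_file)
big_fn(21)

# Passing an environment as 'local' keeps the definitions out of the workspace
env <- new.env()
source(gen_file, local = env)
env$big_fn(21)
```

Calling source() again after the file has been regenerated simply overwrites the old definition.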
Re: [R] Two-way ANOVA gives different results using anova(lm()) than doing it by hand
On Jul 8, 2009, at 12:11 PM, Greg Snow wrote: Well, since we don't have Data.txt it is kind of hard for us to replicate what you have done. Here goes a guess as to what the problem may be. Have you told R anywhere that S1 and S2 are factors with 6 levels rather than numeric vectors? Or are you just hoping that the computer can read your mind to find out this information? (reading minds is one of the things that R and computers in general are not very good at yet. I have made a note to my future self to use the TimeTravel package to send a copy of the ESP package back to my past self, but I have not received it yet). A definite Fortunes candidate. Marc Schwartz
Re: [R] matching each row
Something like this?

  dataframeA <- data.frame(
      unique.id = c(1, 1, 3, 3, 3, 5, 7, 7, 9)
    , x1 = rnorm(9)
    , x2 = rnorm(9)
    , x3 = rnorm(9)
  )
  dataframeB <- data.frame(
      unique.id = c(2, 3, 4, 5, 5, 5, 6, 7, 9, 10, 10)
    , x4 = rnorm(11)
    , x5 = rnorm(11)
    , x6 = rnorm(11)
  )
  match.counts <- function(x, y) {
    out <- cbind(
        table(x[which(x %in% y)])
      , table(y[which(y %in% x)])
    )
    dimnames(out)[[2]] <- c("N in x", "N in y")
    out
  }
  match.counts(dataframeA$unique.id, dataframeB$unique.id)
    N in x N in y
  3      3      1
  5      1      3
  7      2      1
  9      1      1

-- David

- David Huffer, Ph.D. Senior Statistician CSOSA/Washington, DC david.huf...@csosa.gov -

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of tathta Sent: Wednesday, July 08, 2009 11:10 AM To: r-help@r-project.org Subject: [R] matching each row

I have two dataframes, the first column of each dataframe is a unique id number (the rest of the columns are data variables). I would like to figure out how many times each id number appears in each dataframe. So far I can use:

  length( match (dataframeA$unique.id[1], dataframeB$unique.id) )

but this only works on each row of dataframe A one-at-a-time. I would like to do this for all of the rows in dataframe A, and then put the results in a new variable: dataframeA$count

I'm new to R, so please be patient with me! Sorry if this question has already been answered, my search of the archives only brought up one relevant post, and I didn't understand the answer to it http://www.nabble.com/match-to20799206.html#a20799206 thx

-- View this message in context: http://www.nabble.com/matching-each-row-tp24393051p24393051.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Randomizing a dataframe
Here is one approach (there are others, some that are probably better, but this can get you started):

1. Rearrange your data so that every insect is a single row with 2 columns: the tree id and the species (this new dataset will have as many rows as the sum of the values in the old dataset). The reshape package may be able to help with this step (you may also need the rep function).
2. Randomly permute one of the 2 columns (see ?sample).
3. Restructure the permuted data back to the original (the table function may be enough here, the reshape package will give more options).

Hope this helps,

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mark Na Sent: Wednesday, July 08, 2009 9:54 AM To: r-help@r-project.org Subject: [R] Randomizing a dataframe

Hi R-helpers, I have a dataframe (called data) with trees in rows (n=100) and insect species (n=10) in columns. My tree IDs are in a column called TREE and each species has a column labeled SPEC1, SPEC2, SPEC3, etc. I wish to randomize the values in my dataframe such that row and column totals are held constant, i.e. in my randomized data each tree will have the same number of individual insects as in the real data (constant row totals) and each species will have the same number of individuals as in the real data (constant column totals). I will eventually want to do this many times, but I would appreciate help getting started with the randomization. Thank you, Mark Na
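The three steps above can be sketched in base R as follows. The data frame dat is a small made-up stand-in for Mark's data (4 trees, 3 species), and the variable names are illustrative only; rep() does the expansion that the reshape package could also do.

```r
set.seed(1)
# Hypothetical stand-in for the real data: counts of 3 species on 4 trees
dat <- data.frame(TREE  = paste0("T", 1:4),
                  SPEC1 = c(2, 0, 1, 3),
                  SPEC2 = c(1, 4, 0, 2),
                  SPEC3 = c(0, 1, 5, 1))
counts <- as.matrix(dat[, -1])
rownames(counts) <- dat$TREE

# Step 1: one row per individual insect (tree id, species)
idx  <- which(counts > 0, arr.ind = TRUE)   # non-empty cells, column-major order
n    <- counts[counts > 0]                  # their counts, in the same order
long <- data.frame(
  tree    = factor(rownames(counts)[rep(idx[, "row"], n)], levels = rownames(counts)),
  species = factor(colnames(counts)[rep(idx[, "col"], n)], levels = colnames(counts))
)

# Step 2: randomly permute one of the two columns
long$species <- sample(long$species)

# Step 3: tabulate back to a trees x species matrix
randomized <- unclass(table(long$tree, long$species))

rowSums(randomized)   # same tree totals as rowSums(counts)
colSums(randomized)   # same species totals as colSums(counts)
```

Wrapping steps 2 and 3 in replicate() would give many randomized tables; stats::r2dtable() is another way to draw random tables with fixed margins.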
Re: [R] Simple monovariate classification?
Try ?cut Greg rgun...@dijon.inra.fr wrote: I'm looking for an R function that simply recodes a quantitative variable into a number of classes according to specified break-points. Obviously I can do this using nested ifelse() commands, but I want to write it into a function where I can't pre-specify the number of classes. Is there an obvious way to do this? An example to clarify: how to convert c(0,10,5,1,9,6) to c(1,3,2,1,3,2) by specifying breaks=c(2.5,7.5) - or something like that. Thanks, Richard Gunton. INRA-Dijon, France -- Greg Hirson ghir...@ucdavis.edu Graduate Student Agricultural and Environmental Chemistry 1106 Robert Mondavi Institute North One Shields Avenue Davis, CA 95616
Re: [R] Simple monovariate classification?
Richard, More specifically,

  x = c(0,10,5,1,9,6)
  cut(x, breaks = c(-Inf, 2.5, 7.5, Inf), labels = c(1, 2, 3))
  #[1] 1 3 2 1 3 2

Hope that helps, Greg

rgun...@dijon.inra.fr wrote: I'm looking for an R function that simply recodes a quantitative variable into a number of classes according to specified break-points. Obviously I can do this using nested ifelse() commands, but I want to write it into a function where I can't pre-specify the number of classes. Is there an obvious way to do this? An example to clarify: how to convert c(0,10,5,1,9,6) to c(1,3,2,1,3,2) by specifying breaks=c(2.5,7.5) - or something like that. Thanks, Richard Gunton. INRA-Dijon, France

-- Greg Hirson ghir...@ucdavis.edu Graduate Student Agricultural and Environmental Chemistry 1106 Robert Mondavi Institute North One Shields Avenue Davis, CA 95616
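Since Richard wants this inside a function where the number of classes is not known in advance, one possible wrapper is sketched below. The name recode_classes is made up for illustration; it pads the supplied break-points with -Inf and Inf and asks cut() for integer codes.

```r
# Hypothetical helper: recode x into classes 1..(length(breaks) + 1)
recode_classes <- function(x, breaks) {
  # labels = FALSE makes cut() return integer codes instead of a factor
  cut(x, breaks = c(-Inf, sort(breaks), Inf), labels = FALSE)
}

recode_classes(c(0, 10, 5, 1, 9, 6), breaks = c(2.5, 7.5))
# [1] 1 3 2 1 3 2
```

With the default right = TRUE, a value exactly equal to a break-point falls in the lower class.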
Re: [R] matching each row
Close... The output I'm looking for is more like this:

  output <- data.frame(unique.id=c(1,3,5,7,9), N.in.x=c(2,3,1,2,1), N.in.y=c(0,1,3,1,1))

The first column can be gotten using a small change to the first table line:

  table ( x [ which ( x %in% x ) ] )   ## the 3rd x used to be a y

but I can't modify it to make the second ideal output column, I just end up with warnings...

Something like this?

  dataframeA <- data.frame(
      unique.id = c(1, 1, 3, 3, 3, 5, 7, 7, 9)
    , x1 = rnorm(9)
    , x2 = rnorm(9)
    , x3 = rnorm(9)
  )
  dataframeB <- data.frame(
      unique.id = c(2, 3, 4, 5, 5, 5, 6, 7, 9, 10, 10)
    , x4 = rnorm(11)
    , x5 = rnorm(11)
    , x6 = rnorm(11)
  )
  match.counts <- function(x, y) {
    out <- cbind(
        table(x[which(x %in% y)])
      , table(y[which(y %in% x)])
    )
    dimnames(out)[[2]] <- c("N in x", "N in y")
    out
  }
  match.counts(dataframeA$unique.id, dataframeB$unique.id)
    N in x N in y
  3      3      1
  5      1      3
  7      2      1
  9      1      1

-- David

-- View this message in context: http://www.nabble.com/matching-each-row-tp24393051p24396184.html
Re: [R] OK - I got the data - now what? :-)
Mark wrote: Currently my data is one experiment per row, but that's wasting space as most experiments only take 20% of the row and 80% of the row is filled with 0's. I might want to make the array more narrow and have a flag somewhere in the 1st 10 columns that says that this row is a continuation row from the previous row. That way I could pack the array better, use less memory and when I do finally test for 0 I have a short line to traverse?

This may be a bit off track from the data manipulation you are working on, but I thought I'd point out that another way to handle this sort of data is to make a table with one measurement per row, rather than one experiment per row.

  experiment measurement value
  A 1 0.27
  A 2 0.66
  A 3 0.24
  A 4 0.55
  B 1 0.13
  B 2 0.65
  B 3 0.83
  B 4 0.41
  B 5 0.92
  B 6 0.67
  C 1 0.75
  C 2 0.97
  C 3 0.49
  C 4 0.58
  D 1 1.00
  D 2 0.71
  E 1 0.11
  E 2 0.50
  E 3 0.98
  E 4 0.07
  E 5 0.94
  E 6 0.57
  E 7 0.34
  E 8 0.21

If you wrote the output of your calculations in this way, one value per line, it can easily be read into R as a data.frame and handled with less need for munging. No need to remove the zero-padding because the zeros aren't needed in the first place. You can subset the data with subset, as in

  test <- read.table('test.dat', header=TRUE)
  expA <- subset(test, experiment=='A')
  expB <- subset(test, experiment=='B')

so there is no need to deal with ragged/zero-padded arrays. Your plots can be grouped automatically with lattice:

  require(lattice)
  xyplot(value ~ measurement, data=test, group=experiment, type='b')
  xyplot(value ~ measurement | experiment, data=test, type='b')

It is simple to do calculations by experiment using tapply. For example

  with(test, tapply(value, experiment, mean))
          A         B         C         D         E
      0.430 0.6016667 0.6975000     0.855     0.465
  with(test, tapply(measurement, experiment, max))
  A B C D E
  4 6 4 2 8

Mike
[R] typo in ts detrending implementation in spec.pgram?
Hello! I wonder if there is a typo in the detrending code of spec.pgram in spectrum.R from the stats package. The code can be seen at https://svn.r-project.org/R/trunk/src/library/stats/R/spectrum.R . I am afraid there is a typo and the code should look like

  if (detrend) {
      t <- 1L:N - (N + 1)/2
      sumt2 <- N * (N^2 - 1)/12
      for (i in 1L:ncol(x))
          x[, i] <- x[, i] - mean(x[, i]) - sum((x[, i] - mean(x[, i])) * t) * t/sumt2
  }

Note x[, i] - mean(x[, i]) instead of only x[, i], as in the repository. Here is a quick reference: http://en.wikipedia.org/wiki/Simple_linear_regression#Estimating_the_regression_line . Note $\hat b$ there. The summation is not over x, but over x - mean(x). Perhaps an even better solution would be resid(lm(x[,i] ~ seq(along = x[,i]))) . See http://tolstoy.newcastle.edu.au/R/help/05/01/10115.html Mikhail
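A quick numerical check may help settle this. Because t is centered (its sum is 0), sum((x - mean(x)) * t) equals sum(x * t), so the two forms of the slope should agree. The sketch below compares the form the post attributes to the repository, the proposed form, and resid(lm()) on simulated data; x and N are made-up stand-ins, not values from spectrum.R.

```r
set.seed(123)
N <- 50
x <- 3 + 0.2 * (1:N) + rnorm(N)      # simulated series with a linear trend

t     <- 1:N - (N + 1)/2             # centered time index, so sum(t) is 0
sumt2 <- N * (N^2 - 1)/12            # closed form for sum(t^2)

# Form attributed to the repository: x itself inside the sum
detr_repo     <- x - mean(x) - sum(x * t) * t/sumt2
# Form proposed in the post: x - mean(x) inside the sum
detr_proposed <- x - mean(x) - sum((x - mean(x)) * t) * t/sumt2
# Residuals of an ordinary least-squares line
detr_lm       <- unname(resid(lm(x ~ seq_along(x))))

all.equal(detr_repo, detr_proposed)
all.equal(detr_repo, detr_lm)
```

If both comparisons come back TRUE, the shorter form is an algebraic simplification rather than a typo.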
Re: [R] Reading from Google Docs
Hooray! I got it to work. Here is what I think happened. My hold-up was that the tar command was not working. If you recall, when I issued the command

  tar xvfz RgoogleDocs_0.2.2-src.tar.gz

cmd.exe told me it could not be found. I reran Rtools29.exe, which is the Rtools setup program, and it offered to change my path. However, it still did not work. I went to lunch and took the opportunity to reboot my computer. When I retried after lunch the tar command worked and everything thereafter worked. I think that the file C:\Program Files\R\Rtools\bin\tar.exe could not be found earlier. I just looked back at my path and I see that C:\Program Files\R\Rtools\bin is on the path.

RGoogleDocs 0.2-2 is amazing. I can now read data straight into a dataframe. The fact that I am always reading from real-time data is astounding.

  sheets.con = getGoogleDocsConnection(getGoogleAuth("fjb...@gmail.com", "password here", service = "wise"))
  ts2 = getWorksheets("Consents Received", sheets.con)  # put the name of the spreadsheet in the inverted commas
  names(ts2)
  sheetAsMatrix(ts2$Sheet1, header=TRUE, as.data.frame=TRUE, trim=TRUE)

MAGIC

Boy oh boy, that process of getting source to binary was super painful. Now that I have the package as binary I can share the whole folder with my coworker and she is able to use RGoogleDocs. I intend to use the same process for the other two Windows machines that I use. I really do not want to go through the same installation and path hassles all over again. Should I post my directory containing the binary files somewhere so that others do not have to experience the pain? Does etiquette dictate that I should post the directory to help others, or does etiquette dictate that it is Duncan Temple Lang's code and thus his prerogative to distribute his work as he wishes?
Farrel Buchinsky Google Voice Tel: (412) 567-7870

On Wed, Jul 8, 2009 at 12:59, Farrel Buchinsky fjb...@gmail.com wrote: Does changing the path in Windows work in real time, or does one need to restart the computer for the changes to take effect? Farrel Buchinsky Google Voice Tel: (412) 567-7870

On Wed, Jul 8, 2009 at 12:04, Gabor Grothendieck ggrothendi...@gmail.com wrote: It's safer just to temporarily add it to your path. Unfortunately Rtools has a find command that conflicts with the find command in Windows, so if you add the Rtools bin directory to your path permanently then you could find other programs stop working. That actually happened to me once and it took the longest time until I discovered that Rtools was the culprit. If you follow the advice I gave, you normally won't have that problem.

On Wed, Jul 8, 2009 at 11:21 AM, Duncan Murdoch murd...@stats.uwo.ca wrote: On 08/07/2009 10:13 AM, Farrel Buchinsky wrote: Forgive my naivete, but how do I make Windows find tar? In other words, from where do I issue the command and what is the command?

You need to install the toolset, and let the installer set your path. Duncan Murdoch

Farrel Buchinsky Google Voice Tel: (412) 567-7870

On Wed, Jul 8, 2009 at 10:09, Duncan Murdoch murd...@stats.uwo.ca wrote: On 08/07/2009 10:02 AM, Farrel Buchinsky wrote: I have previously read R Installation and Administration. I read it again. It does not help me. The relevant paragraph is below. But I need lower level instructions. Where can I find them?

Follow the link. If Windows can't find tar, your toolset is installed incorrectly. Duncan Murdoch

R CMD INSTALL works in Windows to install source packages if you have the source-code package files (option Source Package Installation Files in the installer) and toolset (see The Windows toolset, file:///C:/Program%20Files/R/R-2.9.1/doc/manual/R-admin.html#The-Windows-toolset) installed. Installation of binary packages must be done by install.packages.
R CMD INSTALL --help will tell you the current options under Windows (which differ from those on a Unix-alike): in particular there is a choice of the types of documentation to be installed.

Farrel Buchinsky Google Voice Tel: (412) 567-7870

2009/6/19 Uwe Ligges lig...@statistik.tu-dortmund.de See the manual R Installation and Administration for information on how to install source packages on Windows. Uwe Ligges

Farrel Buchinsky wrote: After issuing tar xvfz RgoogleDocs_0.2.2-src.tar.gz I am getting an error message: 'tar' is not recognized as an internal or external command, operable program or batch file. Should I use my 7-zip to open up the archive? Where should I be doing this? For instance, can I do it all in my download directory, or should I do it in C:\Program Files\R\R-2.9.0\library, or should I manually create C:\Program Files\R\R-2.9.0\library\RGoogleDocs and do it all there, or will the Rcmd INSTALL RGoogleDocs_0.2-2.tar.gz command do that for me? Yes, you assumed correctly. I am using Windows XP. Farrel Buchinsky Google Voice
Re: [R] matching each row
On Jul 8, 2009, at 10:09 AM, tathta wrote: I have two dataframes, the first column of each dataframe is a unique id number (the rest of the columns are data variables). I would like to figure out how many times each id number appears in each dataframe. So far I can use: length( match (dataframeA$unique.id[1], dataframeB$unique.id) ) but this only works on each row of dataframe A one-at-a-time. I would like to do this for all of the rows in dataframe A, and then put the results in a new variable: dataframeA$count I'm new to R, so please be patient with me! Sorry if this question has already been answered, my search of the archives only brought up one relevant post, and I didn't understand the answer to it http://www.nabble.com/match-to20799206.html#a20799206

If I am correctly understanding what you are looking for, you could do something like the following:

# Create some simple data. Note that only a subset of the ID's (3:5) will match across the two DF's:

  set.seed(1)
  DF.A <- data.frame(ID = sample(1:5, 10, replace = TRUE))
  DF.B <- data.frame(ID = sample(3:7, 10, replace = TRUE))

  DF.A
     ID
  1   2
  2   2
  3   3
  4   5
  5   2
  6   5
  7   5
  8   4
  9   4
  10  1

  DF.B
     ID
  1   4
  2   3
  3   6
  4   4
  5   6
  6   5
  7   6
  8   7
  9   4
  10  6

Now, create counts of the IDs in each, coercing the results to data frames and setting the count column name for each:

  TAB.A <- as.data.frame(table(DF.A$ID), responseName = "Count.A")
  TAB.B <- as.data.frame(table(DF.B$ID), responseName = "Count.B")

  TAB.A
    Var1 Count.A
  1    1       1
  2    2       3
  3    3       1
  4    4       2
  5    5       3

  TAB.B
    Var1 Count.B
  1    3       1
  2    4       3
  3    5       1
  4    6       4
  5    7       1

Now, use merge() to join the two tables above. 'all = TRUE' will include non-matching keys:

  merge(TAB.A, TAB.B, by = "Var1", all = TRUE)
    Var1 Count.A Count.B
  1    1       1      NA
  2    2       3      NA
  3    3       1       1
  4    4       2       3
  5    5       3       1
  6    6      NA       4
  7    7      NA       1

Note that you will get NAs for any non-matching ID's (Var1). See ?table, ?as.data.frame and ?merge for more information.

HTH, Marc Schwartz
Re: [R] OK - I got the data - now what? :-)
On Wed, Jul 8, 2009 at 10:51 AM, Michael A. Miller mmill...@iupui.edu wrote:

Mark wrote: Currently my data is one experiment per row, but that's wasting space as most experiments only take 20% of the row and 80% of the row is filled with 0's. I might want to make the array more narrow and have a flag somewhere in the 1st 10 columns that says that this row is a continuation row from the previous row. That way I could pack the array better, use less memory and when I do finally test for 0 I have a short line to traverse?

This may be a bit off track from the data manipulation you are working on, but I thought I'd point out that another way to handle this sort of data is to make a table with one measurement per row, rather than one experiment per row.

  experiment measurement value
  A 1 0.27
  A 2 0.66
  A 3 0.24
  A 4 0.55
  B 1 0.13
  B 2 0.65
  B 3 0.83
  B 4 0.41
  B 5 0.92
  B 6 0.67
  C 1 0.75
  C 2 0.97
  C 3 0.49
  C 4 0.58
  D 1 1.00
  D 2 0.71
  E 1 0.11
  E 2 0.50
  E 3 0.98
  E 4 0.07
  E 5 0.94
  E 6 0.57
  E 7 0.34
  E 8 0.21

If you wrote the output of your calculations in this way, one value per line, it can easily be read into R as a data.frame and handled with less need for munging. No need to remove the zero-padding because the zeros aren't needed in the first place. You can subset the data with subset, as in

  test <- read.table('test.dat', header=TRUE)
  expA <- subset(test, experiment=='A')
  expB <- subset(test, experiment=='B')

so there is no need to deal with ragged/zero-padded arrays. Your plots can be grouped automatically with lattice:

  require(lattice)
  xyplot(value ~ measurement, data=test, group=experiment, type='b')
  xyplot(value ~ measurement | experiment, data=test, type='b')

It is simple to do calculations by experiment using tapply. For example

  with(test, tapply(value, experiment, mean))
          A         B         C         D         E
      0.430 0.6016667 0.6975000     0.855     0.465
  with(test, tapply(measurement, experiment, max))
  A B C D E
  4 6 4 2 8

Mike

Mike, It's not really that far off track, as I didn't have any background when I started this in R. This is the first time I've used it. I simply chose to use a format that I thought would work for me in both Excel and R. I do like your examples. My impression of reshape coupled with cast is that it's pretty capable of giving me more or less the same format you suggest, although it is a bit of work.

Currently in my files I save only the start and finish times of the experiments and planned on calculating all the times in the middle if necessary. With this format I'd just write them out on each line and save that work in R. I suppose the files using this alternative format would be a lot larger on disk. I currently have 10 values + 500 observations per experiment, with an average experiment tracking file containing maybe 500-1000 experiments. With this format, in the worst case I suppose I'd have (10+1) * 1000 per experiment on disk, but on average it would be less than that because, as you say, I wouldn't write out any zeros. Once in R, in memory, they'd be equivalent. Disk space doesn't matter, but reading and writing the files might be slower.

I suppose I don't really have to write the zeros out anyway, but at this point it's just one additional subset after going through reshape. It might be an advantage to get to the subset commands immediately, but I've still got 10 independent variables and I suspect I'm going to be using reshape/cast more than once to get to my answers, so I haven't been against learning how to work with it. Overall they are good inputs and I appreciate them. Thanks!

Cheers, Mark
[R] \dQuote in packages
I am in the process of submitting a package to CRAN. R CMD check ran successfully on the package on my local computer, using R version 2.1.1. However, on the computers for CRAN (with version 2.10.0), the following errors occurred:

  Warning in parse_Rd("./man/predict.Rd", encoding = "unknown") :
    ./man/predict.Rd:28: unknown macro '\dquote'
  *** error on file ./man/predict.Rd
  Error : ./man/predict.Rd:28: Unrecognized macro \dquote
  Warning in parse_Rd("./man/print.Rd", encoding = "unknown") :
    ./man/print.Rd:17: unexpected UNKNOWN '\sideeffects'
  Warning in parse_Rd("./man/simpleREEMdata.Rd", encoding = "unknown") :
    ./man/simpleREEMdata.Rd:10: unknown macro '\item'

Are \dquote, \sideeffects, and \item not supported in newer versions of R? Is there some underlying problem that I should fix that makes these show up? Thank you very much. Rebecca
Re: [R] truncated regression out-of-sample predictions
Dear all, I am trying to implement Simar and Wilson's (2007) second algorithm and have the following question: if I use a truncated regression on the m < n observations, how do I get fitted values for all n observations, instead of only for the m observations, which is what the command fitted returns? I would need these to construct the left-truncation needed to draw n random deviates. Thanks for your help, Fleur Fleur Wouterse, Ph.D. Post-Doctoral Fellow IFPRI-Dakar Immeuble Ousseynou Thiam Gueye Rue de Thies Point E, BP 15702 CP 12524 Dakar Fann Senegal Phone: +221 33 869 3986 Email: f.woute...@cgiar.org
[R] heatmap.2: question regarding the raw z-score
Hi, I am analysing gene expression data using the heatmap.2 function in R and I was wondering what the formula is for the raw z-score bar which shows the colors for each pixel. According to this post: https://mailman.stat.ethz.ch/pipermail/r-help/2006-September/113598.html, it is (actual value - mean of the group) / standard deviation. But the mean of which group? The mean of the gene vector? And the actual value of that gene in a sample? I would be grateful if you could give me some more details about it, or point me to a book or manual I could consult. Thanks a lot, Chrysanthi.
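As a rough illustration of the formula quoted above, here is a base-R sketch of a row-wise z-score. It assumes genes are in rows, samples are in columns, and the heatmap was drawn with scale = "row"; under those assumptions "the group" is a single gene across all samples. The matrix expr is simulated, and the sketch does not call heatmap.2 itself.

```r
set.seed(7)
# Hypothetical expression matrix: 5 genes (rows) x 4 samples (columns)
expr <- matrix(rnorm(20, mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

# Row z-score: (value - mean of that gene across samples) / sd of that gene
z <- sweep(expr, 1, rowMeans(expr))
z <- sweep(z, 1, apply(expr, 1, sd), "/")

# Same numbers via scale(), which works on columns, hence the transposes
z2 <- t(scale(t(expr)))

round(rowMeans(z), 10)    # each gene now has mean 0
apply(z, 1, sd)           # and standard deviation 1
```

With scale = "column" the same calculation would run per sample instead of per gene.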
[R] print() to file?
I'd like to write some objects (eg arrays) to a log file. cat() flattens them out. I'd like them formatted as in 'print' but print only writes to stdout. Is there a simple way to achieve this result? Thanks -- View this message in context: http://www.nabble.com/print%28%29-to-file--tp24397445p24397445.html
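Two base-R options that may fit, sketched with a made-up log path and array: capture.output() collects what print() would show and can append it to a file, and sink() diverts console output to a file until it is called again with no arguments.

```r
log_file <- file.path(tempdir(), "run.log")   # hypothetical log location
arr <- array(1:12, dim = c(2, 3, 2))

# Option 1: capture what print() would show and append it to the log
capture.output(print(arr), file = log_file, append = TRUE)

# Option 2: divert console output to the file for a stretch of code
sink(log_file, append = TRUE)
print(summary(as.vector(arr)))
sink()                                        # restore output to the console

log_lines <- readLines(log_file)
log_lines
```

capture.output() is the safer of the two inside functions, since a forgotten sink() leaves output redirected.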
Re: [R] matching each row
From an email suggestion, here are two sample datasets, and my ideal output:

  dataA <- data.frame(unique.id=c("A","B","C","B"), x=11:14, y=5:2)
  dataB <- data.frame(unique.id=c("A","B","A","B","A","C","D","A"), x=27:20, y=22:29)

  ## mystery operation(s) happen here

  ## ideal output would be:
  dataA <- data.frame(unique.id=c("A","B","C","B"), x=11:14, y=5:2, countA=c(1,2,1,2), countB=c(4,2,1,2))

So my mystery operation(s) would count the number of times the unique id shows up in a given dataset. My ideal outputs are as follows: countA is the mystery operation applied to dataA (counting occurrences within the same dataset); countB is applied to dataB (counting occurrences within a second dataset).

My best try so far is to do:

  tempA <- aggregate(dataA$unique.id, list(dataA$unique.id), length)

which gives me a matrix with ONE instance of each unique.id and the counts (and which I thought was kinda cute), but it only works within a single dataset!

tathta wrote: I have two dataframes, the first column of each dataframe is a unique id number (the rest of the columns are data variables). I would like to figure out how many times each id number appears in each dataframe. So far I can use: length( match (dataframeA$unique.id[1], dataframeB$unique.id) ) but this only works on each row of dataframe A one-at-a-time. I would like to do this for all of the rows in dataframe A, and then put the results in a new variable: dataframeA$count I'm new to R, so please be patient with me! thx

-- View this message in context: http://www.nabble.com/matching-each-row-tp24393051p24395711.html
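One possible "mystery operation", sketched with table() and match(). The helper name count_in is made up for illustration; it tabulates the ids in one dataset and looks up the count for each row of another, giving 0 for ids that never occur.

```r
dataA <- data.frame(unique.id = c("A", "B", "C", "B"), x = 11:14, y = 5:2)
dataB <- data.frame(unique.id = c("A", "B", "A", "B", "A", "C", "D", "A"),
                    x = 27:20, y = 22:29)

# Hypothetical helper: for each element of ids, how often does it occur in pool?
count_in <- function(ids, pool) {
  tab <- table(pool)                                   # one count per distinct id
  n   <- as.vector(tab)[match(as.character(ids), names(tab))]
  n[is.na(n)] <- 0L                                    # ids absent from pool
  n
}

dataA$countA <- count_in(dataA$unique.id, dataA$unique.id)
dataA$countB <- count_in(dataA$unique.id, dataB$unique.id)
dataA
```

On these sample data countA comes out as 1, 2, 1, 2 and countB as 4, 2, 1, 2, matching the ideal output in the post.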
[R] bootstrapping error message Error in t.star[r, ] - statistic(data, i[r, ], ...) : number of items to replace is not a multiple of replacement length
Hi, I am trying to run some bootstraps with the boot package. When I run it with 400 replicates it works fine, but I then need to run the same analysis with 89, 86, 102 and 106 samples (for four different environments), and that is when I get the error message:

  mybootstrap <- boot(Datos, mystat, 2000)
  Error in t.star[r, ] <- statistic(data, i[r, ], ...) :
    number of items to replace is not a multiple of replacement length

Is anyone familiar with this error message? Does anyone know the minimum sample size for the boot package to run properly? Is there any way to tell R how many samples it should pick for the resampling? If it helps, this is what my model looks like:

  mymodel = lm(Datos[,4] ~ Datos[,1] + Datos[,8] + Datos[,9] + Datos[,10] + Datos[,11] + Datos[,12])
  summary(mymodel)
  mystat <- function(a, b) f <- lm(a[b,4] ~ a[b,1] + a[b,8] + a[b,9] + a[b,10] + a[b,11] + a[b,12])$coef
  mybootstrap <- boot(Datos, mystat, 2000)
  INT1 <- boot.ci(mybootstrap, conf=0.95, type="all", index=1)
  INT2 <- boot.ci(mybootstrap, conf=0.95, type="all", index=2)
  INT3 <- boot.ci(mybootstrap, conf=0.95, type="all", index=3)
  INT4 <- boot.ci(mybootstrap, conf=0.95, type="all", index=4)
  INT5 <- boot.ci(mybootstrap, conf=0.95, type="all", index=5)
  INT6 <- boot.ci(mybootstrap, conf=0.95, type="all", index=6)
  INT7 <- boot.ci(mybootstrap, conf=0.95, type="all", index=7)

Thanks for your help! I am new to bootstraps and to R, and I feel pretty lonely with this. Karina Boege