Re: [R] accessing and preserving list names in lapply
Hi, This might be the trick you are looking for: http://tolstoy.newcastle.edu.au/R/e4/help/08/04/8720.html Romain Alexy Khrabrov wrote: res <- lapply(1:length(L), do.one) Actually, I do res <- lapply(1:length(L), function(x) do.one(L[x])) -- this is the price of needing the element's name: I have to make do.one extract the name and the meat separately inside, and the lapply call becomes ugly. Yet the obvious alternatives -- extracting the names separately, attaching them back onto list elements, etc. -- are even uglier. Something pretty? :) Cheers, Alexy -- Romain Francois Independent R Consultant +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with RBloomberg (not the usual one)
Hello, everyone! I have a problem with RBloomberg and this is not the usual no-administrator-rights problem. I have R 2.7.2, RBloomberg 0.1-10, RDCOMClient 0.92-0. RDCOMClient, chron, zoo, stats: these packages load OK. Then, trying to connect, I get the following error message: conn <- blpConnect(show.days="week", na.action="previous.days", periodicity="daily") Warning messages: 1: In getCOMInstance(name, force = TRUE, silent = TRUE) : Couldn't get clsid from the string 2: In blpConnect(show.days = "week", na.action = "previous.days", periodicity = "daily") : Seems like this is not a Bloomberg Workstation: Error : Invalid class string Has anyone encountered this problem? What is wrong and how can I solve it? Online, I found just one instance of this problem discussed, and it was in Chinese: http://cos.name/bbs/read.php?tid=12821&fpage=3 Thank you for your help! Sergey
[R] combining identify() and locator()
Hi, I am wondering if there might be a way to combine the two functions identify() and locator() such that if I use identify() and then click on a point outside the set tolerance, the x,y coordinates are returned as in locator(). Does anyone know of a way to do this? Thanks in advance for any help -brian
Re: [R] Installing different versions of R simultaneously on Linux
G'day Rainer, On Fri, 27 Feb 2009 09:34:11 +0200 Rainer M Krug r.m.k...@gmail.com wrote: I want to install some versions of R simultaneously from source on a computer (running Linux). [...] What flavour of Linux are we talking about? If it is not, how is it possible to have several versions of R on one computer, or is the only way to compile them and then call R in the directory of the version where it was compiled (~/R-2.7.2/bin/R)? For Debian-based machines (I first used Debian, nowadays Kubuntu), I got into the following habit: 1) Unpack the R sources in /opt/src 2) Enter /opt/src/R-x.y.z and run configure with --prefix=/opt/R/R-x.y.z (and other options) 3) Build R with checks and documentation from source and install. 4) Run in /opt/src a script that uses update-alternatives --install to register the new version and creates a link from /opt/R/R-x.y.z/bin/R to /opt/bin/R-x.y.z I have /opt/bin in my PATH, thus I can call any R version explicitly by R-x.y.z. Typing R alone will usually start the most recently installed version (as this will have the highest priority), but I can configure that via sudo update-alternatives --config R. I.e., I can make R run a particular version. Since the update-alternatives step above also registers all the *.info files and man pages, I will also access the documentation of that particular R version (e.g., C-h i in emacs will give me access to the info version of the manuals of the version of R which is run by the R command). Over time, typically when the Linux system is upgraded, libraries on which old R-x.y.z binaries relied vanish. At that time I usually delete /opt/R/R-x.y.z and remove that version from the available alternatives. HTH. Let me know if you need more details. Cheers, Berwin
Re: [R] survival::survfit,plot.survfit
At 15:28 26.02.2009, Terry Therneau wrote: plot(survfit(fit)) should plot the survival function for x=0 or equivalently beta'x=0. This curve is independent of any covariates. This is not correct. It plots the curve for a hypothetical subject with x = mean of each covariate. Does this mean the curve corresponds to the one you would get based on the baseline hazard? Heinz This is NOT the average survival of the data set. Imagine a cohort made up of 60-year-old men and their 10-year-old grandsons: the expected survival of this cohort does not look like that of a 35-year-old male. Terry T
Re: [R] combining identify() and locator()
2009/2/27 Brian Bolt bb...@kalypsys.com: Hi, I am wondering if there might be a way to combine the two functions identify() and locator() such that if I use identify() and then click on a point outside the set tolerance, the x,y coordinates are returned as in locator(). Does anyone know of a way to do this? Thanks in advance for any help Since identify will only return the indexes of selected points, and it only takes on-screen clicks for coordinates, you'll have to leverage locator and duplicate some of the identify work. So call locator(1), then compute the distances to your points, and if any are below your tolerance mark them using text(), otherwise keep the coordinates of the click. You can use dist() to compute a distance matrix, but if you want to totally replicate identify's tolerance behaviour I think you'll have to convert from your data coordinates to device coordinates. The grconvertX() and grconvertY() functions look like they'll do that for you. Okay, that's the flatpack delivered, I think you've got all the parts, some assembly required! Barry
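Barry's recipe could be assembled roughly as follows. This is a minimal sketch, not a tested solution: the function name and the `tol` default (0.25 inches, roughly mimicking identify()'s default) are my own choices.

```r
# Sketch: click once; if the click is within `tol` inches of a data
# point, label that point like identify(); otherwise return the click
# coordinates like locator().
identify.or.locate <- function(x, y, labels = seq_along(x), tol = 0.25) {
  p <- locator(1)
  if (is.null(p)) return(NULL)
  # convert user coordinates to inches so the tolerance is device-based
  dx <- grconvertX(x, "user", "inches") - grconvertX(p$x, "user", "inches")
  dy <- grconvertY(y, "user", "inches") - grconvertY(p$y, "user", "inches")
  d <- sqrt(dx^2 + dy^2)
  i <- which.min(d)
  if (d[i] < tol) {
    text(x[i], y[i], labels[i], pos = 4)  # mark the point, as identify() does
    list(index = i)
  } else {
    p                                     # fall back to locator()'s output
  }
}
```

After plot(x, y), a call to identify.or.locate(x, y) would then behave like identify() near a point and like locator(1) elsewhere.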
Re: [R] Download daily weather data
Dear Thomas, more for the sake of completeness and as an alternative to R. There are GRIB data sets [1] available (some for free) and there is the GPL software GrADS [2]. Because the GRIB format is well documented, it should be possible to get it into R easily and make up your own plots/weather analysis. I do not know and have not checked whether somebody has already done so. I use this information and these tools, among others, during longer offshore sailing trips. Best, Bernhard [1] http://www.grib.us/ [2] http://www.iges.org/grads/ -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Scillieri, John Sent: Thursday, 26 February 2009 22:58 To: 'James Muller'; 'r-help@r-project.org' Subject: Re: [R] Download daily weather data Looks like you can sign up to get XML feed data from Weather.com http://www.weather.com/services/xmloap.html Hope it works out! -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of James Muller Sent: Thursday, February 26, 2009 3:57 PM To: r-help@r-project.org Subject: Re: [R] Download daily weather data Thomas, Have a look at the source code for the webpage (Ctrl-U in Firefox; I don't know the shortcut in Internet Explorer, etc.). That is what you'd have to parse in order to get the forecast from this page. Typically when I parse webpages such as this I use regular expressions to do so (and I would never downplay the usefulness of regular expressions, but they take a little getting used to). There are two parts to the task: find patterns that allow you to pull out the datum/data you're after; and then write a program to pull it/them out. Also, of course, download the webpage (but that's no issue). I bet you'd be able to find a comma-separated value (CSV) file containing the weather report somewhere, which would probably involve a little less labor in order to produce your automatic wardrobe advice.
James On Thu, Feb 26, 2009 at 3:47 PM, Thomas Levine thomas.lev...@gmail.com wrote: I'm writing a program that will tell me whether I should wear a coat, so I'd like to be able to download daily weather forecasts and daily reports of recent past weather conditions. The NOAA has very promising tabular forecasts (http://forecast.weather.gov/MapClick.php?CityName=Ithaca&state=NY&site=BGM&textField1=42.4422&textField2=-76.5002&e=0&FcstType=digital), but I can't figure out how to import them. Someone must have needed to do this before. Suggestions? Thomas Levine!
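James's download-then-regex approach could be sketched like this. This is purely illustrative: the pattern and the idea of grepping for "Temperature" are my assumptions, and the real NOAA page would need its own pattern, found by inspecting the page source.

```r
# Sketch: download a forecast page and pull numbers out with regular
# expressions.  The URL and pattern below are placeholders.
url  <- "http://forecast.weather.gov/MapClick.php"  # plus query parameters
page <- readLines(url, warn = FALSE)                # fetch the raw HTML
# keep only lines that look like they carry a temperature value
temp.lines <- grep("Temperature", page, value = TRUE)
# strip everything but the digits (illustrative, not robust HTML parsing)
temps <- as.numeric(gsub("[^0-9]", "", temp.lines))
```

A proper HTML or XML parser would be more robust than regular expressions, but for a single well-understood page this kind of grep/gsub pipeline is often enough.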
[R] Axis-question
Hi there, I was wondering whether it's possible to generate an axis with groups (like in Excel), so that you can have something like this as the x-axis (for example for the levelplot method of the lattice package):

| X1 | X2 | X3 | X1 | X2 | X3 | X1 | ...
|   group1    |   group2    | group3 ...

I hope you understand what I'm looking for?
Re: [R] accessing and preserving list names in lapply
Hi, Perhaps Hadley's plyr package can help: library(plyr) temp <- list(x=2, y=3, x=4) llply(temp, function(x) x^2) $x [1] 4 $y [1] 9 $x [1] 16 baptiste On 27 Feb 2009, at 03:07, Alexy Khrabrov wrote: Sometimes I'm iterating over a list where names are keys into another data structure, e.g. a related list. Then I can't use lapply as it does [[]] and loses the name. Then I do something like this: do.one <- function(ldf) { # list-dataframe item key <- names(ldf) meat <- ldf[[1]] mydf <- some.df[[key]] # related data structure r.df <- cbind(meat, new.column=computed) r <- list(xxx=r.df) names(r) <- key r } then if I operate on the list L of those ldf's not as lapply(L,...), but res <- lapply(1:length(L), do.one) Can this procedure be simplified so that names are preserved? Specifically, can the xxx=..., names(r) <- key part be eliminated -- how can we have a variable on the left-hand side of list(lhs=value)? Cheers, Alexy _ Baptiste Auguié School of Physics University of Exeter Stocker Road, Exeter, Devon, EX4 4QL, UK Phone: +44 1392 264187 http://newton.ex.ac.uk/research/emag
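Another idiomatic way to keep the names is to pass the name and the element as separate arguments, which avoids the single-element-list gymnastics in Alexy's do.one entirely. A minimal sketch (the toy list and the body of do.one are placeholders for the real data and computation):

```r
L <- list(a = 1, b = 2)          # toy list standing in for the real data
do.one <- function(key, meat) {  # operate on name and value directly
  meat + 1                       # placeholder for the real computation
}
# Map pairs each name with its element; the result keeps the names
res <- Map(do.one, names(L), L)
# res$a is 2, res$b is 3, and names(res) is c("a", "b")
```

Since Map() names its result after its first (character) argument, the names survive without ever constructing list(xxx=...) and renaming it.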
[R] Balanced design, differences in results using anova and lmer/anova
Hi, I am trying to do an analysis of variance for an unbalanced design. As a toy example, I use a dataset presented by K. Hinkelmann and O. Kempthorne in Design and Analysis of Experiments (p. 353-356). This example is very similar to my own dataset, with one difference: it is balanced. Thus it is possible to do an analysis using both: (1) anova, and (2) lmer. Furthermore, I can compare my results with the results presented in the book (the book uses SAS). In short: using anova, I can reproduce the results presented in the book; using lmer, I fail to reproduce the results. However, for my real analysis, I need lmer - what do I do wrong? The example uses a randomized complete block design (RCBD) with a nested blocking structure and subsampling. response: height (of some trees) covariates: HSF (type of the trees) nested covariates: loc (location), block (block is nested in location)
# the data (file: pine.txt) looks like this:
loc block HSF height
1 1 1 210
1 1 1 221
1 1 2 252
1 1 2 260
1 1 3 197
1 1 3 190
1 2 1 222
1 2 1 214
1 2 2 265
1 2 2 271
1 2 3 201
1 2 3 210
1 3 1 220
1 3 1 225
1 3 2 271
1 3 2 277
1 3 3 205
1 3 3 204
1 4 1 224
1 4 1 231
1 4 2 270
1 4 2 283
1 4 3 211
1 4 3 216
2 1 1 178
2 1 1 175
2 1 2 191
2 1 2 193
2 1 3 182
2 1 3 179
2 2 1 180
2 2 1 184
2 2 2 198
2 2 2 201
2 2 3 183
2 2 3 190
2 3 1 189
2 3 1 183
2 3 2 200
2 3 2 195
2 3 3 197
2 3 3 205
2 4 1 184
2 4 1 192
2 4 2 197
2 4 2 204
2 4 3 192
2 4 3 190
#
# then I load the data
#
read.data = function() {
  d = read.table( "pines.txt", header=TRUE )
  d$loc = as.factor( d$loc )
  d$block.tmp = as.factor( d$block )
  d$block = ( d$loc:d$block.tmp )[drop=TRUE] # lme4 does not support implicit nesting
  d$HSF = as.factor( d$HSF )
  return( d )
}
d = read.data()
#
# using anova:
#
m.aov = aov( height ~ HSF*loc + Error(loc/block + HSF:loc/block), data=d )
summary( m.aov )
#
# I get:
#
Error: loc
    Df Sum Sq Mean Sq
loc  1  20336   20336

Error: loc:block
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  6 1462.33  243.72

Error: loc:HSF
        Df  Sum Sq Mean Sq
HSF      2 12170.7  6085.3
HSF:loc  2  6511.2  3255.6

Error: loc:block:HSF
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals 12 301.167  25.097

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24 529.00   22.04
#
# which is what I expected; however, using lmer
#
m.lmer = lmer( height ~ HSF*loc + HSF*(loc|block), data=d )
anova( m.lmer )
#
# I get:
#
Analysis of Variance Table
        Df  Sum Sq Mean Sq
HSF      2 12170.7  6085.3
loc      1  1924.6  1924.6
HSF:loc  2  6511.2  3255.6
#
# which is, at least, not what I expected...
#
Thanks for your help, Lars
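For what it's worth, the random-effects specification that would usually mirror the aov() error strata above is something like the following. This is a sketch under my own reading of the design (block and the block-by-HSF stratum as random intercepts, HSF, loc and their interaction fixed), not a tested answer to Lars's question:

```r
library(lme4)
# random intercepts for block and for the block:HSF stratum,
# mirroring the Error(loc/block + HSF:loc/block) terms in the aov() call
m.lmer <- lmer(height ~ HSF * loc + (1 | block) + (1 | block:HSF), data = d)
anova(m.lmer)
```

The HSF*(loc|block) term in the original call instead asks for correlated random loc slopes within block for each HSF level, which is a quite different (and much larger) model.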
Re: [R] Using package ROCR
Just an update concerning an error message in using the ROCR package: Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double' I have changed the sequence of loading the packages and the problem has gone: library(ROCR) library(randomForest) The loading sequence that caused the error was: library(randomForest) library(ROCR) Maybe this info could be useful for somebody else who is getting the same error. wiener30 wrote: Thank you very much for the response! The plot(1,1) helped to resolve the first problem. But I am still getting a second error message when running demo(ROCR): Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double' It seems it has something to do with compatibility of S4 objects. My versions of R and the ROCR package are the same as you listed. But it seems something else is missing in my installation. William Doane wrote: Responding to question 1... it seems the demo assumes you already have a plot window open. library(ROCR) plot(1,1) demo(ROCR) seems to work. For question 2, my environment produces the expected results... plot doesn't generate an error: * R 2.8.1 GUI 1.27 Tiger build 32-bit (5301) * OS X 10.5.6 * ROCR 1.0-2 -Wil wiener30 wrote: I am trying to use the ROCR package to analyze classification accuracy; unfortunately there are some problems right at the beginning. Question 1) When I try to run the demo I am getting the following error message: library(ROCR) demo(ROCR) if (dev.cur() <= 1) [TRUNCATED] Error in get(getOption("device")) : wrong first argument When I issue the command dev.cur() it returns null device 1 It seems something is wrong with my R environment? Could somebody provide a hint what is wrong. Question 2) When I run example commands from the manual library(ROCR) data(ROCR.simple) pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels ) perf <- performance( pred, "tpr", "fpr" ) plot( perf ) the plot command issues the following error message: Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double' How could this be fixed? Thanks for the support
Re: [R] bottom legends in ggplot2 ?
I would think that the lines below should work, but they give an error. Hadley, can you clarify this? Cheers, Thierry library(ggplot2) qplot(mpg, wt, data=mtcars, colour=cyl) + opts(legend.position = "bottom") Error in grid.Call.graphics(L_setviewport, pvp, TRUE) : Non-finite location and/or size for viewport ggplot(mtcars, aes(x = mpg, y = wt, colour = cyl)) + geom_point() + opts(legend.position = "bottom") Error in grid.Call.graphics(L_setviewport, pvp, TRUE) : Non-finite location and/or size for viewport sessionInfo() R version 2.8.1 (2008-12-22) i386-pc-mingw32 locale: LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Dutch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252 attached base packages: [1] grid stats graphics grDevices datasets utils methods [8] base other attached packages: [1] ggplot2_0.8.1 reshape_0.8.2 plyr_0.1.5 proto_0.3-8 ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Avram Aelony Sent: Thursday, 26 February 2009 20:34 To: r-h...@stat.math.ethz.ch Subject: [R] bottom legends in ggplot2 ? Has anyone had success with producing legends to a qplot graph such that the legend is placed on the bottom, under the abscissa rather than to the right-hand side? The following doesn't move the legend: library(ggplot2) qplot(mpg, wt, data=mtcars, colour=cyl, gpar(legend.position="bottom") ) I am using ggplot2_0.8.2. Thanks in advance, Avram
Re: [R] gplot problems with faceting
Dear Pascal, I think you need to define the facets as facets = ~ Par instead of facets = Par ~ . The Par ~ . syntax can be used with facet_grid and not with facet_wrap. HTH, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of BOISSON, Pascal Sent: Thursday, 26 February 2009 17:08 To: r-help@r-project.org Subject: [R] gplot problems with faceting Dear R-listers, I am very confused with what seems to be a misuse of the faceting options with the qplot function, and I hope you might help me on this. z contains various simulation results from simulations with different sets of parameters. I melt my data to have the following data.frame structure: str(z) 'data.frame': 12383 obs. of 5 variables: $ vID : num 1 2 3 4 5 6 7 8 9 10 ... $ Var : Factor w/ 61 levels ".t",".ASU_1.Biofilm_C",..: 1 1 1 1 1 1 $ Var.Value: num 317 318 319 320 319 ... $ Par : Factor w/ 7 levels ".Biostyr0d.t_K",..: 1 1 1 1 1 1 1 1 $ Par.Value: num 5 5 5 5 5 5 5 5 5 5 ... I would like to plot, for each couple (Parameter(i), Variable(j)), the plot Variable(j).Value = f(Parameter(i).Value). I would like to do it stepwise and have one set of graphs per Variable. Then I subset z based on a single variable name, e.g. .ASU_1.Biofilm_C. Then I try the following, but I get an error message: qp <- qplot(Par.Value, Var.Value, data = z[z$Var==v,], ylab=v, geom=c("point","smooth"), method="lm") qp <- qp + facet_wrap( facets = Par ~ ., scales = "free_x", ncol=length(vPar)) qp Erreur dans `[.data.frame`(plot$data, , setdiff(cond, names(df)), drop = FALSE) : colonnes non définies sélectionnées [i.e., undefined columns selected] I can have this working by modifying the facets argument to Par~Var, and it does what I want, but it is not satisfying, and I am confused by this error message. The same error message happens when I use the full data frame, or when I try other mappings like colours = Par. Any idea of what I am doing wrong? Best regards Pascal Boisson
[R] rounding problem
hi i am creating some variables from the same data, but somewhere there is different rounding. look: P = abs(fft(d.zlato)/480)^2 hladane = sort(P, decreasing=T)[1:10]/480 pozicia = c(0,0,0,0,0) for (j in 1:5){ for (i in 2:239){ if (P[i]/480==hladane[2*j-1]){pozicia[j]=i-1}}} period = 479/pozicia P[2]/334 [1] 0.0001279107 hladane[1] [1] 0.0001279107 P[2]/334==hladane[1] [1] FALSE abs(P[2]/334 - hladane[1]) < 0.001 [1] TRUE Is it possible to avoid this? I know in this example I can use 2x if to eliminate this rounding, but I need to fix it in general.
Re: [R] Installing different versions of R simultaneously on Linux
This is really an R-devel question. On Fri, 27 Feb 2009, Rainer M Krug wrote: Hi I want to install some versions of R simultaneously from source on a computer (running Linux). Some programs have an option to specify a suffix for the executable (e.g. R would become R-2.7.2 when the suffix is specified as -2.7.2). I did not find this option for R - did I overlook it? If it is not there, how is it possible to have several versions of R on one computer, or is the only way to compile them and then call R in the directory of the version where it was compiled (~/R-2.7.2/bin/R)? If this is the case, would it be possible to add this option to specify the suffix for the executables? 'R' is not an executable, but a shell script. You can use 'prefix' to install R anywhere, or other variables for more precise control (see the R-admin manual). For example, we use rhome to have R 2.8.x under /usr/local/lib64/R-2.8 etc. And you can rename $prefix/bin/R to, say, R-2.7.2, or link R_HOME/bin/R to anywhere in your path, under any name you choose. Thanks Rainer -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Faculty of Science Natural Sciences Building Private Bag X1 University of Stellenbosch Matieland 7602 South Africa -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] rounding problem
Hi, you probably want to use ?all.equal instead of ==. I couldn't run your example, though. Hope this helps, baptiste On 27 Feb 2009, at 10:32, Peterko wrote: hi i am creating some variables from the same data, but somewhere there is different rounding. look: P = abs(fft(d.zlato)/480)^2 hladane = sort(P, decreasing=T)[1:10]/480 pozicia = c(0,0,0,0,0) for (j in 1:5){ for (i in 2:239){ if (P[i]/480==hladane[2*j-1]){pozicia[j]=i-1}}} period = 479/pozicia P[2]/334 [1] 0.0001279107 hladane[1] [1] 0.0001279107 P[2]/334==hladane[1] [1] FALSE abs(P[2]/334 - hladane[1]) < 0.001 [1] TRUE Is it possible to avoid this? I know in this example I can use 2x if to eliminate this rounding, but I need to fix it in general. _ Baptiste Auguié School of Physics University of Exeter Stocker Road, Exeter, Devon, EX4 4QL, UK Phone: +44 1392 264187 http://newton.ex.ac.uk/research/emag
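A minimal illustration of the == versus all.equal difference, using a standard floating-point example rather than Peterko's data:

```r
x <- 0.1 + 0.2
x == 0.3                   # FALSE: binary floating point is not exact
isTRUE(all.equal(x, 0.3))  # TRUE: all.equal compares within a tolerance
```

Wrapping all.equal in isTRUE() matters inside if(): when the values differ, all.equal returns a character description of the difference, not FALSE.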
Re: [R] Installing different versions of R simultaneously on Linux
Prof Brian Ripley wrote: This is really an R-devel question. On Fri, 27 Feb 2009, Rainer M Krug wrote: Hi I want to install some versions of R simultaneously from source on a computer (running Linux). Some programs have an option to specify a suffix for the executable (e.g. R would become R-2.7.2 when the suffix is specified as -2.7.2). I did not find this option for R - did I overlook it? If it is not there, how is it possible to have several versions of R on one computer, or is the only way to compile them and then call R in the directory of the version where it was compiled (~/R-2.7.2/bin/R)? If this is the case, would it be possible to add this option to specify the suffix for the executables? 'R' is not an executable, but a shell script. That depends on what is meant by 'executable'. Files that contain instructions for an interpreter http://en.wikipedia.org/wiki/Interpreter_%28computing%29 or virtual machine http://en.wikipedia.org/wiki/Virtual_machine may be considered executables [1]. "The term might also be, but generally isn't, applied to scripts which are interpreted by a command line interpreter http://foldoc.org/index.cgi?command+line+interpreter." [2] Try also file `which R`, which is likely, system-dependently, to say that it's *executable* (independently of the access mode). vQ [1] http://en.wikipedia.org/wiki/Executable [2] http://foldoc.org/index.cgi?query=executable&action=Search
Re: [R] rounding problem
all.equal is what I need - many thanks for the help.

baptiste auguie-2 wrote:
> Hi, you probably want to use ?all.equal instead of ==. I couldn't run your example, though. Hope this helps, baptiste

-- View this message in context: http://www.nabble.com/rounding-problem-tp22243179p22243567.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
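The failure discussed in this thread is ordinary floating-point rounding rather than anything specific to fft(); a minimal, self-contained illustration of the `==` vs `all.equal()` distinction (made-up numbers, not the poster's data):

```r
## Two ways of computing what is mathematically the same value
a <- sqrt(2)^2        # 2 plus a tiny floating-point error
b <- 2

a == b                      # FALSE: exact binary comparison
isTRUE(all.equal(a, b))     # TRUE: comparison within a tolerance
abs(a - b) < 1e-8           # TRUE: an explicit tolerance test

## In the poster's loop, the exact test  P[i]/480 == hladane[2*j - 1]
## could be replaced by a tolerance test such as
##   isTRUE(all.equal(P[i]/480, hladane[2*j - 1]))
```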
Re: [R] Inefficiency of SAS Programming
Frank, I can't see the code you mention - Web marshall at work - but I don't think you should be too quick to run down SAS: it's a powerful and flexible language, but unfortunately very expensive. Your example mentions doing a vector product in the macro language - this only suggests to me that the people writing the code need a crash course in SAS/IML (the matrix language). SAS is designed to work on records and so is inappropriate for matrices - macros are only an efficient code-copying device. Doing matrix computations this way is pretty mad, and the code would be impossible, never mind the memory problems. SAS recognised that, but a lot of SAS users remain unfamiliar with IML. In IML, by contrast, there are inner, cross and outer products and a raft of other useful methods for matrix work that R users would be familiar with. OLS, for example, is one line:

b = solve(X`*X, X`*y); rss = sqrt(ssq(y - X*b));

And to give you a flavour of IML's capabilities, I implemented a SAS version of the MARS program in it about 6 or 7 years ago. BTW, SPSS also has a matrix language.

Gerard

Frank E Harrell Jr f.harr...@vanderbilt.edu wrote to the R list (r-h...@stat.math.ethz.ch) on 26/02/2009 22:57:

> If anyone wants to see a prime example of how inefficient it is to program in SAS, take a look at the SAS programs provided by the US Agency for Healthcare Research and Quality for risk adjusting and reporting for hospital outcomes at http://www.qualityindicators.ahrq.gov/software.htm . The PSSASP3.SAS program is a prime example. Look at how you do a vector product in the SAS macro language to evaluate predictions from a logistic regression model. I estimate that using R would easily cut the programming time of this set of programs by a factor of 4.
Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
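For readers comparing the two systems, the IML one-liner above has a close base-R analogue via the normal equations (a sketch with toy data; IML's `ssq()` sum-of-squares function is spelled out explicitly):

```r
## OLS via the normal equations, mirroring IML's  b = solve(X`*X, X`*y)
set.seed(1)
X <- cbind(1, matrix(rnorm(20), 10, 2))   # toy design matrix with intercept
y <- rnorm(10)                            # toy response

b   <- solve(crossprod(X), crossprod(X, y))  # solve(t(X) %*% X, t(X) %*% y)
rss <- sqrt(sum((y - X %*% b)^2))            # IML: rss = sqrt(ssq(y - X*b))

## Same coefficients as R's QR-based fitter:
all.equal(drop(b), unname(coef(lm.fit(X, y))))
```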
Re: [R] bottom legends in ggplot2 ?
Yes, this is a known bug which will (hopefully) be addressed in the next release. Hadley

On Fri, Feb 27, 2009 at 4:15 AM, ONKELINX, Thierry thierry.onkel...@inbo.be wrote:
> I would think that the lines below should work, but they give an error. Hadley, can you clarify this? Cheers, Thierry
>
> library(ggplot2)
> qplot(mpg, wt, data = mtcars, colour = cyl) + opts(legend.position = "bottom")
> Error in grid.Call.graphics(L_setviewport, pvp, TRUE) :
>   Non-finite location and/or size for viewport
>
> ggplot(mtcars, aes(x = mpg, y = wt, colour = cyl)) + geom_point() +
>   opts(legend.position = "bottom")
> Error in grid.Call.graphics(L_setviewport, pvp, TRUE) :
>   Non-finite location and/or size for viewport
>
> sessionInfo()
> R version 2.8.1 (2008-12-22) i386-pc-mingw32
> locale: LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Dutch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252
> attached base packages: [1] grid stats graphics grDevices datasets utils methods [8] base
> other attached packages: [1] ggplot2_0.8.1 reshape_0.8.2 plyr_0.1.5 proto_0.3-8
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
> Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
> Gaverstraat 4, 9500 Geraardsbergen, Belgium
> tel. + 32 54/436 185
> thierry.onkel...@inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey

-----Original message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] on behalf of Avram Aelony
Sent: Thursday, 26 February 2009 20:34
To: r-h...@stat.math.ethz.ch
Subject: [R] bottom legends in ggplot2 ?

Has anyone had success producing legends for a qplot graph such that the legend is placed at the bottom, under the abscissa, rather than at the right-hand side? The following doesn't move the legend:

library(ggplot2)
qplot(mpg, wt, data = mtcars, colour = cyl, gpar(legend.position = "bottom"))

I am using ggplot2_0.8.2. Thanks in advance, Avram

The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.

--
http://had.co.nz/

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
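For reference, a sketch of how the same request is spelled once the bug no longer applies: in ggplot2 releases after the 0.8.x series discussed here, `opts()` was replaced by `theme()` (this assumes a reasonably current ggplot2 is installed, not the versions in the thread):

```r
library(ggplot2)

## opts(legend.position = "bottom") was the 0.8.x spelling that triggered
## the viewport error; later releases use theme() and draw this correctly.
p <- ggplot(mtcars, aes(x = mpg, y = wt, colour = cyl)) +
  geom_point() +
  theme(legend.position = "bottom")
p
```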
Re: [R] survival::predict.coxph
Hello Terry, it's really great to receive some feedback from a pro. I'm not sure if I've got the point right: you suppose that the Cox model isn't good at forecasting an expected survival time because of the issues with predicting the survival function at the right tail, and one should rather use parametric models like an exponential model? Or what do you mean by a smooth parametric estimate? Anyway, I just ordered your book at the library; hopefully I'll get some more insights from reading it.

Maybe I should point out why I even tried to do such forecasts. Following the article "Quantifying climate-related risks and uncertainties using Cox regression models" by Maia and Meinke, I try to deduce winter precipitation from lagged sea-surface temperatures (SSTs). So precipitation is my "survival time" and the SST observations at different lags are my covariates. The sample size is only 55 and I've got 11 covariates (lag = 0 months to lag = 10 months) to choose from. My first goal is to identify the optimal time lag(s) between SST-anomaly observation and precipitation observation. The expectation was that the lag should be some months. I thought a Cox model would easily provide such a selection. At first I used the covariates individually. Coefficients for lags between 0 and 5 months were all quite big and then decreasing from 6 to 10 months. So I think 5 months could be the lag of the process, and high persistence of the SST accounts for the big coefficients for 0-4 months. As the next step I used all 11 covariates at once, hoping to get similar results. Instead the sign of the coefficients jumps randomly from plus to minus, and the magnitudes look random as well. I also tried using sets of three covariates, e.g. with lags 4, 5, 6, but even then the sign of the coefficients varies. So my thought was that maybe I overfitted the model. In fact I did not find any literature on whether that is even possible.
As far as my limited knowledge goes, overfitted models should reproduce the training period very well but other periods very poorly. So I first tried to reproduce the training period - but so far with no success, whether using 11 covariates or just 1. Regards Bernhard

Terry Therneau wrote:
> You are mostly correct. Because of the censoring issue, there is no good estimate of the mean survival time. The survival curve either does not go to zero, or gets very noisy near the right-hand tail (large standard error); a smooth parametric estimate is what is really needed to deal with this. For this reason the mean survival, though computed (but see the survfit.print.mean option, help(print.survfit)), is not highly regarded. It is not an option in predict.coxph. Terry T.

begin included message --
Hi, if I got it right then the survival time we expect for a subject is the integral of the subject-specific survival function from 0 to t_max. If I have a trained Cox model and want to predict the survival time for a new subject, I could use survfit(coxmodel, newdata = newSubject) to estimate a new survival function, which I then have to integrate. Actually I thought predict(coxmodel, newSubject) would do this for me, but I'm confused about which type I have to declare. If I understand the little pieces of documentation right, none of the available types is exactly the predicted survival time. I think I have to use the mean survival time of the baseline function times exp(the result of type "linear predictor"). Am I right?

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
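A sketch of the machinery under discussion, using the survival package's bundled lung data rather than the poster's precipitation data; the restricted mean printed by survfit is the "integral of the survival curve" mentioned in the included message, and the quantity Terry cautions about:

```r
library(survival)

## Cox model on the packaged lung data (illustration only)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

## Predicted survival curve for one hypothetical new subject...
new <- data.frame(age = 60, sex = 1)
sf  <- survfit(fit, newdata = new)

## ...and its restricted mean survival time: the area under the predicted
## curve up to the largest follow-up time.
print(sf, print.rmean = TRUE)
```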
Re: [R] Inefficiency of SAS Programming
I would like to know if we can create a package in which R functions are renamed to be closer to the SAS language. Doing so would help people familiar with SAS take to R straight away for their work, thus decreasing the threshold for acceptance - and then get into deeper understanding later. Since it is a package, it would be optional, only for people wanting to try out R coming from SAS. Do we have such a package right now? It basically masks R functions as the equivalent function in another language, just for the ease of users/beginners. For example:

# creating a function for means
procmeans <- function(x, y) {
  summary(subset(x, select = c(x, y)))
}

# creating a function for importing csv
procimport <- function(x, y) {
  read.csv(textConnection(x), row.names = y, na.strings = "")
}

# creating a function for describing data
procunivariate <- function(x) {
  summary(x)
}

regards, ajay www.decisionstats.com

On Fri, Feb 27, 2009 at 4:27 AM, Frank E Harrell Jr f.harr...@vanderbilt.edu wrote:
> If anyone wants to see a prime example of how inefficient it is to program in SAS, take a look at the SAS programs provided by the US Agency for Healthcare Research and Quality for risk adjusting and reporting for hospital outcomes at http://www.qualityindicators.ahrq.gov/software.htm . The PSSASP3.SAS program is a prime example. Look at how you do a vector product in the SAS macro language to evaluate predictions from a logistic regression model. I estimate that using R would easily cut the programming time of this set of programs by a factor of 4. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
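As a concrete sketch of the proposal (the wrapper name is hypothetical; no such package is claimed to exist), a PROC MEANS-style function could even mimic SAS's BY-statement grouping via aggregate():

```r
## Hypothetical SAS-flavoured wrapper around base R (illustration only)
proc_means <- function(data, var, by = NULL) {
  if (is.null(by)) {
    summary(data[[var]])                       # overall summary
  } else {
    ## BY-group processing, as PROC MEANS does with a BY statement
    aggregate(data[[var]], by = list(by = data[[by]]),
              FUN = function(v) c(mean = mean(v), sd = sd(v)))
  }
}

proc_means(mtcars, "mpg")              # overall summary
proc_means(mtcars, "mpg", by = "cyl")  # grouped, SAS BY-style
```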
Re: [R] Inefficiency of SAS Programming
2009/2/27 Peter Dalgaard p.dalga...@biostat.ku.dk:
> Presumably, something like
>
> IF N. = 1 THEN SUB_N = 1;
> ELSE IF N. < 5 THEN SUB_N = N. - 1;
> ELSE IF N. < 16 THEN SUB_N = N. - 2;
> ELSE SUB_N = N. - 3;
>
> would work, provided that 2, 5, 16 are impossible values. Problem is that it actually makes the code harder to grasp, so experienced SAS programmers go for the dumb but readable code like the above.

I'm not sure which is easier to grasp. When I first saw the original version I thought it was an odd way of doing SUB_N = N.. Only then did I have a closer look and spot the missing 2, 5, and 16. A comment would have been very enlightening. But there was nothing relevant.

In R, the cleanest I can think of is

subn <- match(n, setdiff(1:19, c(2, 5, 16)))

or maybe just

subn <- match(n, c(1, 3:4, 6:15, 17:19))

although

subn <- factor(n, levels = c(1, 3:4, 6:15, 17:19))

might be what is really wanted.

I think the important thing with any programming is to make sure what you want is expressed in words somewhere. If not in the code, then in the comments. And operations like this should be abstracted into functions. All the examples of SAS code I've seen seem to fall into the old practice of writing great long 'scripts', with minimal code reuse and encapsulation of useful functionality. If these SAS scripts are then given to new SAS programmers, the chances are they will follow these bad practices. Show them well-written R code (or C, or Python) and maybe they can implement those good practices in their SAS work. Assuming SAS can do that. I'm not sure. Barry

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
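The equivalence between the SAS recode and the match()/setdiff() one-liner can be checked mechanically (a quick sanity test, not part of the original exchange):

```r
## The SAS recode maps N over 1..19, treating 2, 5 and 16 as impossible
sas_recode <- function(n) {
  ifelse(n == 1, 1,
  ifelse(n < 5,  n - 1,      # n in {3, 4}
  ifelse(n < 16, n - 2,      # n in {6, ..., 15}
                 n - 3)))    # n in {17, 18, 19}
}

n    <- c(1, 3:4, 6:15, 17:19)               # the possible values
subn <- match(n, setdiff(1:19, c(2, 5, 16))) # rank among possible values

all(subn == sas_recode(n))   # TRUE: both give the same compacted index
```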
Re: [R] Installing different versions of R simultaneously on Linux
On Fri, Feb 27, 2009 at 12:37 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
> This is really an R-devel question.

Sorry about the wrong list.

> On Fri, 27 Feb 2009, Rainer M Krug wrote:
>> Hi I want to install some versions of R simultaneously from source on a computer (running Linux). Some programs have an option to specify a suffix for the executable (e.g. R would become R-2.7.2 when the suffix is specified as -2.7.2). I did not find this option for R - did I overlook it? If not, how is it possible to have several versions of R on one computer - or is the only way to compile them and then call R in the directory of the version where it was compiled (~/R-2.7.2/bin/R)? If this is the case, would it be possible to add this option to specify the suffix for the executables?
>
> 'R' is not an executable, but a shell script. You can use 'prefix' to install R anywhere, or other variables for more precise control (see the R-admin manual). For example, we use rhome to have R 2.8.x under /usr/local/lib64/R-2.8 etc. And you can rename $prefix/bin/R to, say, R-2.7.2, or link R_HOME/bin/R to anywhere in your path, under any name you choose.

OK - so the procedure will be: if I want to install R 2.7.2 without impacting my existing installation of R (which is done by a package manager), I use

./configure --prefix=/usr/R-2.7.2
make
make install
ln -s /usr/R-2.7.2/bin/R /usr/bin/R-2.7.2

and when I use R-2.7.2 it will start R 2.7.2. I can continue with as many installed versions as I want. Thanks a lot, that was what I was looking for. Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Faculty of Science
Natural Sciences Building
Private Bag X1
University of Stellenbosch
Matieland 7602
South Africa

-- Brian D.
Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Faculty of Science Natural Sciences Building Private Bag X1 University of Stellenbosch Matieland 7602 South Africa __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Installing different versions of R simultaneously on Linux
G'day Rainer,

On Fri, 27 Feb 2009 10:53:12 +0200 Rainer M Krug r.m.k...@gmail.com wrote:

>> What flavour of Linux are we talking about?
> Sorry - I am running SuSE on the machine where I need it.

Sorry, I am not familiar with that flavour; before switching to Debian (and Debian-based distributions), I was using RedHat. And before that Slackware.

>> 4) Run in /opt/src a script that uses update-alternatives --install to install the new version and creates a link from /opt/R/R-x.y.z/bin/R to /opt/bin/R-x.y.z
> How do I do this? I usually call sudo make install. Do I have to use update-alternatives --install R-2.7.1 R 2 if I want to have R-2.7.1 as the second-priority install?

I do the make install step manually; the script just alerts the system that another alternative for the R command was installed. If memory serves correctly, the alternatives mechanism was developed by Debian and adopted by RedHat (or the other way round). I am not sure whether SuSE has adopted this, or a similar system. Essentially, a command, say foo, for which several alternatives exist, is installed on the system in, say, /usr/bin/, as a link to /etc/alternatives/foo, and /etc/alternatives/foo is a link to the actual program that is called. E.g. on my machine I have

ber...@berwin-nus1:~$ update-alternatives --list wish
/usr/bin/wish8.5
/usr/bin/wish8.4

which tells me that wish8.5 and wish8.4 are installed and I could call them explicitly. /usr/bin/wish is a link to /etc/alternatives/wish, and /etc/alternatives/wish will point to either of these two programs (depending on what the system admin decided should be the default, i.e. should be used if a user just types 'wish'). A command like update-alternatives --config wish allows one to configure whether wish should mean wish8.5 or wish8.4. And all that is necessary is to change the link in /etc/alternatives/wish to point at the desired program.
> That is what I need - but I can't find update-alternatives in SuSE

As I said, I do not know whether SuSE offers this alternatives system or a similar one. If it does, perhaps it is just a matter of installing some additional packages? If it offers a different but similar system, then you would have to ask on a SuSE list how that system is maintained and configured. On my machine I would say

apt-file search update-alternatives

to find out which package provides that command, and install that package if it is not yet installed. I am afraid I do not know what the equivalent command on SuSE is.

>> Typing R alone will usually start the most recently installed version (as this will have the highest priority), but I can configure that via sudo update-alternatives --config R. I.e., I can make R run a particular version. Since the update-alternatives step above also registers all the *.info files and man pages, I will also access the documentation of that particular R version (e.g., C-h i in emacs will give me access to the info version of the manuals of the version of R which is run by the R command).
> Exactly what I would like to have.

Well, if you ever use a system that has the alternatives set up and the update-alternatives command, I am happy to share my script with you.

Cheers, Berwin

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Installing different versions of R simultaneously on Linux
On Fri, Feb 27, 2009 at 1:49 PM, Berwin A Turlach ber...@maths.uwa.edu.au wrote:
> [...]
> Well, if you ever use a system that has the alternatives set up and the update-alternatives command, I am happy to share my script with you.

Thanks a lot for the offer - that would be great. I will set it up the same way on my PC with Xubuntu.

Cheers Rainer

> Cheers, Berwin
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys.
(Germany) Centre of Excellence for Invasion Biology Faculty of Science Natural Sciences Building Private Bag X1 University of Stellenbosch Matieland 7602 South Africa __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Advice on graphics to design circle with density-shaded sectors
Hello, I am looking for some general advice on which graphics package to use to make a figure demonstrating my experimental design. I want to design a circle with 7 sectors inside. Then I will want to shade the sectors depending on densities of observations in the sectors. I will also want to draw horizontal lines at increments along the sectors to demonstrate different distances out to the end of the sector. Given this sparse description, does anyone have advice on what package or functions to use in R? Thanks for your help, John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
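One option needing no extra packages is base graphics with polygon(); a rough sketch of the described figure, with invented densities standing in for the real observation counts (higher-level alternatives exist in packages such as plotrix or grid):

```r
## Sketch: a circle split into 7 sectors, shaded by (made-up) densities,
## with dashed arcs marking distance increments -- base graphics only.
dens  <- c(5, 12, 3, 8, 20, 1, 9)        # hypothetical counts per sector
shade <- gray(1 - dens / max(dens))      # darker = denser
ang   <- seq(0, 2 * pi, length.out = 8)  # boundaries of the 7 sectors

plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1), asp = 1)
for (i in 1:7) {                         # filled sector wedges
  th <- seq(ang[i], ang[i + 1], length.out = 50)
  polygon(c(0, cos(th)), c(0, sin(th)), col = shade[i])
}
for (r in c(0.25, 0.5, 0.75)) {          # distance increments (arcs)
  th <- seq(0, 2 * pi, length.out = 200)
  lines(r * cos(th), r * sin(th), lty = 2)
}
```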
[R] How to get input-data of ROCR
Hi, I have a problem while using the ROCR package in R. I can understand the three main commands, but I can't understand the input format, including ROCR.hiv, ROCR.simple and ROCR.xval (actually, not only the format, but also how to get this data).

## vectors (scores: numeric; labels: 0 or 1), multiple runs (cross-validation, bootstrapping, ...)

What are the scores? I use randomForest on Windows XP, but can't obtain such data. Would you please give me some details about the data? It would be even better if you could show me some examples. Versions: R 2.8.0, ROCR 1.0-2, randomForest 4.5-28. Best wishes, Jiamin Shaw 2009.2.28

2009-02-27 bioshaw

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
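To illustrate what ROCR means by "scores": they are any continuous classifier output, e.g. a randomForest class probability, and prediction() pairs them with the true labels. A sketch with simulated data (it assumes the ROCR and randomForest packages are installed):

```r
## What ROCR wants: one numeric score per case, plus the true 0/1 label.
library(randomForest)
library(ROCR)

set.seed(42)
train   <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
train$y <- factor(ifelse(train$x1 + train$x2 + rnorm(200) > 0, 1, 0))

fit    <- randomForest(y ~ x1 + x2, data = train)
scores <- predict(fit, type = "prob")[, "1"]  # out-of-bag class-1 probabilities
labels <- train$y                             # true classes

pred <- prediction(scores, labels)            # ROCR's input object
perf <- performance(pred, "tpr", "fpr")       # ROC curve
plot(perf)
```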
[R] add absolute value to bars in barplot
Hello,

barplot(twcons.area, beside = TRUE, col = c("green4", "blue", "red3", "gray"),
        xlab = "estate", ylab = "number of persons", ylim = c(0, 110),
        legend.text = c("treated", "mix", "untreated", NA))

produces a barplot very nicely. In addition, I'd like to get the bars' absolute values on top of the bars. How can I produce this in an easy way? Thanks Sören

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
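One easy approach: barplot() invisibly returns the x-coordinates of the bar midpoints, which text() can reuse (a sketch with toy data, not the poster's twcons.area; with beside = TRUE the same idea works because both the return value and the heights are matrices):

```r
## Label each bar with its value, placed just above the bar top
heights <- c(30, 55, 80, 42)
bp <- barplot(heights, ylim = c(0, 100),
              names.arg = c("A", "B", "C", "D"))
text(bp, heights, labels = heights, pos = 3)  # pos = 3: above the point
```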
[R] help with correct use of function lsfit
To the purpose of fitting a 2nd-order polynomial (a + b*x + c*x^2) to the chunk of signal falling in a window of 17 consecutive samples, I wrote the following very crude script. Since I have no previous experience of using least-squares fitting with R, I would appreciate your supervision and suggestions. I guess the returned coefficients of the polynomial are: a = -1.3191398, b = 0.1233055, c = 0.9297401. Thank you very much in advance, Maura

## Main
tms <- t(read.table("signal877cycle1.txt"))
J <- ilogb(length(tms), base = 2) + 1
y <- c(tms, rep(0, 2^J - length(tms)))
y.win <- tms.ext[1:17]
ls.mat <- matrix(nrow = length(y.win), ncol = 3, byrow = TRUE)
dt <- 0.033
ls.mat[, 1] <- 1
ls.mat[, 2] <- seq(0, dt*(length(y.win) - 1), dt)
ls.mat[, 3] <- ls.mat[, 2]^2

> y
[1] -1.29882462 -1.29816465 -1.29175902 -1.33508315 -1.31905086 -1.30246447 -1.25496640 -1.25858566 -1.19862868
[10] -1.16985809 -1.15755035 -1.15627040 -1.10929231 -1.09324296 -1.07202676 -1.03543530 -1.00609649 -0.96931799
[19] -0.96014189 -0.93879923 -0.89472101 -0.86568807 -0.86394226 -0.83804684 -0.79226517 -0.74804696 -0.69506558
[28] -0.63984135 -0.57677266 -0.52376371 -0.48793752 -0.44261935 -0.37505621 -0.30538492 -0.19309771 -0.07859412
[37] -0.01879655 0.04247391 0.09565881 0.17329566 0.29132263 0.38380712 0.45016443 0.50107765 0.57413940
[46] 0.68835476 0.78369090 0.83756871 0.87753415 0.92834503 0.99560230 1.08055356 1.17121517 1.22967280
[55] 1.25791166 1.28749046 1.31672692 1.33188866 1.35420775 1.37356226 1.38792638 1.40398573 1.41558702
[64] 1.39204622 1.39848595 1.39902593 1.40604565 1.42092504 1.41436531 1.3843 1.36012986 1.32950875
[73] 1.26507137 1.25315597 1.18249472 1.08857029 0.98782261 0.90470599 0.83081192 0.77709116 0.65228917
[82] 0.51844166
0.44530462 0.39562664 0.30153281 0.17979539 0.09895985 0.04306094 -0.03937571 -0.14150334
[91] -0.25936679 -0.31480454 -0.38806157 -0.47389691 -0.50785671 -0.58179371 -0.67538285 -0.74246719 -0.78380551
[100] -0.83894328 -0.86450224 -0.90614055 -0.93751928 -0.99679687 -1.03205956 -1.06616465 -1.06651404 -1.14997066
[109] -1.18338930 -1.21335809 -1.20208854 -1.22370767 -1.23488486 -1.25112655 -1.26942581 -1.26792234 -1.28838504
[118] -1.28799329 -1.27326566 -1.28502518 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[127] 0.00000000 0.00000000

> y.win
[1] -1.298825 -1.298165 -1.291759 -1.335083 -1.319051 -1.302464 -1.254966 -1.258586 -1.198629 -1.169858 -1.157550
[12] -1.156270 -1.109292 -1.093243 -1.072027 -1.035435 -1.006096

> ls.mat
      [,1]  [,2]     [,3]
 [1,]    1 0.000 0.000000
 [2,]    1 0.033 0.001089
 [3,]    1 0.066 0.004356
 [4,]    1 0.099 0.009801
 [5,]    1 0.132 0.017424
 [6,]    1 0.165 0.027225
 [7,]    1 0.198 0.039204
 [8,]    1 0.231 0.053361
 [9,]    1 0.264 0.069696
[10,]    1 0.297 0.088209
[11,]    1 0.330 0.108900
[12,]    1 0.363 0.131769
[13,]    1 0.396 0.156816
[14,]    1 0.429 0.184041
[15,]    1 0.462 0.213444
[16,]    1 0.495 0.245025
[17,]    1 0.528 0.278784

## usage: lsfit(x, y, wt = NULL, intercept = TRUE, tolerance = 1e-07, yname = NULL)

> lsfit(ls.mat, y.win, wt = NULL, intercept = TRUE, tolerance = 1e-07, yname = NULL)
$coefficients
 Intercept         X1         X2         X3
-1.3191398  0.1233055  0.9297401  0.0000000

$residuals
[1] 0.020315146 0.015893550 0.015192628 -0.037263015 -0.032387216 -0.028982296 0.003309337 -0.017541342
[9] 0.023159250 0.030648485 0.019649885 -0.004401476 0.015220334 0.001888425 -0.008301609 -0.005141358
[17] -0.011258729

$intercept
[1] TRUE

$qr
$qt
[1] 4.937370523 0.409411205 -0.089144866 -0.041892736 -0.035696706 -0.031176843 0.002024443 -0.018121872
[9] 0.023077794 0.030860815 0.019950712 -0.004217443 0.015082286 0.001223006 -0.009699688 -0.007477386
[17] -0.014737995

$qr
      Intercept          X2          X3            X1
[1,] -4.1231056 -1.08849989 -0.39512546 -4.123106e+00
[2,]  0.2425356  0.66656733  0.35194755  1.558035e-17
[3,]  0.2425356  0.21973588 -0.09588149  1.787189e-17
[4,]  0.2425356  0.17022850
-0.10350966 -2.990539e-17 [5,] 0.2425356 0.12072112 -0.19811319 2.906411e-01 [6,] 0.2425356 0.07121375 -0.27000118 2.654896e-01 [7,] 0.2425356 0.02170637 -0.31917362 2.457966e-01 [8,] 0.2425356 -0.02780101 -0.34563052 2.315620e-01 [9,] 0.2425356 -0.07730838 -0.34937188 2.227859e-01 [10,] 0.2425356 -0.12681576 -0.33039769 2.194681e-01 [11,] 0.2425356 -0.17632314 -0.28870796 2.216089e-01 [12,] 0.2425356 -0.22583052 -0.22430269 2.292080e-01 [13,] 0.2425356 -0.27533789
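For reference, a more idiomatic route for the same quadratic fit (a sketch, not from the original post) is to let lm() build the design matrix, so no hand-made matrix of ones, t and t^2 is needed; the simulated signal below stands in for the poster's data file:

```r
# Quadratic least-squares fit over a 17-sample window, using lm() instead
# of building the design matrix by hand. The data here are simulated.
set.seed(42)
dt <- 0.033
t  <- seq(0, dt * 16, by = dt)                 # 17 sample times
y.win <- -1.32 + 0.12 * t + 0.93 * t^2 + rnorm(17, sd = 0.02)

fit <- lm(y.win ~ poly(t, 2, raw = TRUE))
coef(fit)   # (Intercept) = a, first-order term = b, second-order term = c
```

raw = TRUE gives coefficients on the plain a + b*t + c*t^2 scale; without it, poly() uses orthogonal polynomials, which are numerically safer but harder to read off directly.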
Re: [R] Installing different versions of R simultaneously on Linux
G'day Rainer, On Fri, 27 Feb 2009 14:06:20 +0200 Rainer M Krug r.m.k...@gmail.com wrote: Thanks a lot for the offer - that would be great. I will set it up the same way on my PC with Xubuntu. Script is attached. Ignore the comments at the beginning; they are there just to remind me what ./configure line I usually use, possible variations, and whether to edit config.site or work with environment variables. After the make install step, I edit the variables VERSION and PRIORITY in this file and then run the script as root. Note that VERSION should be the same number as the one specified in the ./configure line. As long as the configuration of a command is set to 'auto', the alternative with the highest priority is used. So make sure that the newest version of R has the highest priority; I usually set the priority to xyz for R-x.y.z (and keep my fingers crossed that there will never be a release with either y or z larger than 9, otherwise I will have to refine my scheme). To use this on a new machine, you have to create /opt/info, /opt/man/man1 and /opt/bin before running the script the first time (IIRC). It also helps to copy /opt/R/R-$VERSION/share/info/dir to /opt/info/dir so that emacs will include the info files in the list that you get with C-h i (this has to be done only once; the dir file does not seem to change between R versions). Prior to 2.5.0 the man and info files were installed in R-$VERSION/man and R-$VERSION/info instead of R-$VERSION/share/man and R-$VERSION/share/info, respectively. I have a separate script for those versions (but don't install such old versions anymore). How far do you want to go back? Also, much earlier, if memory serves correctly, R-exts.info came in 2 parts instead of 3; but I don't seem to have my script from that time anymore. I think that's all. Let me know if you run into trouble or need more help.
Cheers, Berwin #!/bin/bash ##Configure with the following options: ## ## ./configure --prefix=/opt/R/R-2.8.1 --with-blas --with-lapack --enable-R-shlib r_arch=32 ## ## other possible options: ## r_arch=32 and r_arch=64 ## --enable-R-shlib ## ## export JAVA_HOME=/where/is/sun/java (/usr/lib/jvm/java-1.6-sun) ## above not necessary, use config.site instead. ## ##Then as root: ## VERSION=devel ## PRIORITY=100 VERSION=2.8.1 PRIORITY=281 update-alternatives --install /opt/bin/R R /opt/R/R-$VERSION/bin/R $PRIORITY \ --slave /opt/man/man1/R.1 R.1 /opt/R/R-$VERSION/share/man/man1/R.1 \ --slave /opt/info/R-FAQ.info.gz R-FAQ.info /opt/R/R-$VERSION/share/info/R-FAQ.info.gz \ --slave /opt/info/R-admin.info.gz R-admin.info /opt/R/R-$VERSION/share/info/R-admin.info.gz \ --slave /opt/info/R-data.info.gz R-data.info /opt/R/R-$VERSION/share/info/R-data.info.gz \ --slave /opt/info/R-exts.info.gz R-exts.info /opt/R/R-$VERSION/share/info/R-exts.info.gz \ --slave /opt/info/R-exts.info-1.gz R-exts.info-1 /opt/R/R-$VERSION/share/info/R-exts.info-1.gz \ --slave /opt/info/R-exts.info-2.gz R-exts.info-2 /opt/R/R-$VERSION/share/info/R-exts.info-2.gz \ --slave /opt/info/R-intro.info.gz R-intro.info /opt/R/R-$VERSION/share/info/R-intro.info.gz \ --slave /opt/info/R-lang.info.gz R-lang.info /opt/R/R-$VERSION/share/info/R-lang.info.gz \ --slave /opt/info/R-ints.info.gz R-ints.info /opt/R/R-$VERSION/share/info/R-ints.info.gz ln -sf /opt/R/R-$VERSION/bin/R /opt/bin/R-$VERSION __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Sweave doesn't do csv.get()
Hi Everybody. I use R 2.8.0 on Mac OS X. I set up LyX 1.6.1 to use Sweave today. I can compile the test file I found on CRAN (http://cran.r-project.org/contrib/extra/lyx/) without a problem and the output looks very nice. In the test file the following R code is used:

<<myFirstChunkInLyX>>=
xObs <- 100; xMean <- 10; xVar <- 9
x <- rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
mean(x)
@

that should be the same as:

xObs <- 100
xMean <- 10
xVar <- 9
x <- rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
mean(x)

in the R console. My problem is that I want to import data to use in my report. In the R source I currently use to analyse my data, I import it through csv.get(). I have found that I cannot use csv.get(), or write.csv() for that matter. I don't seem to be able to use load() to get a .rda file either. Is this issue related to LyX, LaTeX or R? Thanks in advance, Christiaan
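Two things worth checking here (a sketch, not a diagnosis from the thread): csv.get() lives in the Hmisc package, so the chunk must load it first, and Sweave evaluates chunks in the directory it is run from, so relative file paths can fail. The file name below is invented for illustration; the block simply verifies that CSV I/O itself works:

```r
# Minimal self-contained check that CSV round-tripping works in a chunk:
# write a small file, then read it back. "demo.csv" is a made-up name.
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))
write.csv(df, "demo.csv", row.names = FALSE)

back <- read.csv("demo.csv")   # csv.get() would also work after library(Hmisc)
back
```

If this chunk compiles but the real one fails, the problem is the path or a missing library() call, not Sweave itself.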
Re: [R] add absolute value to bars in barplot
On Fri, Feb 27, 2009 at 01:32:45PM +0100, soeren.vo...@eawag.ch wrote:

barplot(twcons.area, beside = TRUE, col = c("green4", "blue", "red3", "gray"),
        xlab = "estate", ylab = "number of persons", ylim = c(0, 110),
        legend.text = c("treated", "mix", "untreated", NA))

produces a barplot very fine. In addition, I'd like to get the bars' absolute values on top of the bars. How can I produce this in an easy way?

barplot() returns a vector of midpoints, so you can use text() to add the annotation. There is an example in the manual page of barplot:

mp <- barplot(VADeaths)
tot <- colMeans(VADeaths)
text(mp, tot + 3, format(tot), xpd = TRUE, col = "blue")

cu Philipp -- Dr. Philipp Pagel Lehrstuhl für Genomorientierte Bioinformatik Technische Universität München Wissenschaftszentrum Weihenstephan 85350 Freising, Germany http://mips.gsf.de/staff/pagel
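The same trick carries over to the grouped (beside = TRUE) case from the question: barplot() then returns a matrix of midpoints with the same shape as the data, so text() can label every bar. A sketch with made-up data standing in for twcons.area:

```r
# Label every bar of a grouped barplot with its value.
m  <- matrix(c(30, 45, 12, 50, 70, 25), nrow = 3,
             dimnames = list(c("treated", "mix", "untreated"), c("A", "B")))
mp <- barplot(m, beside = TRUE, ylim = c(0, 110), legend.text = rownames(m))
text(mp, m + 3, labels = m, xpd = TRUE)   # mp and m have matching shapes
```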
Re: [R] Axis-question
solved by grouping... (see my next mail) Antje wrote: Hi there, I was wondering whether it's possible to generate an axis with groups (like in Excel), so that you can have something like this as the x-axis (for example for the levelplot method of the lattice package):

| X1 | X2 | X3 | X1 | X2 | X3 | X1 | ...
|   group1   |   group2   |  group3 ...

I hope you understand what I'm looking for?
Re: [R] Inefficiency of SAS Programming
Wensui Liu wrote: Thanks for pointing me to the SAS code, Dr Harrell. After reading the code, I have to say that the inefficiency is not related to the SAS language itself but to the SAS programmer. An experienced SAS programmer won't use so much hard-coding, which is ad hoc and difficult to maintain. I agree with you that in the SAS code it is a little too much to evaluate predictions; such a complex data step can actually be replaced by simpler IML code.

Agreed that the SAS code could have been much better. I programmed in SAS for 23 years and would have done it much differently. But you will find that the most elegant SAS program re-write will still be a far cry from the elegance of R. Frank

On Thu, Feb 26, 2009 at 5:57 PM, Frank E Harrell Jr f.harr...@vanderbilt.edu wrote: If anyone wants to see a prime example of how inefficient it is to program in SAS, take a look at the SAS programs provided by the US Agency for Healthcare Research and Quality for risk adjusting and reporting for hospital outcomes at http://www.qualityindicators.ahrq.gov/software.htm . The PSSASP3.SAS program is a prime example. Look at how you do a vector product in the SAS macro language to evaluate predictions from a logistic regression model. I estimate that using R would easily cut the programming time of this set of programs by a factor of 4. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
[R] levelplot help needed
Hi there, I'm looking for someone who can give me some hints on how to make a nice levelplot. As an example, I have the following code:

# create some example data
# --
library(lattice)   # needed for levelplot()
xl <- 4
yl <- 10
my.data <- sapply(1:xl, FUN = function(x) { rnorm(yl, mean = x) })
x_label <- rep(c("X Label 1", "X Label 2", "X Label 3", "X Label 4"), each = yl)
y_label <- rep(paste("Y Label ", 1:yl, sep = ""), xl)
df <- data.frame(x_label = factor(x_label), y_label = factor(y_label), values = as.vector(my.data))
df1 <- data.frame(df, group = rep("Group 1", xl*yl))
df2 <- data.frame(df, group = rep("Group 2", xl*yl))
df3 <- data.frame(df, group = rep("Group 3", xl*yl))
mdf <- rbind(df1, df2, df3)

# plot
# --
graph <- levelplot(mdf$values ~ mdf$x_label * mdf$y_label | mdf$group,
                   aspect = "xy", layout = c(3,1),
                   scales = list(x = list(labels = substr(levels(factor(mdf$x_label)), 0, 5), rot = 45)))
print(graph)
# --

(I need to put these strange x-labels because in my real data the values of the x-labels are too long, and I just want to display the first 10 characters as the label.)

My questions: * I'd like to start with Y Label 1 in the upper row (that's a more general issue: how can I influence the order of x, y, and groups?) * I'd like to put the groups at the bottom. Can anybody give me some help? Antje
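On the ordering question: lattice draws axis values and panels in the order of the underlying factor levels, counting from the bottom-left, so reordering the levels is the usual lever. A sketch on a toy factor (not from the thread):

```r
# Control display order in lattice by setting factor levels explicitly.
y_label <- paste("Y Label", 1:10)
f <- factor(y_label, levels = rev(y_label))   # reversed levels: "Y Label 1"
                                              # becomes the last level, i.e. it
                                              # would land in the top row
levels(f)[1]
```

The same idea applies to the group factor (and index.cond can reorder panels without touching the data).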
Re: [R] Inefficiency of SAS Programming
Ajay ohri wrote: Sometimes, for the sake of simplicity, SAS coding is created like that. One can use the concatenate function and drag-and-drop in a simple Excel sheet to create elaborate SAS code like the one mentioned, in hardly any time at all.

A system that requires Excel for its success is not a complete system.

There are multiple ways to do this in SAS, much better, and similarly in R. There are many areas where SAS programmers would find R not so useful --- for example, the equivalent of PROC LOGISTIC for creating a logistic model.

Really? Try this in SAS:

library(Design)
f <- lrm(death ~ rcs(age, 5) * sex)
anova(f)    # get test of nonlinearity of interactions among other things
nomogram(f) # depict model graphically

The restricted cubic spline in age, i.e., assuming the age relationship is smooth but not much else, is very easy to code in R. There are many other automatic transformations available. The lack of generality of the SAS language makes many SAS users assume linearity far more often than R users do. Also note that PROC LOGISTIC, without invocation of a special option, would make the user believe that older subjects have lower chances of dying, as SAS by default takes the event being predicted to be death=0. Frank

On Fri, Feb 27, 2009 at 10:21 AM, Wensui Liu liuwen...@gmail.com wrote: Thanks for pointing me to the SAS code, Dr Harrell. After reading the code, I have to say that the inefficiency is not related to the SAS language itself but to the SAS programmer. An experienced SAS programmer won't use so much hard-coding, which is ad hoc and difficult to maintain. I agree with you that in the SAS code it is a little too much to evaluate predictions; such a complex data step can actually be replaced by simpler IML code.
On Thu, Feb 26, 2009 at 5:57 PM, Frank E Harrell Jr f.harr...@vanderbilt.edu wrote: If anyone wants to see a prime example of how inefficient it is to program in SAS, take a look at the SAS programs provided by the US Agency for Healthcare Research and Quality for risk adjusting and reporting for hospital outcomes at http://www.qualityindicators.ahrq.gov/software.htm . The PSSASP3.SAS program is a prime example. Look at how you do a vector product in the SAS macro language to evaluate predictions from a logistic regression model. I estimate that using R would easily cut the programming time of this set of programs by a factor of 4. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University

-- === WenSui Liu Acquisition Risk, Chase Blog: statcompute.spaces.live.com "I can calculate the motion of heavenly bodies, but not the madness of people." -- Isaac Newton ===

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Inefficiency of SAS Programming
Gerard M. Keogh wrote: Frank, I can't see the code you mention - Web marshall at work - but I don't think you should be too quick to run down SAS - it's a powerful and flexible language but unfortunately very expensive. Your example mentions doing a vector product in the macro language - this only suggests to me that the people writing the code need a crash course in SAS/IML (the matrix language). SAS is designed to work on records and so is inappropriate for matrices - macros are only an efficient code-copying device. Doing matrix computations in this way is pretty mad, and the code would be impossible, never mind the memory problems. SAS recognise that, but a lot of SAS users remain unfamiliar with IML. In IML, by contrast, there are inner, cross and outer products and a raft of other useful methods for matrix work that R users would be familiar with. OLS, for example, is one line:

b = solve(X`X, X`y) ; rss = sqrt(ssq(y - X*b)) ;

And to give you a flavour of IML's capabilities, I implemented a SAS version of the MARS program in it about 6 or 7 years ago. BTW, SPSS also has a matrix language. Gerard

But try this:

PROC IML;
... some custom user code ...
... loop over j=1 to 10 ...
... PROC GENMOD, output results back to IML ...

IML is only a partial solution since it is not integrated with the PROC step. Frank

Frank E Harrell Jr wrote on 26/02/2009 22:57: If anyone wants to see a prime example of how inefficient it is to program in SAS, take a look at the SAS programs provided by the US Agency for Healthcare Research and Quality for risk adjusting and reporting for hospital outcomes at http://www.qualityindicators.ahrq.gov/software.htm . The PSSASP3.SAS program is a prime example. Look at how you do a vector product in the SAS macro language to evaluate predictions from a logistic regression model.
I estimate that using R would easily cut the programming time of this set of programs by a factor of 4. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Inefficiency of SAS Programming
Ajay ohri wrote: I would like to know if we can create a package in which R functions are renamed closer to the SAS language. Doing so would help people familiar with SAS to take to R straight away for their work, thus decreasing the threshold for acceptance - and then get into deeper understanding later. Since it is a package, it would be optional, only for people wanting to try out R from SAS. Do we have such a package right now? It would basically mask R functions with the equivalent function in another language, just for user ease / beginners. For example:

# creating a function for means
procmeans <- function(x, y) {
  summary(subset(x, select = c(x, y)))
}
# creating a function for importing csv
procimport <- function(x, y) {
  read.csv(textConnection(x), row.names = y, na.strings = "")
}
# creating a function for describing data
procunivariate <- function(x) {
  summary(x)
}

regards, ajay

Ajay, This will generate major confusion among users of all types and be hard to maintain. A better approach is to get Bob Muenchen's excellent book and keep it nearby. Frank

www.decisionstats.com

On Fri, Feb 27, 2009 at 4:27 AM, Frank E Harrell Jr f.harr...@vanderbilt.edu wrote: If anyone wants to see a prime example of how inefficient it is to program in SAS, take a look at the SAS programs provided by the US Agency for Healthcare Research and Quality for risk adjusting and reporting for hospital outcomes at http://www.qualityindicators.ahrq.gov/software.htm . The PSSASP3.SAS program is a prime example. Look at how you do a vector product in the SAS macro language to evaluate predictions from a logistic regression model. I estimate that using R would easily cut the programming time of this set of programs by a factor of 4.
Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
[R] Ordinal Mantel-Haenszel type inference
Hello, I am searching for an R package that implements an extension of the Mantel-Haenszel test for ordinal data, as described in Liu and Agresti (1996), "A Mantel-Haenszel type inference for cumulative odds ratios", Biometrics. I see packages such as Epi that perform it for binary data and derive a variance for it using the Robins and Breslow variance method, as well as another package that derives it for nominal variables but does not provide a variance or confidence limit. Does a package exist that does this? I have searched the list archives and can't seem to find such a package, but I could be missing something. Thank you. Yours sincerely, Jourdan
[R] how can I compare two vector by a factor
Hi, I used wilcox.test to carry out a Mann-Whitney test with paired=FALSE. However, I want to compare two variables, e.g. pre and post, grouped by treatment. Does anyone have experience with this? Thanks! Xin
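One common pattern for a within-group comparison like this (a sketch with simulated data, not from the thread) is to split the data frame by the treatment factor and run wilcox.test() inside each group:

```r
# Mann-Whitney test (wilcox.test, paired = FALSE) of pre vs post,
# run separately within each treatment group. Data are simulated.
set.seed(7)
df <- data.frame(pre       = rnorm(40, mean = 10),
                 post      = rnorm(40, mean = 11),
                 treatment = rep(c("ctrl", "drug"), each = 20))

results <- lapply(split(df, df$treatment),
                  function(d) wilcox.test(d$pre, d$post, paired = FALSE))
results[["drug"]]$p.value   # one p-value per treatment group
```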
Re: [R] Singularity in a regression?
If collinearity exists, one of the solutions is a regularized version of regression. There are different types of regularization methods, like ridge, LASSO, elastic net, etc. For example, the MASS package provides ridge regression. Alex

On Thu, Feb 26, 2009 at 1:58 PM, Bob Gotwals gotw...@ncssm.edu wrote: R friends, In a matrix of 1s and 0s, I'm getting a singularity error. Any helpful ideas?

lm(formula = activity ~ metaF + metaCl + metaBr + metaI + metaMe + paraF + paraCl + paraBr + paraI + paraMe)

Residuals:
       Min         1Q     Median         3Q        Max
-4.573e-01 -7.884e-02  3.469e-17  6.616e-02  2.427e-01

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.9173     0.1129  70.135  < 2e-16 ***
metaF        -0.3973     0.2339  -1.698 0.115172
metaCl            NA         NA      NA       NA
metaBr        0.3454     0.1149   3.007 0.010929 *
metaI         0.4827     0.2339   2.063 0.061404 .
metaMe        0.3654     0.1149   3.181 0.007909 **
paraF         0.7675     0.1449   5.298 0.000189 ***
paraCl        0.3400     0.1449   2.347 0.036925 *
paraBr        1.0200     0.1449   7.040 1.36e-05 ***
paraI         1.3327     0.2339   5.697 9.96e-05 ***
paraMe        1.2191     0.1573   7.751 5.19e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2049 on 12 degrees of freedom
Multiple R-squared: 0.9257, Adjusted R-squared: 0.8699
F-statistic: 16.61 on 9 and 12 DF, p-value: 1.811e-05
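The NA row for metaCl in output like the above means that its column is an exact linear combination of the other columns, so lm() cannot estimate it; alias() names the dependency. A toy reproduction with made-up 0/1 data:

```r
# x3 is an exact linear combination of x1 and x2, so lm() reports NA for
# its coefficient (like metaCl above); alias() shows the dependency.
x1 <- rep(0:1, each = 10)
x2 <- rep(0:1, times = 10)
x3 <- x1 + x2                    # exact collinearity, by construction
set.seed(1)
y <- 1 + x1 + 2 * x2 + rnorm(20)

fit <- lm(y ~ x1 + x2 + x3)
coef(fit)    # x3 comes back NA
alias(fit)   # reports x3 = x1 + x2
```

With 0/1 indicators this usually comes from the dummy-variable trap: if the indicators for one position sum to a constant, one of them is redundant and should be dropped.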
Re: [R] question about 3-d plot
Hi Deepankar, The code on the following page looks kind of cool, and also seems to produce something of the type of graph you are after perhaps: https://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/pkg/rgl/demo/regression.r?rev=702root=rglsortby=dateview=auto [below is a copy of the code...]

library(rgl)
# demo: regression
# author: Daniel Adler
rgl.demo.regression <- function(n=100, xa=3, za=8, xb=0.02, zb=0.01,
                                xlim=c(0,100), zlim=c(0,100)) {
  rgl.clear("all")
  rgl.bg(sphere = TRUE, color = c("black", "green"), lit = FALSE,
         size = 2, alpha = 0.2, back = "lines")
  rgl.light()
  rgl.bbox()
  x  <- runif(n, min=xlim[1], max=xlim[2])
  z  <- runif(n, min=zlim[1], max=zlim[2])
  ex <- rnorm(n, sd=3)
  ez <- rnorm(n, sd=2)
  esty <- (xa + xb*x) * (za + zb*z) + ex + ez
  rgl.spheres(x, esty, z, color="gray", radius=1, specular="green",
              texture=system.file("textures/bump_dust.png", package="rgl"),
              texmipmap=TRUE, texminfilter="linear.mipmap.linear")
  regx <- seq(xlim[1], xlim[2], len=100)
  regz <- seq(zlim[1], zlim[2], len=100)
  regy <- (xa + regx*xb) %*% t(za + regz*zb)
  rgl.surface(regx, regz, regy, color="blue", alpha=0.5, shininess=128)
  lx <- c(xlim[1], xlim[2], xlim[2], xlim[1])
  lz <- c(zlim[1], zlim[1], zlim[2], zlim[2])
  f  <- function(x, z) { (xa + x*xb) * t(za + z*zb) }
  ly <- f(lx, lz)
  rgl.quads(lx, ly, lz, color="red", size=5, front="lines", back="lines", lit=FALSE)
}
rgl.open()
rgl.demo.regression()

On Feb 27, 5:28 am, Dipankar Basu basu...@gmail.com wrote: Hi R Users, I have produced a simulated scatter plot of y versus x tightly clustered around the 45 degree line through the origin with the following code:

x <- seq(1, 100)
y <- x + rnorm(100, 0, 10)
plot(x, y, col = "blue")
abline(0, 1)

Is there some way to generate a 3-dimensional analogue of this? Can I get a similar simulated scatter plot of points in 3 dimensions where the points are clustered around a plane through the origin, where the plane in question is the 3-dimensional analogue of the 45 degree line through the origin?
Deepankar
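For the original question, the 3-D analogue can be simulated directly without rgl: scatter points around the plane y = x + z through the origin and, if desired, draw them with scatterplot3d or lattice's cloud(). A base-R sketch (the plotting line is commented out, the fit just verifies the construction):

```r
# Points scattered around the plane y = x + z through the origin,
# the 3-D analogue of points around the 45-degree line.
set.seed(123)
x <- runif(200, 0, 100)
z <- runif(200, 0, 100)
y <- x + z + rnorm(200, sd = 10)

fit <- lm(y ~ x + z)
coef(fit)   # intercept near 0, both slopes near 1
# library(scatterplot3d); scatterplot3d(x, y, z)   # one way to draw them
```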
Re: [R] Sweave doesn't do csv.get()
christiaan pauw wrote: Hi Everybody. I use R 2.8.0 on Mac OS X. I set up LyX 1.6.1 to use Sweave today. I can compile the test file I found on CRAN (http://cran.r-project.org/contrib/extra/lyx/) without a problem and the output looks very nice. In the test file the following R code is used:

<<myFirstChunkInLyX>>=
xObs <- 100; xMean <- 10; xVar <- 9
x <- rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
mean(x)
@

that should be the same as:

xObs <- 100
xMean <- 10
xVar <- 9
x <- rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
mean(x)

in the R console. My problem is that I want to import data to use in my report. In the R source I currently use to analyse my data, I import it through csv.get(). I have found that I cannot use csv.get(), or write.csv() for that matter. I don't seem to be able to use load() to get a .rda file either. Is this issue related to LyX, LaTeX or R? Thanks in advance, Christiaan

I didn't see the library(Hmisc) statement in your code that would give you access to csv.get. This should be unrelated to LyX, Sweave, etc. Frank

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Inefficiency of SAS Programming
Yes Frank, I accept your point, but nevertheless IML is the proper place for matrix work in SAS - mixing macro-level logic and computation is another question - R is certainly more seamless in this respect. Gerard

Frank E Harrell Jr wrote on 27/02/2009 13:55: Gerard M. Keogh wrote: Frank, I can't see the code you mention - Web marshall at work - but I don't think you should be too quick to run down SAS - it's a powerful and flexible language but unfortunately very expensive. Your example mentions doing a vector product in the macro language - this only suggests to me that the people writing the code need a crash course in SAS/IML (the matrix language). SAS is designed to work on records and so is inappropriate for matrices - macros are only an efficient code-copying device. Doing matrix computations in this way is pretty mad, and the code would be impossible, never mind the memory problems. SAS recognise that, but a lot of SAS users remain unfamiliar with IML. In IML, by contrast, there are inner, cross and outer products and a raft of other useful methods for matrix work that R users would be familiar with. OLS, for example, is one line:

b = solve(X`X, X`y) ; rss = sqrt(ssq(y - X*b)) ;

And to give you a flavour of IML's capabilities, I implemented a SAS version of the MARS program in it about 6 or 7 years ago. BTW, SPSS also has a matrix language. Gerard

But try this:

PROC IML;
... some custom user code ...
... loop over j=1 to 10 ...
... PROC GENMOD, output results back to IML ...

IML is only a partial solution since it is not integrated with the PROC step.
Frank

Frank E Harrell Jr wrote on 26/02/2009 22:57: If anyone wants to see a prime example of how inefficient it is to program in SAS, take a look at the SAS programs provided by the US Agency for Healthcare Research and Quality for risk adjusting and reporting for hospital outcomes at http://www.qualityindicators.ahrq.gov/software.htm . The PSSASP3.SAS program is a prime example. Look at how you do a vector product in the SAS macro language to evaluate predictions from a logistic regression model. I estimate that using R would easily cut the programming time of this set of programs by a factor of 4. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
[R] Will ctv package work on ubuntu?
Hi ho: I had used the ctv package on a Windows setup of R, and I was wondering about Ubuntu. Certainly under Windows it has an easy time of it because there is only one library folder to scan for existing packages. Would its install.views and update.views functions work in Ubuntu, where the packages are split between the library established by r-cran downloads from Synaptic and the default library used by 'conventional' downloads using install.packages? If it can't handle that distinction between a Windows and a Linux situation, is it a package I should remove for now? Regards... -- Brian Lunergan Nepean, Ontario Canada
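One relevant detail (a sketch, not a tested answer for ctv): R itself already searches every directory on .libPaths() when it looks for installed packages, so a split between a distribution-managed library and a personal one is normal on Linux. A quick way to inspect what R sees (output differs per machine):

```r
# Inspect the library search path; installed.packages() scans all of its
# entries, so packages in distro-managed and personal libraries both count.
.libPaths()                              # every library tree R will search
libs <- installed.packages()[, "LibPath"]
head(unique(libs))                       # which trees actually hold packages
```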
Re: [R] Inefficiency of SAS Programming
on 02/27/2009 07:57 AM Frank E Harrell Jr wrote: Ajay ohri wrote: I would like to know if we can create a package in which R functions are renamed closer to the SAS language. Doing so would help people familiar with SAS to take to R straight away for their work, thus decreasing the threshold for acceptance - and then get into deeper understanding later. Since it is a package, it would be optional, only for people wanting to try out R from SAS. Do we have such a package right now? It would basically mask R functions with the equivalent function in another language, just for user ease / beginners. For example:

# creating a function for means
procmeans <- function(x, y) {
  summary(subset(x, select = c(x, y)))
}
# creating a function for importing csv
procimport <- function(x, y) {
  read.csv(textConnection(x), row.names = y, na.strings = "")
}
# creating a function for describing data
procunivariate <- function(x) {
  summary(x)
}

regards, ajay

Ajay, This will generate major confusion among users of all types and be hard to maintain. A better approach is to get Bob Muenchen's excellent book and keep it nearby. Frank

I wholeheartedly agree with Frank here. It may be one thing to have a translation process in place based upon some form of logical mapping between the two languages (as Bob's book provides). But it is another thing entirely to actually start writing functions that provide wrappers modeled on SAS-based PROCs. If you do this, then you only serve to obfuscate the fundamental philosophical and functional differences between the two languages and doom a new useR to missing all of R's benefits. They will continue to try to figure out how to use R based upon their SAS intuition rather than developing a new set of coding and even statistical paradigms. Having been through the SAS to S/R transition myself, having used SAS for much of the 90's and now having used R for over 7 years, I can speak from personal experience and state that the only way to achieve the requisite proficiency with R is immersion therapy.
Regards, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Download daily weather data
Geonames unfortunately doesn't have weather forecasts. This is a problem. GRIB looks better. There is an interface between GRIB and R.

On Fri, Feb 27, 2009 at 4:14 AM, Pfaff, Bernhard Dr. bernhard_pf...@fra.invesco.com wrote: Dear Thomas, more for the sake of completeness and as an alternative to R: there are GRIB data sets [1] available (some for free) and there is the GPL software Grads [2]. Because the GRIB format is well documented, it should be possible to get it into R easily and make up your own plots/weather analysis. I do not know and have not checked whether somebody has already done so. I use this information/these tools alongside others during longer-dated off-shore sailing. Best, Bernhard [1] http://www.grib.us/ [2] http://www.iges.org/grads/

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Scillieri, John Sent: Thursday, 26 February 2009 22:58 To: 'James Muller'; 'r-help@r-project.org' Subject: Re: [R] Download daily weather data Looks like you can sign up to get XML feed data from Weather.com http://www.weather.com/services/xmloap.html Hope it works out!

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of James Muller Sent: Thursday, February 26, 2009 3:57 PM To: r-help@r-project.org Subject: Re: [R] Download daily weather data Thomas, Have a look at the source code for the webpage (ctrl-u in Firefox; don't know in Internet Explorer, etc.). That is what you'd have to parse in order to get the forecast from this page. Typically when I parse webpages such as this I use regular expressions to do so (and I would never downplay the usefulness of regular expressions, but they take a little getting used to). There are two parts to the task: find patterns that allow you to pull out the datum/data you're after; and then write a program to pull it/them out. Also, of course, download the webpage (but that's no issue). 
I bet you'd be able to find a comma separated value (CSV) file containing the weather report somewhere, which would probably involve a little less labor in order to produce your automatic wardrobe advice. James

On Thu, Feb 26, 2009 at 3:47 PM, Thomas Levine thomas.lev...@gmail.com wrote: I'm writing a program that will tell me whether I should wear a coat, so I'd like to be able to download daily weather forecasts and daily reports of recent past weather conditions. The NOAA has very promising tabular forecasts (http://forecast.weather.gov/MapClick.php?CityName=Ithaca&state=NY&site=BGM&textField1=42.4422&textField2=-76.5002&e=0&FcstType=digital), but I can't figure out how to import them. Someone must have needed to do this before. Suggestions? Thomas Levine!
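[Editor's note] James's two-part recipe (find a pattern, then write a program to pull the value out) can be sketched in base R. The HTML fragment and pattern below are invented for illustration only; they are not NOAA's actual markup, and a real page would need its own pattern.

```r
# Hedged sketch of the regular-expression approach described above, run on a
# made-up fragment of HTML rather than a live download.
page <- c("<td>Temperature</td>", "<td>41 F</td>", "<td>Dew point</td>")

# Step 1: find lines that look like a temperature reading.
hit <- grep("[0-9]+ F", page, value = TRUE)

# Step 2: extract the number itself with a capture group.
temp <- as.numeric(sub(".*>([0-9]+) F.*", "\\1", hit))
temp
# [1] 41
```

In practice the page would first be fetched, e.g. with readLines() on the URL, before applying the same grep/sub steps.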
Re: [R] Sweave doesn't do csv.get()
It works now. Your help is much appreciated Christiaan

2009/2/27 Frank E Harrell Jr f.harr...@vanderbilt.edu christiaan pauw wrote: Hi Everybody I use R 2.8.0 on Mac OS X. I set up LyX 1.6.1 to use Sweave today. I can compile the test file I found on CRAN (http://cran.r-project.org/contrib/extra/lyx/) without a problem and the output looks very nice. In the test file the following R code is used:

<<myFirstChunkInLyX>>=
xObs <- 100; xMean <- 10; xVar <- 9
x <- rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
mean(x)
@

that should be the same as:

xObs <- 100
xMean <- 10
xVar <- 9
x <- rnorm(n=xObs, mean=xMean, sd=sqrt(xVar))
mean(x)

in the R console. My problem is that I want to import data to use in my report. In the R source I currently use to analyse my data, I import it through csv.get(). I have found that I cannot use csv.get(), or write.csv() for that matter. I don't seem to be able to use load() to get a .rda file in either. Is this issue related to LyX, LaTeX or R? Thanks in advance Christiaan

I didn't see the library(Hmisc) statement in your code that would give you access to csv.get. This should be unrelated to LyX, Sweave, etc. Frank

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
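[Editor's note] Frank's diagnosis can be made concrete with a minimal chunk sketch. This assumes the Hmisc package is installed; the file name "mydata.csv" is a placeholder, not from the thread.

```r
# Sketch of the fix Frank describes: csv.get() is provided by the Hmisc
# package, so any Sweave chunk using it must load Hmisc first.
# "mydata.csv" is a hypothetical file name.
library(Hmisc)
mydata <- csv.get("mydata.csv")
summary(mydata)
```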
Re: [R] Download daily weather data
Can I just say, it's great to see the R community really come out in support of such a noble and worthy cause as this :). Downfall of civilization, all that. Not here, no! James

On Thu, Feb 26, 2009 at 3:47 PM, Thomas Levine thomas.lev...@gmail.com wrote: I'm writing a program that will tell me whether I should wear a coat, so I'd like to be able to download daily weather forecasts and daily reports of recent past weather conditions. The NOAA has very promising tabular forecasts (http://forecast.weather.gov/MapClick.php?CityName=Ithaca&state=NY&site=BGM&textField1=42.4422&textField2=-76.5002&e=0&FcstType=digital), but I can't figure out how to import them. Someone must have needed to do this before. Suggestions? Thomas Levine!
Re: [R] Inefficiency of SAS Programming
I've actually used AHRQ's software to create Inpatient Quality Indicator reports. I can confirm pretty much what we already know; it is inefficient. Running on about 1.8 - 2 million cases, it would take just about a whole day to run the entire process from start to finish. That isn't all processing time and includes some time for the analyst to check results between substeps, but I still knew that my day was full when I was working on IQI reports.

To be fair, though, there are a lot of other factors (besides efficiency considerations) that go into AHRQ's program design. First, there are a lot of changes to that software every year. In some cases it is easier and less error-prone to hardcode a few points in the data so that it is blatantly obvious what to change next year should another analyst need to do so. Second, the organizations that use this software often require transparency and may not have high-level programmers on staff. Writing code so that it is accessible, editable, and interpretable by intermediate-level programmers or analysts is a plus. Third, given that IQI reports are often produced on a yearly basis, there's no real need to sacrifice clarity, etc. for efficiency - you're only doing this process once a year.

There are other points that could be made, but the main idea is I don't think it's fair to hold this software up, out of context, as an example of SAS's (or even AHRQ's) inefficiencies. I agree that SAS syntax is nowhere near as elegant or as powerful as R from a programming standpoint; that's why after 7 years of using SAS I switched to R. But comparing the two at that level is like racing a Ferrari and a Bentley to see which is the better car.
Re: [R] ftp fetch using RCurl?
I am using RCurl, version 0.9-4, under Windows. I cannot find the function getURLContent(). Has it been renamed? Or is it in a different version? Also, in the reference manual on CRAN under package RCurl, I found a function getBinaryURL() documented, but it cannot be found in the package either.

I would use something like

content = getURLContent("ftp://./foo.zip")
attributes(content) = NULL
writeBin(content, "/tmp/foo.zip")

and that should be sufficient. (You have to strip the attributes or writeBin() complains.)
[R] cross tabulation: convert frequencies to percentages
Hello, might be rather easy for R pros, but I've been searching to a dead end ...

twsource.area <- table(twsource, area, useNA="ifany")

gives me a nice cross tabulation of frequencies of two factors, but now I want to convert those absolute values to percentages. In addition I'd like an extra column and an extra row with absolute sums. I know, Excel or the likes will produce it more easily, but how would the procedure look in R? Thanks, Sören
Re: [R] Inefficiency of SAS Programming
I had enrolled in a statistics course this semester, but after the first class I dropped it because it uses SAS. This thread makes me quite glad. Tom!

On Fri, Feb 27, 2009 at 8:48 AM, Frank E Harrell Jr f.harr...@vanderbilt.edu wrote: Wensui Liu wrote: Thanks for pointing me to the SAS code, Dr Harrell. After reading the code, I have to say that the inefficiency is not related to the SAS language itself but to the SAS programmer. An experienced SAS programmer won't use much hard-coding, which is very ad hoc and difficult to maintain. I agree with you that in the SAS code it is a little too much to evaluate predictions; such a complex data step can actually be replaced by simpler IML code.

Agreed that the SAS code could have been much better. I programmed in SAS for 23 years and would have done it much differently. But you will find that the most elegant SAS program re-write will still be a far cry from the elegance of R. Frank

On Thu, Feb 26, 2009 at 5:57 PM, Frank E Harrell Jr f.harr...@vanderbilt.edu wrote: If anyone wants to see a prime example of how inefficient it is to program in SAS, take a look at the SAS programs provided by the US Agency for Healthcare Research and Quality for risk adjusting and reporting for hospital outcomes at http://www.qualityindicators.ahrq.gov/software.htm . The PSSASP3.SAS program is a prime example. Look at how you do a vector product in the SAS macro language to evaluate predictions from a logistic regression model. I estimate that using R would easily cut the programming time of this set of programs by a factor of 4. Frank
-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Inefficiency of SAS Programming
Immersion therapy can be done at a later stage, after the newly baptized R corporate user is happy with the fact that he can do most of his legacy code in R easily now. I have been treading water in the immersion for over a year now. Most SAS consultants and corporate users are eager to try out R, but they are scared of immersion, especially in these cut-back times ... so this could be a middle step ... let me go ahead and create the wrapper SAS package as middleware between R and SAS, and we will let the invisible hands of the free market decide :))

regards, ajay www.decisionstats.com I am not a Marxist. Karl Marx http://www.brainyquote.com/quotes/quotes/k/karlmarx131048.html

On Fri, Feb 27, 2009 at 8:01 PM, Marc Schwartz marc_schwa...@comcast.net wrote: on 02/27/2009 07:57 AM Frank E Harrell Jr wrote: Ajay ohri wrote: I would like to know if we can create a package in which R functions are renamed closer to the SAS language. Doing so will help people familiar with SAS to take to R straight away for their work, thus decreasing the threshold for acceptance, and then get into deeper understanding later. Since it is a package, it would be optional, only for people wanting to try out R from SAS. Do we have such a package right now? It basically masks R functions to the equivalent function in another language, just for user ease / beginners. For example, creating a function for means:

procmeans <- function(x, y) {
  summary(subset(x, select = c(x, y)))
}

creating a function for importing CSV:

procimport <- function(x, y) {
  read.csv(textConnection(x), row.names = y, na.strings = "")
}

creating a function for describing data:

procunivariate <- function(x) {
  summary(x)
}

regards, ajay

Ajay, This will generate major confusion among users of all types and be hard to maintain. A better approach is to get Bob Muenchen's excellent book and keep it nearby. Frank

I wholeheartedly agree with Frank here. It may be one thing to have a translation process in place based upon some form of logical mapping between the two languages (as Bob's book provides). But it is another thing entirely to actually start writing functions that provide wrappers modeled on SAS-based PROCs. If you do this, then you only serve to obfuscate the fundamental philosophical and functional differences between the two languages and doom a new useR to missing all of R's benefits. They will continue to try to figure out how to use R based upon their SAS intuition rather than developing a new set of coding and even statistical paradigms. Having been through the SAS to S/R transition myself, having used SAS for much of the 90's and now having used R for over 7 years, I can speak from personal experience and state that the only way to achieve the requisite proficiency with R is immersion therapy.

Regards, Marc Schwartz
[R] Making tapply code more efficient
Previously, I posed the question pasted down below to the list and received some very helpful responses. While the code suggestions provided in response indeed work, they seem to only work with *very* small data sets, and so I wanted to follow up and see if anyone had ideas for better efficiency. I was quite embarrassed on this, as our SAS programmers cranked out programs that did this in the blink of an eye (with a few variables), but R was spinning for days on my Ubuntu machine and ultimately I saw a message that R was killed. The data I am working with has 800967 total rows and 31 total columns. The ID variable I use as the index variable in tapply() has 326397 unique cases.

> length(unique(qq$student_unique_id))
[1] 326397

To give a sense of what my data look like and the actual problem, consider the following:

qq <- data.frame(student_unique_id = factor(c(1,1,2,2,2)),
                 teacher_unique_id = factor(c(10,10,20,20,25)))

This is a student achievement database where students occupy multiple rows in the data and the variable teacher_unique_id denotes the class the student was in. What I am doing is looking to see if the teacher is the same for each instance of the unique student ID. So, if I implement the following:

same <- function(x) length(unique(x)) == 1
results <- data.frame(
  freq = tapply(qq$student_unique_id, qq$student_unique_id, length),
  tch  = tapply(qq$teacher_unique_id, qq$student_unique_id, same))

I get the following results. I can see that student 1 appears in the data twice and the teacher is always the same. However, student 2 appears three times and the teacher is not always the same.

> results
  freq   tch
1    2  TRUE
2    3 FALSE

Now, implementing this same procedure on a large data set with the characteristics described above seems to be problematic in this implementation. Does anyone have reactions on how this could be more efficient such that it can run with large data as I described?

Harold

> sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

# Original question posted on 1/13/09

Suppose I have a dataframe as follows:

dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25),
                  var2 = c('foo', 'foo', 'foo', 'foobar', 'foo'))

Now, if I were to subset by id, such as:

> subset(dat, id==1)
  id var1 var2
1  1   10  foo
2  1   10  foo

I can see that the elements in var1 are exactly the same and the elements in var2 are exactly the same. However,

> subset(dat, id==2)
  id var1   var2
3  2   20    foo
4  2   20 foobar
5  2   25    foo

shows the elements are not the same for either variable in this instance. So, what I am looking to create is a data frame that would be like this:

id freq  var1  var2
 1    2  TRUE  TRUE
 2    3 FALSE FALSE

Where freq is the number of times the ID is repeated in the dataframe. A TRUE appears in the cell if all elements in the column are the same for the ID and FALSE otherwise. It is insignificant which values differ for my problem. The way I am thinking about tackling this is to loop through the ID variable and compare the values in the various columns of the dataframe. The problem I am encountering is that I don't think all.equal or identical are the right functions in this case. So, say I was wanting to compare the elements of var1 for id == 1. I would have

x <- c(10,10)

Of course, the following works:

> all.equal(x[1], x[2])
[1] TRUE

As would a similar call to identical. However, what if I only have a vector of values (or if the column consists of names) that I want to assess for equality when I am trying to automate a process over thousands of cases? As in the example above, the vector may contain only two values or it may contain many more. The number of values in the vector differs by id. Any thoughts?

Harold
[R] Setting initial starting conditions in scripts
Hello, I'm writing a variety of R scripts and want to code the loading of the history and workspace from within the script. I found the loadhistory function but do not see a comparable function for loading a workspace. Is there one? Working with R 2.8.1 (2008-12-22) on a Windows platform. Thanks for any and all suggestions. Steve

Steve Friedman Ph. D. Spatial Statistical Analyst Everglades and Dry Tortugas National Park 950 N Krome Ave (3rd Floor) Homestead, Florida 33034 steve_fried...@nps.gov Office (305) 224 - 4282 Fax (305) 224 - 4147
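[Editor's note] The workspace counterpart of loadhistory() is plain load(), which restores objects written by save.image() (or save()). A minimal self-contained sketch, using a temporary file in place of the default ".RData":

```r
# Sketch: restore saved workspace objects from within a script.
# A temporary file stands in for the default ".RData" that save.image()
# or q("yes") would write in the working directory.
tmp <- tempfile(fileext = ".RData")
x <- 42
save(x, file = tmp)   # what save.image() does for the whole workspace
rm(x)
load(tmp)             # x is restored into the current environment
x
# [1] 42
```

In a script one would typically just call load(".RData") alongside loadhistory(".Rhistory").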
[R] Adjusting confidence intervals for paired t-tests of multiple endpoints
Dear R-users, In a randomized placebo-controlled within-subject design, subjects received a psycho-active drug and placebo. Subjects filled out a questionnaire containing 15 scales at four different time points after drug administration. In order to detect drug effects at each time point, I compared scale values between placebo and drug for all time conditions and scales, which sums up to 4*15=60 comparisons. I have summarized the results in a data.frame with columns for t test results including confidence intervals and mean-differences:

df1 <- data.frame(trt=gl(2,35), matrix(rnorm(4200),70,60))
df2 <- as.data.frame(matrix(NA,60,6))
names(df2) <- c('t','df','p','lower','upper','mean.diff')
for (i in 1:60) {
  df2[i,1:6] <- as.numeric(unlist(t.test(df1[,i+1]~df1$trt, paired=T))[1:6])
}

Now, I want to adjust the confidence intervals for multiple comparisons. For a Bonferroni-adjustment, I did the following:

df2$std.error.of.diff <- df2$mean.diff/df2$t
ci <- qt(p=1-(0.05/nrow(df2)), df=df2$df)*df2$std.error.of.diff
ci.bonf <- data.frame(lower=df2$mean.diff-ci, upper=df2$mean.diff+ci)

I hope this is the correct method. However, I think the Bonferroni-adjustment would be much too conservative. I need a less conservative approach, perhaps something like Holm's method, which I can easily apply to the p-value with p.adjust(df2$p, method='holm'). Is there a package which can do this for the confidence interval, or could someone provide a simple script to calculate this? Thanks a lot! Erich
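[Editor's note] As a cross-check on the hand computation above (an aside, not from the thread): t.test() can produce a Bonferroni-adjusted two-sided interval directly if the confidence level is widened to 1 - 0.05/m, here with m = 60 comparisons and simulated data.

```r
# Hedged sketch: Bonferroni adjustment of one paired-t confidence interval
# via the conf.level argument rather than a hand-rolled qt() computation.
set.seed(1)
m <- 60                          # number of comparisons in the design
x <- rnorm(35); y <- rnorm(35)   # simulated placebo/drug scores
unadj <- t.test(x, y, paired = TRUE)$conf.int
bonf  <- t.test(x, y, paired = TRUE, conf.level = 1 - 0.05/m)$conf.int
diff(bonf) > diff(unadj)         # the adjusted interval is wider
# [1] TRUE
```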
[R] Changing Ylab and scale in hclust plots
Hello, Running R 2.8.1 (2008-12-22) on Windows. I'm running a series (25) of clustering procedures using the hclust function and would like each of the plots to have the same y-axis label and scale. Is there a procedure to change the scale on these plots? Or is there an alternative clustering function that can give me broader control? Here is my very simple code:

par(mfrow=c(2,1))
NSM5172004 <- read.csv("H:\\HRH-Data_Files\\FrequencyScenarios\\NMS.csv", header=TRUE, sep=",")
NMS <- NSM5172004[-(1)]
NMS.dist <- dist(NMS)
plot(hclust(NMS.dist, method="ward"), xlab="", labels=NMS$Year,
     main="Cape Sable Seaside Sparrow", sub="Hydro Scenario NMS5172004")
ECB2_65_01 <- read.csv("H:\\HRH-Data_Files\\FrequencyScenarios\\ECB2_65_01.csv", header=TRUE, sep=",")
ECB2 <- ECB2_65_01[-(1)]
ECB2.dist <- dist(ECB2)
plot(hclust(ECB2.dist, method="ward"), xlab="", labels=ECB2$Year,
     main="Cape Sable Seaside Sparrow", sub="Hydro Scenario ECB2_65-01")

Thanks Steve

Steve Friedman Ph. D. Spatial Statistical Analyst Everglades and Dry Tortugas National Park 950 N Krome Ave (3rd Floor) Homestead, Florida 33034 steve_fried...@nps.gov Office (305) 224 - 4282 Fax (305) 224 - 4147
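[Editor's note] One way to force a common y-axis scale across dendrogram plots (a sketch, not from the thread: USArrests stands in for the poster's CSV files, and the default complete linkage stands in for the poster's ward method):

```r
# Hedged sketch: converting an hclust result to a dendrogram object gives
# access to the usual ylim/ylab graphics parameters, so several plots can
# share one scale. In practice the common maximum would be taken over all
# 25 cluster solutions.
hc1 <- hclust(dist(USArrests[1:25, ]),  method = "complete")
hc2 <- hclust(dist(USArrests[26:50, ]), method = "complete")
top <- max(hc1$height, hc2$height)   # common upper limit for both plots
par(mfrow = c(2, 1))
plot(as.dendrogram(hc1), ylim = c(0, top), ylab = "Distance")
plot(as.dendrogram(hc2), ylim = c(0, top), ylab = "Distance")
```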
Re: [R] cross tabulation: convert frequencies to percentages
on 02/27/2009 08:43 AM soeren.vo...@eawag.ch wrote: Hello, might be rather easy for R pros, but I've been searching to a dead end ... twsource.area <- table(twsource, area, useNA="ifany") gives me a nice cross tabulation of frequencies of two factors, but now I want to convert those absolute values to percentages. In addition I'd like an extra column and an extra row with absolute sums. I know, Excel or the likes will produce it more easily, but how would the procedure look in R?

See ?prop.table, which is referenced in the See Also section of ?table. This will give you proportions, so if you want percentages, just multiply by 100. To add row and column totals, see ?addmargins, which is also in the See Also for ?table.

> TAB <- table(state.division, state.region)
> TAB
                    state.region
state.division       Northeast South North Central West
  New England                6     0             0    0
  Middle Atlantic            3     0             0    0
  South Atlantic             0     8             0    0
  East South Central         0     4             0    0
  West South Central         0     4             0    0
  East North Central         0     0             5    0
  West North Central         0     0             7    0
  Mountain                   0     0             0    8
  Pacific                    0     0             0    5

# Overall table proportions
> prop.table(TAB)
                    state.region
state.division       Northeast South North Central West
  New England             0.12  0.00          0.00 0.00
  Middle Atlantic         0.06  0.00          0.00 0.00
  South Atlantic          0.00  0.16          0.00 0.00
  East South Central      0.00  0.08          0.00 0.00
  West South Central      0.00  0.08          0.00 0.00
  East North Central      0.00  0.00          0.10 0.00
  West North Central      0.00  0.00          0.14 0.00
  Mountain                0.00  0.00          0.00 0.16
  Pacific                 0.00  0.00          0.00 0.10

# Column proportions
> prop.table(TAB, 2)
                    state.region
state.division       Northeast South North Central      West
  New England            0.667 0.000         0.000 0.0000000
  Middle Atlantic        0.333 0.000         0.000 0.0000000
  South Atlantic         0.000 0.500         0.000 0.0000000
  East South Central     0.000 0.250         0.000 0.0000000
  West South Central     0.000 0.250         0.000 0.0000000
  East North Central     0.000 0.000         0.417 0.0000000
  West North Central     0.000 0.000         0.583 0.0000000
  Mountain               0.000 0.000         0.000 0.6153846
  Pacific                0.000 0.000         0.000 0.3846154

> addmargins(TAB)
                    state.region
state.division       Northeast South North Central West Sum
  New England                6     0             0    0   6
  Middle Atlantic            3     0             0    0   3
  South Atlantic             0     8             0    0   8
  East South Central         0     4             0    0   4
  West South Central         0     4             0    0   4
  East North Central         0     0             5    0   5
  West North Central         0     0             7    0   7
  Mountain                   0     0             0    8   8
  Pacific                    0     0             0    5   5
  Sum                        9    16            12   13  50

HTH, Marc Schwartz
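[Editor's note] Putting Marc's two pieces together for the original request (percentages plus marginal sums), a short sketch on the same example data:

```r
# Sketch combining the tools above for Sören's request: percentages of the
# whole table, plus the raw counts with row/column sums from addmargins().
TAB <- table(state.division, state.region)
pct <- round(100 * prop.table(TAB), 1)   # percentages of the grand total
tot <- addmargins(TAB)                   # counts with marginal "Sum" row/column
```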
Re: [R] Inefficiency of SAS Programming
Ajay ohri wrote: Immersion therapy can be done at a later stage, after the newly baptized R corporate user is happy with the fact that he can do most of his legacy code in R easily now. I have been treading water in the immersion for over a year now. Most SAS consultants and corporate users are eager to try out R, but they are scared of immersion, especially in these cut-back times ... so this could be a middle step ... let me go ahead and create the wrapper SAS package as middleware between R and SAS, and we will let the invisible hands of the free market decide :))

This is futile and will make it more difficult for other R users to help you in the future. As Marc said, this is really a bad idea and will backfire. Frank

regards, ajay www.decisionstats.com I am not a Marxist. Karl Marx http://www.brainyquote.com/quotes/quotes/k/karlmarx131048.html

On Fri, Feb 27, 2009 at 8:01 PM, Marc Schwartz marc_schwa...@comcast.net wrote: on 02/27/2009 07:57 AM Frank E Harrell Jr wrote: Ajay ohri wrote: I would like to know if we can create a package in which R functions are renamed closer to the SAS language. Doing so will help people familiar with SAS to take to R straight away for their work, thus decreasing the threshold for acceptance, and then get into deeper understanding later. Since it is a package, it would be optional, only for people wanting to try out R from SAS. Do we have such a package right now? It basically masks R functions to the equivalent function in another language, just for user ease / beginners. For example, creating a function for means:

procmeans <- function(x, y) {
  summary(subset(x, select = c(x, y)))
}

creating a function for importing CSV:

procimport <- function(x, y) {
  read.csv(textConnection(x), row.names = y, na.strings = "")
}

creating a function for describing data:

procunivariate <- function(x) {
  summary(x)
}

regards, ajay

Ajay, This will generate major confusion among users of all types and be hard to maintain. A better approach is to get Bob Muenchen's excellent book and keep it nearby. Frank

I wholeheartedly agree with Frank here. It may be one thing to have a translation process in place based upon some form of logical mapping between the two languages (as Bob's book provides). But it is another thing entirely to actually start writing functions that provide wrappers modeled on SAS-based PROCs. If you do this, then you only serve to obfuscate the fundamental philosophical and functional differences between the two languages and doom a new useR to missing all of R's benefits. They will continue to try to figure out how to use R based upon their SAS intuition rather than developing a new set of coding and even statistical paradigms. Having been through the SAS to S/R transition myself, having used SAS for much of the 90's and now having used R for over 7 years, I can speak from personal experience and state that the only way to achieve the requisite proficiency with R is immersion therapy.

Regards, Marc Schwartz

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Making tapply code more efficient
Hi Harold, What about this? You one have to make the crosstabulation once. qq - data.frame(student = factor(c(1,1,2,2,2)), teacher = factor(c(10,10,20,20,25))) tab - table(qq$student, qq$teacher) data.frame(Student = rownames(tab), Freq = rowSums(tab), tch = rowSums(tab 0) == 1) Student Freq tch 1 12 TRUE 2 23 FALSE HTH, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Doran, Harold Verzonden: vrijdag 27 februari 2009 15:47 Aan: r-help@r-project.org Onderwerp: [R] Making tapply code more efficient Previously, I posed the question pasted down below to the list and received some very helpful responses. While the code suggestions provided in response indeed work, they seem to only work with *very* small data sets and so I wanted to follow up and see if anyone had ideas for better efficiency. I was quite embarrased on this as our SAS programmers cranked out programs that did this in the blink of an eye (with a few variables), but R was spinning for days on my Ubuntu machine and ultimately I saw a message that R was killed. The data I am working with has 800967 total rows and 31 total columns. The ID variable I use as the index variable in tapply() has 326397 unique cases. 
length(unique(qq$student_unique_id)) [1] 326397 To give a sense of what my data look like and the actual problem, consider the following: qq <- data.frame(student_unique_id = factor(c(1,1,2,2,2)), teacher_unique_id = factor(c(10,10,20,20,25))) This is a student achievement database where students occupy multiple rows in the data and the variable teacher_unique_id denotes the class the student was in. What I am doing is looking to see if the teacher is the same for each instance of the unique student ID. So, if I implement the following: same <- function(x) length(unique(x)) == 1 results <- data.frame(freq = tapply(qq$student_unique_id, qq$student_unique_id, length), tch = tapply(qq$teacher_unique_id, qq$student_unique_id, same)) I get the following results. I can see that student 1 appears in the data twice and the teacher is always the same. However, student 2 appears three times and the teacher is not always the same. results freq tch 1 2 TRUE 2 3 FALSE Now, implementing this same procedure to a large data set with the characteristics described above seems to be problematic in this implementation. Does anyone have reactions on how this could be made more efficient such that it can run with large data as I described?
Harold sessionInfo() R version 2.8.1 (2008-12-22) x86_64-pc-linux-gnu locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base # Original question posted on 1/13/09 Suppose I have a dataframe as follows: dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25), var2 = c('foo', 'foo', 'foo', 'foobar', 'foo')) Now, if I were to subset by id, such as: subset(dat, id==1) id var1 var2 1 1 10 foo 2 1 10 foo I can see that the elements in var1 are exactly the same and the elements in var2 are exactly the same. However, subset(dat, id==2) id var1 var2 3 2 20 foo 4 2 20 foobar 5 2 25 foo shows the elements are not the same for either variable in this instance. So, what I am looking to create is a data frame that would be like this: id freq var1 var2 1 2 TRUE TRUE 2 3 FALSE FALSE where freq is the number of times the ID is repeated in the dataframe. A TRUE appears in the cell if all elements in the column are the same for the ID and FALSE otherwise. It is insignificant which values differ for my problem. The way I am thinking about tackling this is to loop through the ID variable and compare the values in the various columns of the dataframe. The problem I am encountering is that I don't think all.equal or identical are the right functions in
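To make the two approaches in this thread concrete, here is a minimal base-R sketch on the toy data from the thread (the object names res1/res2 are mine), contrasting the tapply() version with the single-crosstabulation version:

```r
# Toy data from the thread
qq <- data.frame(student = factor(c(1, 1, 2, 2, 2)),
                 teacher = factor(c(10, 10, 20, 20, 25)))

# tapply() version: two passes over the grouping variable
same <- function(x) length(unique(x)) == 1
res1 <- data.frame(freq = as.vector(tapply(qq$student, qq$student, length)),
                   tch  = as.vector(tapply(qq$teacher, qq$student, same)))

# table() version (Thierry's suggestion): one crosstabulation,
# then cheap row operations on the resulting matrix
tab  <- table(qq$student, qq$teacher)
res2 <- data.frame(freq = rowSums(tab),          # rows per student
                   tch  = rowSums(tab > 0) == 1) # exactly one teacher?
```

On large data the table() route does the expensive grouping only once, which is where the repeated tapply() calls spend most of their time.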
Re: [R] Inefficiency of SAS Programming
Three comments I actually think you can write worse code in R than in SAS: more tools = more scope for innovatively bad ideas. The ability to write bad code should not damn a language. I found almost all of the improvements to the multi-line SAS recode to be regressions, both the SAS and the S suggestions. a. Everyone, even those of you with no SAS background whatsoever, immediately understood the code. Most of the replacements are obscure. Compilers are very good these days and computers are fast; fewer typed characters != better. b. If I were writing the S code for such an application, it would look much the same. I worked as a programmer in medical research for several years, and one of the things that moved me on to graduate studies in statistics was the realization that doing my best work meant being as UN-clever as possible in my code. Frank's comments imply that he was reading SAS macro code at the moment of peak frustration. And if you want to criticise SAS code, this is the place to look. SAS macro started out as some simple expansions, then got added on to, then added on again, and again, and with no overall blueprint. It is much like the farmhouse of some neighbors of mine growing up: 4 different expansions in 4 eras, and no overall guiding plan. The interior layout was interesting to say the least. I was once a bona fide SAS 'wizard' (and Frank was much better than me), and I can't read the stuff without grinding my teeth. S was once headed down the same road. One of the best things ever with the language was documented in the blue book The New S Language, where Becker et al had the wisdom to scrap the macro processor. Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Ordinal Mantel-Haenszel type inference
I suspect that what you need will be in S-PLUS (and R) Manual to Accompany Agresti's Categorical Data Analysis (2002) 2nd edition by Laura A. Thompson, 2007, which I have always been able to find with a Google search. Yep, it's still there: https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf Its Chapter 7, Logit Models for Multinomial Responses, discusses various cumulative logit models. The polr function (proportional odds logistic regression) in MASS will return the regression equivalent of what you are asking for. Thompson says the lrm in the Desing library will also do it, by which she really means that the lrm in the Design package by Harrell will do it. The link she offers is outdated, but it doesn't really matter for obtaining the Hmisc/Design packages, since they are on CRAN; online documentation is currently available at: http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/StatComp She then also mentions lcr (library ordinal) and nordr (library gnlm). Later in the chapter she illustrates the use of the vglm function in the VGAM package. -- David Winsemius On Feb 27, 2009, at 9:04 AM, Jourdan Gold wrote: Hello, I am searching for an R package that does an extension of the Mantel-Haenszel test for ordinal data as described in Liu and Agresti (1996), A Mantel-Haenszel type inference for cumulative odds ratios, in Biometrics. I see packages such as Epi that perform it for binary data and derive a variance for it using the Robbins and Breslow variance method, as well as another package that derives it for nominal variables but does not provide a variance or confidence limit. Does a package exist that does this? I have searched the list archives and can't seem to see such a package, but I could be missing something. Thank you.
yours sincerely, Jourdan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
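For the cumulative-logit route David mentions, here is a minimal sketch with polr() from MASS, using the housing data from MASS itself (note this fits a proportional-odds model, which is the regression analogue he describes, not the Liu-Agresti Mantel-Haenszel statistic itself):

```r
library(MASS)  # ships with R as a recommended package

# Proportional odds logistic regression on the classic housing data:
# satisfaction (Low < Medium < High) modelled from perceived influence,
# dwelling type and contact, with cell frequencies as weights.
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
summary(fit)  # cumulative log-odds coefficients and intercepts
```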
Re: [R] Inefficiency of SAS Programming
Terry Therneau wrote: Three comments I actually think you can write worse code in R than in SAS: more tools = more scope for innovatively bad ideas. The ability to write bad code should not damn a language. I found almost all of the improvements to the multi-line SAS recode to be regressions, both the SAS and the S suggestions. a. Everyone, even those of you with no SAS background whatsoever, immediately understood the code. Most of the replacements are obscure. Compilers are very good these days and computers are fast; fewer typed characters != better. b. If I were writing the S code for such an application, it would look much the same. I worked as a programmer in medical research for several years, and one of the things that moved me on to graduate studies in statistics was the realization that doing my best work meant being as UN-clever as possible in my code. If I were writing S code for this it would be dramatically different. I would try to be efficient and elegant but would need to remember to be a teacher at the same time. For example, this kind of recode is super efficient and quick to program but would need good comments or a handbook to all of my code: c(cat=1, dog=2, giraffe=3)[animal] But I think the code is quite intuitive once you have used that construct once. There is also a lot of factoring of code that could be done, as others have pointed out. Frank's comments imply that he was reading SAS macro code at the moment of peak frustration. And if you want to criticise SAS code, this is the place to look. SAS macro started out as some simple expansions, then got added on to, then added on again, and again, and with no overall blueprint. It is much like the farmhouse of some neighbors of mine growing up: 4 different expansions in 4 eras, and no overall guiding plan. The interior layout was interesting to say the least. I was once a bona fide SAS 'wizard' (and Frank was much better than me), and I can't read the stuff without grinding my teeth.
S was once headed down the same road. One of the best things ever with the language was documented in the blue book The New S Language, where Becker et al had the wisdom to scrap the macro processor. Well put. I am amazed there hasn't been a revolt among SAS users decades ago. The S approach is also easier to debug one line at a time. Cheers, Frank Terry Therneau -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
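The lookup-vector recode Frank shows can be seen in action on a small hypothetical vector (the animal data here is invented for illustration):

```r
# A named vector acts as a lookup table: indexing it by a character
# vector recodes each value in one step.
animal <- c("dog", "cat", "giraffe", "cat")
code <- c(cat = 1, dog = 2, giraffe = 3)[animal]
unname(code)  # 2 1 3 1
```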
Re: [R] Inefficiency of SAS Programming
Terry's remarks (see below) are well received; however, I take issue with one part of his comments. As a long-time programmer (in both statistical programming languages and traditional programming languages), I miss the ability to write native-language macros in R. While macros can make for difficult-to-read code, when used properly they can also make for flexible code that, if properly written (including good documentation, which should be a part of any code), can be easy to read. Finally, everyone must remember that SAS code can be difficult to understand or inefficient just as R code can be difficult to understand or inefficient. In the end, both programming systems have their advantages and disadvantages. No programming language is perfect. It is not fair, nor correct, to damn one or the other. Accept the fact that some things are more easily and more clearly done in one language; other things are more clearly and more easily done in another language. Let's move on to more important issues, viz. improving R so it is as good as it possibly can be. John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Terry Therneau thern...@mayo.edu 2/27/2009 10:23 AM Three comments I actually think you can write worse code in R than in SAS: more tools = more scope for innovatively bad ideas. The ability to write bad code should not damn a language. I found almost all of the improvements to the multi-line SAS recode to be regressions, both the SAS and the S suggestions. a. Everyone, even those of you with no SAS background whatsoever, immediately understood the code. Most of the replacements are obscure. Compilers are very good these days and computers are fast; fewer typed characters != better. b.
If I were writing the S code for such an application, it would look much the same. I worked as a programmer in medical research for several years, and one of the things that moved me on to graduate studies in statistics was the realization that doing my best work meant being as UN-clever as possible in my code. Frank's comments imply that he was reading SAS macro code at the moment of peak frustration. And if you want to criticise SAS code, this is the place to look. SAS macro started out as some simple expansions, then got added on to, then added on again, and again, and with no overall blueprint. It is much like the farmhouse of some neighbors of mine growing up: 4 different expansions in 4 eras, and no overall guiding plan. The interior layout was interesting to say the least. I was once a bona fide SAS 'wizard' (and Frank was much better than me), and I can't read the stuff without grinding my teeth. S was once headed down the same road. One of the best things ever with the language was documented in the blue book The New S Language, where Becker et al had the wisdom to scrap the macro processor. Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Confidentiality Statement: This email message, including any attachments, is for th...{{dropped:6}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R crash on Mac
If I define this function R> ask <- function(message = "Type in datum") eval(parse(prompt = paste(message, ": ", sep = ""))) the following is produced as expected on a Linux/debian machine R> ask("input") input: 3 [1] 3 R> ask("input") input: 3:6 [1] 3 4 5 6 R> ask("input") input: c(3,6) [1] 3 6 If I run exactly the same on a Mac (OS X 10.5.6), it still works provided R is run in a Terminal window. The outcome changes if R is run in its own window, started by clicking on its icon; the first two examples are still OK, the third one produces: *** caught segfault *** address 0x4628c854, cause 'memory not mapped' R> sessionInfo() # before crash! R version 2.8.1 (2008-12-22) i386-apple-darwin8.11.1 locale: en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] stats utils datasets grDevices graphics methods base R> R.version _ platform i386-apple-darwin8.11.1 arch i386 os darwin8.11.1 system i386, darwin8.11.1 status major 2 minor 8.1 year 2008 month 12 day 22 svn rev 47281 language R version.string R version 2.8.1 (2008-12-22) -- Adelchi Azzalini azzal...@stat.unipd.it Dipart. Scienze Statistiche, Università di Padova, Italia tel. +39 049 8274147, http://azzalini.stat.unipd.it/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
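One possible workaround, assuming the crash is tied to parse(prompt = ...) in the R.app GUI: read the line with readline() and parse the returned string instead. This is a sketch, not a verified fix for the segfault.

```r
# Hypothetical variant of ask(): prompt via readline(), then parse the
# string that comes back, rather than letting parse() drive the prompt.
ask <- function(message = "Type in datum") {
  eval(parse(text = readline(paste(message, ": ", sep = ""))))
}
# At an interactive prompt, ask("input") with input c(3,6) should return
# the vector 3 6, as in the Linux session above.
```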
[R] formula formatting/grammar for regression
Hi all, I am doing some basic regression analysis, and am getting a bit confused on how to enter non-polynomial formulas to be used. For example, consider that I want to find A and r such that the formula y = A*exp(r*x) provides the best fit to the line y=x on the interval [0,50]. I can set: xpts <- seq(0, 50, by=0.1) ypts <- seq(0, 50, by=0.1) I know I can find a fitted polynomial of a given degree using lm(ypts ~ poly(xpts, degree=5, raw=TRUE)) But am confused on what the formula should be for trying to find a fit to y = A*exp(r*x). If anyone knows of a resource that describes the grammar behind assembling these formulas, I would really appreciate being pointed in that direction, as I can't seem to find much beyond basic polynomials. Thanks for the help! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] formula formatting/grammar for regression
Brigid Mooney bkmooney at gmail.com writes: I am doing some basic regression analysis, and am getting a bit confused on how to enter non-polynomial formulas to be used. .. But am confused on what the formula should be for trying to find a fit to y = A*exp(r*x). If this example is just a placeholder for something more complex than poly, you should check the function nls, which works for non-linear models. However, if you really want to solve only this problem, taking the log of your data and fitting the log of the above function with lm() is the easiest way out. Results can be a bit different from the nonlinear case depending on the noise, because in one case the errors are weighted on the log scale, in the other linearly. Dieter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
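A minimal sketch of Dieter's two routes, on simulated exponential data (the data, seed and start values are mine; note that the original y = x example includes x = 0, where the log route would need the zero handled first):

```r
set.seed(1)
x <- seq(0.1, 5, by = 0.1)
y <- 2 * exp(0.5 * x) * exp(rnorm(length(x), sd = 0.05))  # true A = 2, r = 0.5

# Route 1: linearise with a log transform, then lm()
fit.lm <- lm(log(y) ~ x)
A.lm <- exp(unname(coef(fit.lm)[1]))
r.lm <- unname(coef(fit.lm)[2])

# Route 2: fit y ~ A * exp(r * x) directly with nls(), seeded from lm()
fit.nls <- nls(y ~ A * exp(r * x), start = list(A = A.lm, r = r.lm))
coef(fit.nls)
```

The two fits differ slightly because lm() minimises squared error on the log scale while nls() minimises it on the original scale, exactly the weighting difference Dieter describes.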
[R] [R-pkgs] Package DAKS for knowledge space theory, on CRAN now
Version 1.0-0 of DAKS (Data Analysis and Knowledge Spaces) has been released to CRAN. Knowledge space theory is a recent psychometric test theory based on combinatorial mathematical structures (order and lattice theory). Solvability dependencies between dichotomous test items play an important role in knowledge space theory. Utilizing hypothesized dependencies between items, knowledge space theory has been successfully applied for the computerized, adaptive assessment and training of knowledge. The package DAKS implements inductive item tree analysis methods for deriving surmise relations from binary data. It provides functions for computing population and estimated asymptotic variances of the used fit measures, and for switching between test item and knowledge state representations. Other features are a Hasse diagram drawing device, a data simulation tool based on a finite mixture latent variable model, and a function for computing response pattern and knowledge state frequencies. Best regards, Anatol Sargin Ali Uenlue -- Department of Computer-Oriented Statistics and Data Analysis Institute of Mathematics University of Augsburg http://stats.math.uni-augsburg.de/ [[alternative HTML version deleted]] ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using package ROCR
For question 1: Can you please report this to the package maintainer (well, I am CCing Tobias now), who will certainly be happy to improve the package (particularly the demo behaviour). For question 2 (and your latest message): it does not happen for me. Which versions are you using, i.e. have you updated to the most recent ones? In any case, using namespaces is another thing that might be worth considering for Tobias as the ROCR maintainer. Tobias, a last point for you: your package has given WARNINGs in the checks for ages now; can you please fix that also. Thank you, Uwe Ligges wiener30 wrote: Just an update concerning an error message in using the ROCR package. Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double' I have changed the sequence of loading the packages and the problem has gone away: library(ROCR) library(randomForest) The loading sequence that caused an error was: library(randomForest) library(ROCR) Maybe this info could be useful for somebody else who is getting the same error. wiener30 wrote: Thank you very much for the response! The plot(1,1) helped to resolve the first problem. But I am still getting a second error message when running demo(ROCR) Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double' It seems it has something to do with compatibility of S4 objects. My versions of R and the ROCR package are the same as you listed, but it seems something else is missing in my installation. William Doane wrote: Responding to question 1... it seems the demo assumes you already have a plot window open. library(ROCR) plot(1,1) demo(ROCR) seems to work. For question 2, my environment produces the expected results... plot doesn't generate an error: * R 2.8.1 GUI 1.27 Tiger build 32-bit (5301) * OS X 10.5.6 * ROCR 1.0-2 -Wil wiener30 wrote: I am trying to use the ROCR package to analyze classification accuracy; unfortunately there are some problems right at the beginning.
Question 1) When I try to run the demo I am getting the following error message: library(ROCR) demo(ROCR) if (dev.cur() <= 1) [TRUNCATED] Error in get(getOption("device")) : wrong first argument When I issue the command dev.cur() it returns: null device 1 It seems something is wrong with my R environment? Could somebody provide a hint about what is wrong. Question 2) When I run example commands from the manual library(ROCR) data(ROCR.simple) pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels) perf <- performance(pred, "tpr", "fpr") plot(perf) the plot command issues the following error message: Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double' How could this be fixed? Thanks for the support __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] levelplot help needed
To reorder the y-labels, simply reorder the factor levels: df <- data.frame(x_label = factor(x_label), y_label = factor(y_label, rev(y_label)), values = as.vector(my.data)) Not sure about putting the strips at the bottom. A quick scan of ?xyplot and ?strip.default suggests that this is not possible, but I'm sure Deepayan will correct me if I'm wrong (he often does). --sundar On Fri, Feb 27, 2009 at 5:51 AM, Antje niederlein-rs...@yahoo.de wrote: Hi there, I'm looking for someone who can give me some hints how to make a nice levelplot. As an example, I have the following code: # create some example data # -- xl <- 4 yl <- 10 my.data <- sapply(1:xl, FUN = function(x) { rnorm(yl, mean = x) }) x_label <- rep(c("X Label 1", "X Label 2", "X Label 3", "X Label 4"), each = yl) y_label <- rep(paste("Y Label ", 1:yl, sep=""), xl) df <- data.frame(x_label = factor(x_label), y_label = factor(y_label), values = as.vector(my.data)) df1 <- data.frame(df, group = rep("Group 1", xl*yl)) df2 <- data.frame(df, group = rep("Group 2", xl*yl)) df3 <- data.frame(df, group = rep("Group 3", xl*yl)) mdf <- rbind(df1,df2,df3) # plot # -- graph <- levelplot(mdf$values ~ mdf$x_label * mdf$y_label | mdf$group, aspect = "xy", layout = c(3,1), scales = list(x = list(labels = substr(levels(factor(mdf$x_label)), 0, 5), rot = 45))) print(graph) # -- (I need to use these strange x-labels because in my real data the values of the x-labels are too long and I just want to display the first 10 characters as label) My questions: * I'd like to start with Y Label 1 in the upper row (that's a more general issue: how can I influence the order of x, y, and groups?) * I'd like to put the groups at the bottom Can anybody give me some help? Antje __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
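The level-reversal trick in Sundar's reply works because lattice lays out panels and axis labels in factor-level order; a minimal base-R sketch on unique labels (the variable names below are mine):

```r
y_label <- paste("Y Label", 1:4)
f_default  <- factor(y_label)                        # alphabetical level order
f_reversed <- factor(y_label, levels = rev(y_label)) # reversed display order
levels(f_reversed)
```

Note that Antje's y_label contains repeated values, so on the real data the levels argument should be built from the unique labels, e.g. factor(y_label, levels = rev(unique(y_label))).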
Re: [R] combining identify() and locator()
awesome. Thank you very much for the quick response. I think this is exactly what I was looking for. -Brian On Feb 27, 2009, at 1:10 AM, Barry Rowlingson wrote: 2009/2/27 Brian Bolt bb...@kalypsys.com: Hi, I am wondering if there might be a way to combine the two functions identify() and locator() such that if I use identify() and then click on a point outside the set tolerance, the x,y coordinates are returned as in locator(). Does anyone know of a way to do this? Thanks in advance for any help Since identify will only return the indexes of selected points, and it only takes on-screen clicks for coordinates, you'll have to leverage locator and duplicate some of the identify work. So call locator(1), then compute the distances to your points, and if any are below your tolerance mark them using text(), otherwise keep the coordinates of the click. You can use dist() to compute a distance matrix, but if you want to totally replicate identify's tolerance behaviour I think you'll have to convert from your data coordinates to device coordinates. The grconvertX and grconvertY functions look like they'll do that for you. Okay, that's the flatpack delivered, I think you've got all the parts, some assembly required! Barry __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
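Barry's recipe can be sketched with the interactive locator(1) call factored out, so the matching logic stands on its own (the function name and tolerance value are made up; the tolerance here is in user coordinates, whereas replicating identify() exactly would need grconvertX()/grconvertY(), as he notes):

```r
# Return the index of the nearest point if the click lands within 'tol'
# of it; otherwise return the click coordinates themselves.
nearest_or_coords <- function(click, px, py, tol = 0.25) {
  d <- sqrt((px - click$x)^2 + (py - click$y)^2)
  if (min(d) <= tol) which.min(d) else c(click$x, click$y)
}
# Interactive use: click <- locator(1); nearest_or_coords(click, x, y)
```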
Re: [R] levelplot help needed
Try using the alternating=FALSE option. -- David Winsemius On Feb 27, 2009, at 12:07 PM, Sundar Dorai-Raj wrote: To reorder the y-labels, simply reorder the factor levels: df <- data.frame(x_label = factor(x_label), y_label = factor(y_label, rev(y_label)), values = as.vector(my.data)) Not sure about putting the strips at the bottom. A quick scan of ?xyplot and ?strip.default suggests that this is not possible, but I'm sure Deepayan will correct me if I'm wrong (he often does). --sundar On Fri, Feb 27, 2009 at 5:51 AM, Antje niederlein-rs...@yahoo.de wrote: Hi there, I'm looking for someone who can give me some hints how to make a nice levelplot. As an example, I have the following code: # create some example data # -- xl <- 4 yl <- 10 my.data <- sapply(1:xl, FUN = function(x) { rnorm(yl, mean = x) }) x_label <- rep(c("X Label 1", "X Label 2", "X Label 3", "X Label 4"), each = yl) y_label <- rep(paste("Y Label ", 1:yl, sep=""), xl) df <- data.frame(x_label = factor(x_label), y_label = factor(y_label), values = as.vector(my.data)) df1 <- data.frame(df, group = rep("Group 1", xl*yl)) df2 <- data.frame(df, group = rep("Group 2", xl*yl)) df3 <- data.frame(df, group = rep("Group 3", xl*yl)) mdf <- rbind(df1,df2,df3) # plot # -- graph <- levelplot(mdf$values ~ mdf$x_label * mdf$y_label | mdf$group, aspect = "xy", layout = c(3,1), scales = list(x = list(labels = substr(levels(factor(mdf$x_label)), 0, 5), rot = 45))) print(graph) # -- (I need to use these strange x-labels because in my real data the values of the x-labels are too long and I just want to display the first 10 characters as label) My questions: * I'd like to start with Y Label 1 in the upper row (that's a more general issue: how can I influence the order of x, y, and groups?) * I'd like to put the groups at the bottom Can anybody give me some help?
Antje __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Re : Have a function like the _n_ in R ? (Automatic count function )
If you are in the context of a data frame (which is closest to the concept of a data set in SAS), then 1:nrow(df) is closest to what you are looking for. For instance: data(iris) .n. <- 1:nrow(iris) You may notice that such a number is not very idiomatic in R. If you have something like: if (_N_ > 50) then output; in R you can simply put iris[-(1:50),] without using an explicit counter variable. In the context of a matrix, the row() and col() functions may do what you want. On 25.02.2009 at 15:34, justin bem wrote: R is more flexible than SAS. You have several looping constructs, e.g. for, while, repeat. You also have the dim and length functions to get objects' dimensions. i <- 0 dat <- matrix(c(1, runif(1), .Random.seed[1]), nr=1) repeat { i <- i+1 dat <- rbind(dat, matrix(c(1+i, runif(1), .Random.seed[1]), nr=1)) if (i==4) break } colnames(dat) <- c("counter", "x", "seed") dat Justin BEM BP 1917 Yaoundé Tél (237) 99597295 (237) 22040246 From: Nash morri...@ibms.sinica.edu.tw To: r-help r-help@r-project.org Sent: Wednesday, 25 February 2009, 13:25:18 Subject: [R] Have a function like the _n_ in R? (Automatic count function) Is there a counter function in R? If we use the software SAS: /*** SAS Code **/ data tmp(drop= i); retain seed x 0; do i = 1 to 5; call ranuni(seed,x); output; end; run; data new; counter=_n_; * this keyword _n_ ; set tmp; run; /* _n_ (automatic variables) are created automatically by the DATA step or by DATA step statements. */ /*** Output counter seed x 1 584043288 0.27197 2 935902963 0.43581 3 301879523 0.14057 4 753212598 0.35074 5 1607264573 0.74844 ***/ Is there a function like the _n_ in R? -- Nash - morri...@ibms.sinica.edu.tw __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
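The base-R equivalents discussed above, collected into one runnable sketch using the built-in iris data from the reply:

```r
data(iris)
n <- seq_len(nrow(iris))  # SAS _N_ analogue: an explicit row counter

# SAS: if (_N_ > 50) then output;  -- in R, subset instead of counting:
after50 <- iris[-(1:50), ]
nrow(after50)  # 100 of iris's 150 rows remain
```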
[R] Filtering a dataset's columns by another dataset's column names
Hello all, I hope some of you can come to my rescue, yet again. I have two genetic datasets, and I want one of the datasets to have only the columns that are in common with the other dataset. Here is a toy example (my real datasets have hundreds of columns):

Dataset 1:
Individual SNP1 SNP2 SNP3 SNP4 SNP5
1 A G T C A
2 T C A G T
3 A C T C A

Dataset 2:
Individual SNP1 SNP3 SNP5 SNP6 SNP7
4 A T T G C
5 T A A G G
6 A A T C G

I want Dataset 1 to have only columns that are also represented in Dataset 2, i.e., I want to generate a new Dataset 3 that looks like this:

Individual SNP1 SNP3 SNP5
1 A T A
2 T A T
3 A T A

Does anyone know how I could do this? Keep in mind that this is not a simple merge, as in the merge function. Thanks very much for your help everyone. Josh B. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Filtering a dataset's columns by another dataset's column names
Try this:

d1[, intersect(names(d1), names(d2))]

HTH,
Brian

-----Original Message-----
From: r-help-boun...@r-project.org On Behalf Of Josh B
Sent: Friday, February 27, 2009 12:28 PM
To: R Help
Subject: [R] Filtering a dataset's columns by another dataset's column names

[original question elided]
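The intersect() one-liner above can be checked on small stand-in frames. The d1 and d2 below are made-up toy frames matching the poster's example; the real data are not shown in the thread:

```r
# Toy stand-ins for the poster's two datasets
d1 <- data.frame(Individual = 1:3,
                 SNP1 = c("A", "T", "A"), SNP2 = c("G", "C", "C"),
                 SNP3 = c("T", "A", "T"), SNP4 = c("C", "G", "C"),
                 SNP5 = c("A", "T", "A"))
d2 <- data.frame(Individual = 4:6,
                 SNP1 = c("A", "T", "A"), SNP3 = c("T", "A", "A"),
                 SNP5 = c("T", "A", "T"), SNP6 = c("G", "G", "C"),
                 SNP7 = c("C", "G", "G"))

# Keep only the columns of d1 whose names also occur in d2;
# intersect() preserves the column order of its first argument
d3 <- d1[, intersect(names(d1), names(d2))]
names(d3)  # "Individual" "SNP1" "SNP3" "SNP5"
```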
Re: [R] Filtering a dataset's columns by another dataset's column names
on 02/27/2009 11:27 AM Josh B wrote:
[original question elided]

Same.Cols <- intersect(names(DF1), names(DF2))
Same.Cols
[1] "Individual" "SNP1"       "SNP3"       "SNP5"

rbind(DF1[, Same.Cols], DF2[, Same.Cols])
  Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A
4          4    A    T    T
5          5    T    A    A
6          6    A    A    T

See ?intersect, which gives you the common column names, which you can then use in rbind().

HTH,
Marc Schwartz
Re: [R] Filtering a dataset's columns by another dataset's column names
Dear Josh,

Try this:

dataset1[, colnames(dataset1) %in% colnames(dataset2)]

Take a look at ?colnames and ?"%in%" for more information.

HTH,
Jorge

On Fri, Feb 27, 2009 at 12:28 PM, Josh B josh...@yahoo.com wrote:
[original question elided]
Re: [R] Filtering a dataset's columns by another dataset's column names
So you want the data that is in Dataset 1, but only the column names that are also in Dataset 2. How about:

subset(DS1, select = names(DS1) %in% names(DS2))

DS1 <- read.table(textConnection("Individual SNP1 SNP2 SNP3 SNP4 SNP5
1 A G T C A
2 T C A G T
3 A C T C A"), header = TRUE)
DS2 <- read.table(textConnection("Individual SNP1 SNP3 SNP5 SNP6 SNP7
4 A T T G C
5 T A A G G
6 A A T C G"), header = TRUE)
subset(DS1, select = names(DS1) %in% names(DS2))
  Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A

Tested!

--
David Winsemius
Heritage Labs

On Feb 27, 2009, at 12:27 PM, Josh B wrote:
[original question elided]
Re: [R] combining identify() and locator()
2009/2/27 Brian Bolt bb...@kalypsys.com:
awesome. Thank you very much for the quick response. I think this is exactly what I was looking for.

Here's a basic framework:

`idloc` <- function(xy, n = 1, tol = 0.25){
  tol2 = tol^2
  icoords = cbind(grconvertX(xy[, 1], to = "inches"),
                  grconvertY(xy[, 2], to = "inches"))
  hit = c()
  missed = matrix(ncol = 2, nrow = 0)
  for(i in 1:n){
    ptU = locator(1)
    pt = c(grconvertX(ptU$x, to = "inches"), grconvertY(ptU$y, to = "inches"))
    d2 = (icoords[, 1] - pt[1])^2 + (icoords[, 2] - pt[2])^2
    if (any(d2 < tol2)){
      print("clicked")
      hit = c(hit, (1:dim(xy)[1])[d2 < tol2])
    }else{
      print("missed")
      missed = rbind(missed, c(ptU$x, ptU$y))
    }
  }
  return(list(hit = hit, missed = missed))
}

Test:

xy = cbind(1:10, runif(10))
plot(xy)
idloc(xy, 10)

now click ten times, on points or off points. You get back:

$hit
[1]  4  6  7 10

$missed
         [,1]      [,2]
[1,] 5.698940 0.6835392
[2,] 6.216171 0.6144229
[3,] 5.877982 0.5752569
[4,] 6.773190 0.2895761
[5,] 7.210847 0.3126149
[6,] 9.239985 0.5614337

$hit is the indices of the points you hit (in order, including duplicates) and $missed are the coordinates of the misses. It crashes out if you hit the middle button for the locator, but that should be easy enough to fix up. It doesn't label hit points, but that's also easy enough to do.

Barry
Re: [R] Filtering a dataset's columns by another dataset's column names
Hi Josh B,

this looks like homework to me. Please obey the posting rules, i.e., provide self-contained code/examples and show the point at which you are stuck. To solve your problem, you need the which and names functions as well as the %in% operator. It is then easy to rbind the two datasets once you have figured out what the common column names are. Please try on your own first and report back if and where you are stuck, along with self-contained code. If this is indeed homework, please ask your professor or teacher.

Example for two simulated datasets:

x = rnorm(30)
dim(x) = c(5, 6)
x = data.frame(x)
names(x) = c("a", "b", "c", "x", "y", "z")
y = rnorm(30)
dim(y) = c(5, 6)
y = data.frame(y)
names(y) = c("a", "b", "d", "v", "w", "x")

Daniel
- cuncta stricte discussurus -

-----Original Message-----
From: r-help-boun...@r-project.org On Behalf Of Josh B
Sent: Friday, February 27, 2009 12:28 PM
To: R Help
Subject: [R] Filtering a dataset's columns by another dataset's column names

[original question elided]
Re: [R] Inefficiency of SAS Programming
spam me wrote:
I've actually used AHRQ's software to create Inpatient Quality Indicator reports. I can confirm pretty much what we already know; it is inefficient. Running on about 1.8 to 2 million cases, it would take just about a whole day to run the entire process from start to finish. That isn't all processing time, and includes some time for the analyst to check results between substeps, but I still knew that my day was full when I was working on IQI reports. To be fair, though, there are a lot of other factors (besides efficiency considerations) that go into AHRQ's program design. First, there are a lot of changes to that software every year. In some cases it is easier and less error-prone to hardcode a few points in the data so that it is blatantly obvious what to change next year should another analyst need to do so. Second, the organizations that use this software often require transparency and may not have high-level programmers on staff. Writing code so that it is accessible, editable, and interpretable by intermediate-level programmers or analysts is a plus. Third, given that IQI reports are often produced on a yearly basis, there's no real need to sacrifice clarity, etc. for efficiency: you're only doing this process once a year. There are other points that could be made, but the main idea is that I don't think it's fair to hold this software up, out of context, as an example of SAS's (or even AHRQ's) inefficiencies. I agree that SAS syntax is nowhere near as elegant or as powerful as R from a programming standpoint; that's why after 7 years of using SAS I switched to R. But comparing the two at that level is like racing a Ferrari and a Bentley to see which is the better car.

Dear Anonymous,

Nice points. I would just add that it would be better if government-sponsored projects resulted in software that could be run without expensive licenses.

Thanks,
Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] Inefficiency of SAS Programming
John Sorkin wrote:
Terry's remarks (see below) are well received; however, I take issue with one part of his comments. As a long-time programmer (in both statistical programming languages and traditional programming languages), I miss the ability to write native-language macros in R. While macros can make for difficult-to-read code, when used properly they can also make flexible code that, if properly written (including good documentation, which should be a part of any code), can be easy to read. Finally, everyone must remember that SAS code can be difficult to understand or inefficient, just as R code can be difficult to understand or inefficient. In the end, both programming systems have their advantages and disadvantages. No programming language is perfect. It is not fair, nor correct, to damn one or the other. Accept the fact that some things are more easily and more clearly done in one language, and other things are more clearly and more easily done in another. Let's move on to more important issues, viz. improving R so it is as good as it possibly can be.
John

Nice points John. My only response is that I learned SAS in 1969 and used it intensively until 1991. I wrote some of the first user-contributed SAS procedures (PROCs PCTL, GRAPH, DATACHK, LOGIST, PHGLM) and wrote extensively in the macro language. After using S-Plus for only one month my productivity was far ahead of my productivity using SAS.

Frank

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Terry Therneau thern...@mayo.edu 2/27/2009 10:23 AM

Three comments:

1. I actually think you can write worse code in R than in SAS: more tools = more scope for innovatively bad ideas. The ability to write bad code should not damn a language.

2. I found almost all of the improvements to the multi-line SAS recode to be regressions, both the SAS and the S suggestions.
a. Everyone, even those of you with no SAS background whatsoever, immediately understood the code. Most of the replacements are obscure. Compilers are very good these days and computers are fast; fewer typed characters != better.
b. If I were writing the S code for such an application, it would look much the same. I worked as a programmer in medical research for several years, and one of the things that moved me on to graduate studies in statistics was the realization that doing my best work meant being as UN-clever as possible in my code.

3. Frank's comments imply that he was reading SAS macro code at the moment of peak frustration. And if you want to criticise SAS code, this is the place to look. SAS macro started out as some simple expansions, then got added on to, then added on again, and again, with no overall blueprint. It is much like the farmhouse of some neighbors of mine growing up: 4 different expansions in 4 eras, and no overall guiding plan. The interior layout was interesting, to say the least. I was once a bona fide SAS 'wizard' (and Frank was much better than me), and I can't read the stuff without grinding my teeth. S was once headed down the same road. One of the best things ever to happen to the language was documented in the blue book, The New S Language, where Becker et al. had the wisdom to scrap the macro processor.

Terry Therneau
[R] help with projection pursuit
Hi all,

I have some difficulties with the function ppr for projection pursuit regression. I obtained the results for a projection pursuit regression and now I would like to compute some predictions for new data. I tried the function predict in the following way:

predict(res.ppr, newdata)

but it seems that it is not right. The data rock is given for illustration of the function ppr:

attach(rock)
rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock, nterms = 2, max.terms = 5)

So suppose I want to make a prediction for the point area1=10, peri1=3 and shape=2. I tried the command

predict(rock.ppr, c(10, 3, 2))

but it returns an error message. So, could you indicate to me the right way to make this prediction?

Thanks for your help.
Olivier.

--
Martin Olivier
INRA - Unité Biostatistique Processus Spatiaux
Domaine St Paul, Site Agroparc
84914 Avignon Cedex 9, France
Tel : 04 32 72 21 57  Fax : 04 32 72 21 82
[R] Help: locfit (local logistic regression)
Hi,

I am running a local logistic regression using locfit. Now I want to choose the bandwidth using cross-validation. I don't know if there is an additional command to do so, or if I can do it within locfit itself. I would appreciate any help on this matter.

Thank you.
Regards,
Re: [R] add absolute value to bars in barplot
Note that putting numbers near the top of the bars (either inside or outside) tends to create 'fuzzy' tops to the bars that make it harder for the viewer to quickly interpret the graph. If the numbers are important, put them in a table. If you really need to have the numbers and graph together, then look at alternatives (some type of combined table/graph) or put the numbers in a margin of the graph where they will not distract from the graph itself.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org On Behalf Of soeren.vo...@eawag.ch
Sent: Friday, February 27, 2009 5:33 AM
To: r-help@r-project.org
Subject: [R] add absolute value to bars in barplot

Hello,

barplot(twcons.area, beside = TRUE, col = c("green4", "blue", "red3", "gray"),
        xlab = "estate", ylab = "number of persons", ylim = c(0, 110),
        legend.text = c("treated", "mix", "untreated", NA))

produces a barplot very fine. In addition, I'd like to get the bars' absolute values on top of the bars. How can I produce this in an easy way?

Thanks
Sören
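For completeness, if one does decide to print the values despite the caveat above, barplot() returns the x-coordinates of the bar midpoints, which can be passed to text(). The matrix below is a made-up stand-in, since twcons.area is not shown in the post:

```r
# Made-up stand-in for the poster's twcons.area matrix
m <- matrix(c(40, 25, 18, 7,
              55, 30, 12, 9), nrow = 4)

# barplot() invisibly returns the bar midpoints (same shape as m)
bp <- barplot(m, beside = TRUE, ylim = c(0, 110))

# pos = 3 places each label just above the corresponding bar top
text(bp, m, labels = m, pos = 3)
```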
[R] [R-pkgs] mefa 3.0-0
Dear R Community,

I am pleased to announce that a new version of the mefa R package is available on CRAN. mefa is a package for multivariate data handling in ecology and biogeography. It provides object classes to represent data coded by samples, taxa and segments (i.e., subpopulations, repeated measures). It supports easy processing of the data along with relational data tables for samples and taxa. An object of class mefa is a project-specific compendium of the dataset and can easily be used in further analyses. Methods are provided for extraction, aggregation, conversion, plotting, summary and reporting of mefa objects. Reports can be generated in plain text or LaTeX. The current version has been published in JSS ( http://www.jstatsoft.org/v29/i08 ). The paper presents worked examples on a variety of ecological analyses.

Best wishes,
Péter

Péter Sólymos, PhD
Postdoctoral Fellow
Department of Mathematical and Statistical Sciences
University of Alberta
Edmonton, Alberta, T6G 2G1 Canada
email <- paste("solymos", "ualberta.ca", sep = "@")

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages
[R] testing two-factor anova effects using model comparison approach with lm() and anova()
I wonder if someone could explain the behavior of the anova() and lm() functions in the following situation. I have a standard 3x2 factorial design: factorA has 3 levels, factorB has 2 levels, and they are fully crossed. I have a dependent variable DV. Of course I can do the following to get the usual anova table:

anova(lm(DV ~ factorA + factorB + factorA:factorB))
Analysis of Variance Table

Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorA          2  7.4667  3.7333  4.9778 0.015546 *
factorB          1  2.1333  2.1333  2.8444 0.104648
factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
Residuals       24 18.0000  0.7500

This is perfectly satisfactory for my situation, but as a pedagogical exercise I wanted to demonstrate the model comparison approach to analysis of variance by using anova() to compare a full model that contains all effects to restricted models that contain all effects save for the effect of interest. The test of the interaction effect seems to be as I expected:

fullmodel <- lm(DV ~ factorA + factorB + factorA:factorB)
restmodel <- lm(DV ~ factorA + factorB)
anova(fullmodel, restmodel)
Analysis of Variance Table

Model 1: DV ~ factorA + factorB + factorA:factorB
Model 2: DV ~ factorA + factorB
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)
1     24 18.0000
2     26 27.8667 -2   -9.8667 6.5778 0.005275 **

As you can see, the value of F (6.5778) is the same as in the anova table above. All is well. However, if I try to test a main effect, e.g. factorA, by testing the full model against a restricted model that doesn't contain the main effect factorA, I get something strange:

restmodel <- lm(DV ~ factorB + factorA:factorB)
anova(fullmodel, restmodel)
Analysis of Variance Table

Model 1: DV ~ factorA + factorB + factorA:factorB
Model 2: DV ~ factorB + factorA:factorB
  Res.Df RSS Df Sum of Sq F Pr(>F)
1     24  18
2     24  18  0         0

Upon inspection of each model, I see that the Residuals are identical, which is not what I was expecting:

anova(fullmodel)
Analysis of Variance Table

Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorA          2  7.4667  3.7333  4.9778 0.015546 *
factorB          1  2.1333  2.1333  2.8444 0.104648
factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
Residuals       24 18.0000  0.7500

This looks fine, but the restricted model is where things are not as I expected:

anova(restmodel)
Analysis of Variance Table

Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorB          1  2.1333  2.1333  2.8444 0.104648
factorB:factorA  4 17.3333  4.3333  5.7778 0.002104 **
Residuals       24 18.0000  0.7500

I was expecting the Residuals in the restricted model (the one not containing the main effect of factorA) to be larger than in the full model containing all three effects. In other words, the variance accounted for by the main effect factorA should be added to the Residuals. Instead, it looks like the variance accounted for by the main effect of factorA is being soaked up by the factorA:factorB interaction term. Strangely, the degrees of freedom are also affected. I must be misunderstanding something here. Can someone point out what is happening?

Thanks,
-Paul

--
Paul L. Gribble, Ph.D.
Associate Professor
Dept. Psychology
The University of Western Ontario
London, Ontario
Canada N6A 5C2
Tel. +1 519 661 2111 x82237
Fax. +1 519 661 3961
pgrib...@uwo.ca
http://gribblelab.org
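The behaviour the poster describes can be reproduced on simulated data: with the interaction retained, the formula ~ factorB + factorA:factorB spans the same set of six cell means as the full model, so the residual sum of squares cannot change. A minimal sketch (the data here are simulated, not the poster's):

```r
set.seed(42)
# Fully crossed 3x2 design with 5 replicates per cell
d <- expand.grid(factorA = factor(1:3), factorB = factor(1:2))
d <- d[rep(seq_len(nrow(d)), each = 5), ]
d$DV <- rnorm(nrow(d))

full <- lm(DV ~ factorA + factorB + factorA:factorB, data = d)
rest <- lm(DV ~ factorB + factorA:factorB, data = d)

# Both design matrices have rank 6 (one parameter per cell), so
# dropping the factorA main effect while keeping the interaction
# leaves the residual sum of squares unchanged and the F test is 0/0
c(full = deviance(full), rest = deviance(rest))
```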
Re: [R] formula formatting/grammar for regression
This is just (or should be) a simple example of what I would like to extend to further regression, which is why I was looking for a resource on the grammar. If I try lm(ypts ~ exp(xpts)), I only get an intercept and one coefficient. And for the coefficient, I am not sure where that should go (i.e., is that A or r in the formula y = A*exp(r*x)?). Also, when I tried to use nls, I get an error:

nls(ypts ~ exp(xpts))
Error in getInitial.default(func, data, mCall = as.list(match.call(func, :
  no 'getInitial' method found for function objects

If someone could please point out what I am doing wrong, or point me to a good resource on this, I would greatly appreciate it. Thanks!

Dieter Menne wrote:
Brigid Mooney bkmooney at gmail.com writes:
I am doing some basic regression analysis, and am getting a bit confused on how to enter non-polynomial formulas to be used. ... But am confused on what the formula should be for trying to find a fit to y = A*exp(r*x).

If this example is just a placeholder for something more complex than poly, you should check the function nls, which works for non-linear functions. However, if you really want to solve this problem only, taking the log of your data and fitting the log of the above function with lm() is the easiest way out. Results can be a bit different from the nonlinear case depending on noise, because in one case the weights are log-weighted, and in the other linear.

Dieter

--
View this message in context: http://www.nabble.com/formula-formatting-grammar-for-regression-tp22249014p22251094.html
Sent from the R help mailing list archive at Nabble.com.
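Putting the two suggestions side by side: the formula given to nls() must name the parameters (A and r), and nls() needs starting values; with lm() the same model is fit on the log scale. A sketch on simulated data (the true values A = 2 and r = 0.8 are made up for illustration):

```r
set.seed(1)
xpts <- seq(0, 2, length.out = 50)
# Simulated y = A*exp(r*x) with multiplicative noise
ypts <- 2 * exp(0.8 * xpts) * exp(rnorm(50, sd = 0.05))

# Non-linear least squares: name A and r in the formula, give start values
fit_nls <- nls(ypts ~ A * exp(r * xpts), start = list(A = 1, r = 0.5))
coef(fit_nls)  # estimates of A and r

# Equivalent linear fit on the log scale: log(y) = log(A) + r*x
fit_lm <- lm(log(ypts) ~ xpts)
exp(coef(fit_lm)[[1]])  # estimate of A
coef(fit_lm)[[2]]       # estimate of r
```

As Dieter notes, the two fits weight the noise differently, so the estimates will agree only approximately.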
Re: [R] Inefficiency of SAS Programming
A further example of software-pricing dynamics is the complete lack of awareness of WPS, a UK-based software package which is basically a Base SAS clone with all the features of SAS (coding, read/write and data read/write) and priced at only $660 per desktop and $1400 for server licenses: very, very cheap compared to SAS Base. And it has a Bridge to R for higher-level statistics. You would think a corporate user would not have any hesitation to switch to a clone priced at 10%, yet there are hardly any takers for it in the federal government. :))

People worried about their government's spending should use the new website http://www.recovery.gov/?q=content/contact ; it is supposed to chronicle this, and it would be a good test and control for the Web 2.0 initiatives.

On Fri, Feb 27, 2009 at 11:18 PM, Frank E Harrell Jr f.harr...@vanderbilt.edu wrote:
[quoted discussion elided]
Re: [R] help with projection pursuit
In my experience (and per the help pages, now that I look) the predict functions need named arguments that match up with the column names in the model, and generally these need to be supplied as a dataframe or a list. (Note: at least on my machine the rock dataframe does *not* have the names you offered.)

predict(rock.ppr, list(area = 10, peri = 3, shape = 2))
# or...
predict(rock.ppr, data.frame(area = 10, peri = 3, shape = 2))

predict(rock.ppr, list(area = 10, peri = 3, shape = 2))
       1
7.118094

--
David Winsemius

On Feb 27, 2009, at 10:09 AM, Olivier MARTIN wrote:
[original question elided]
Re: [R] Inefficiency of SAS Programming
Frank, A programming language's efficiency is a function of several items, including what you are trying to program. Without using SAS PROC IML, I have found that it is more efficient to code algorithms (e.g. a least squares linear regression) using R than SAS; we all know that matrix notation leads to more compact syntax than can be had when using non-matrix notation, and R implements matrix notation. On the other hand, searching, sub-setting, merging, etc. can at times be coded more efficiently, more easily, and in a more easily understood fashion in SAS. I am sure there are people who use SAS to set up their datasets and then use R when they are developing an algorithm. Just as French may be a better language to express love, Italian a better language in which to write opera, and English the most efficient language for communication (at least for the last 50 years), so too do both R and SAS have a place in the larger world. John

John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)

Frank E Harrell Jr f.harr...@vanderbilt.edu 2/27/2009 12:52 PM John Sorkin wrote: Terry's remarks (see below) are well received; however, I take issue with one part of his comments. As a long time programmer (in both statistical programming languages and traditional programming languages), I miss the ability to write native-language macros in R. While macros can make for difficult-to-read code, when used properly they can also make flexible code that, if properly written (including good documentation, which should be a part of any code), can be easy to read. Finally, everyone must remember that SAS code can be difficult to understand or inefficient just as R code can be difficult to understand or inefficient.
In the end, both programming systems have their advantages and disadvantages. No programming language is perfect. It is not fair, nor correct, to damn one or the other. Accept the fact that some things are more easily and more clearly done in one language, and other things are more clearly and more easily done in another language. Let's move on to more important issues, viz. improving R so it is as good as it possibly can be. John

Nice points John. My only response is that I learned SAS in 1969 and used it intensively until 1991. I wrote some of the first user-contributed SAS procedures (PROCs PCTL, GRAPH, DATACHK, LOGIST, PHGLM) and wrote extensively in the macro language. After using S-Plus for only one month my productivity was far ahead of my productivity using SAS. Frank

John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)

Terry Therneau thern...@mayo.edu 2/27/2009 10:23 AM Three comments: I actually think you can write worse code in R than in SAS: more tools = more scope for innovatively bad ideas. The ability to write bad code should not damn a language. I found almost all of the improvements to the multi-line SAS recode to be regressions, both the SAS and the S suggestions. a. Everyone, even those of you with no SAS background whatsoever, immediately understood the code. Most of the replacements are obscure. Compilers are very good these days and computers are fast; fewer typed characters != better. b. If I were writing the S code for such an application, it would look much the same.
I worked as a programmer in medical research for several years, and one of the things that moved me on to graduate studies in statistics was the realization that doing my best work meant being as UN-clever as possible in my code. Frank's comments imply that he was reading SAS macro code at the moment of peak frustration. And if you want to criticise SAS code, this is the place to look. SAS macro started out as some simple expansions, then got added on to, then added on again, and again, and with no overall blueprint. It is much like the farmhouse of some neighbors of mine growing up: 4 different expansions in 4 eras, and no overall guiding plan. The interior layout was interesting to say the least. I was once a bona fide SAS 'wizard' (and Frank was much better than me), and I can't read the stuff without grinding my teeth. S was once headed down the same road. One of the best things ever with the language was documented in the blue book The New S Language, where Becker et al had the wisdom to scrap the macro processor. Terry Therneau
Re: [R] testing two-factor anova effects using model comparison approach with lm() and anova()
Notice the degrees of freedom as well in the different models. With factors A and B, the two models

A + B + A:B

and

A + A:B

are actually the same overall model, just different parameterizations (you can also see this by using x=TRUE in the call to lm and looking at the x matrix used). Testing whether the main effect A should be in the model, given that the interaction is in the model, does not make sense in most cases; therefore the notation gives a different parameterization rather than the generally uninteresting test.

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Paul Gribble Sent: Friday, February 27, 2009 11:01 AM To: r-help@r-project.org Subject: [R] testing two-factor anova effects using model comparison approach with lm() and anova()

I wonder if someone could explain the behavior of the anova() and lm() functions in the following situation: I have a standard 3x2 factorial design, factorA has 3 levels, factorB has 2 levels, they are fully crossed. I have a dependent variable DV. Of course I can do the following to get the usual anova table:

anova(lm(DV ~ factorA + factorB + factorA:factorB))

Analysis of Variance Table
Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorA          2  7.4667  3.7333  4.9778 0.015546 *
factorB          1  2.1333  2.1333  2.8444 0.104648
factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
Residuals       24 18.0000  0.7500

This is perfectly satisfactory for my situation, but as a pedagogical exercise, I wanted to demonstrate the model comparison approach to analysis of variance by using anova() to compare a full model that contains all effects to restricted models that contain all effects save for the effect of interest.
The test of the interaction effect seems to be as I expected:

fullmodel <- lm(DV ~ factorA + factorB + factorA:factorB)
restmodel <- lm(DV ~ factorA + factorB)
anova(fullmodel, restmodel)

Analysis of Variance Table
Model 1: DV ~ factorA + factorB + factorA:factorB
Model 2: DV ~ factorA + factorB
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)
1     24 18.0000
2     26 27.8667 -2   -9.8667 6.5778 0.005275 **

As you can see, the value of F (6.5778) is the same as in the anova table above. All is well. However, if I try to test a main effect, e.g. factorA, by testing the full model against a restricted model that doesn't contain the main effect factorA, I get something strange:

restmodel <- lm(DV ~ factorB + factorA:factorB)
anova(fullmodel, restmodel)

Analysis of Variance Table
Model 1: DV ~ factorA + factorB + factorA:factorB
Model 2: DV ~ factorB + factorA:factorB
  Res.Df RSS Df Sum of Sq F Pr(>F)
1     24  18
2     24  18  0         0

Upon inspection of each model I see that the residuals are identical, which is not what I was expecting:

anova(fullmodel)

Analysis of Variance Table
Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorA          2  7.4667  3.7333  4.9778 0.015546 *
factorB          1  2.1333  2.1333  2.8444 0.104648
factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
Residuals       24 18.0000  0.7500

This looks fine, but then the restricted model is where things are not as I expected:

anova(restmodel)

Analysis of Variance Table
Response: DV
                Df  Sum Sq Mean Sq F value   Pr(>F)
factorB          1  2.1333  2.1333  2.8444 0.104648
factorB:factorA  4 17.3333  4.3333  5.7778 0.002104 **
Residuals       24 18.0000  0.7500

I was expecting the residuals in the restricted model (the one not containing the main effect of factorA) to be larger than in the full model containing all three effects. In other words, the variance accounted for by the main effect factorA should be added to the residuals. Instead, it looks like the variance accounted for by the main effect of factorA is being soaked up by the factorA:factorB interaction term. Strangely, the degrees of freedom are also affected.
I must be misunderstanding something here. Can someone point out what is happening? Thanks, -Paul

-- Paul L. Gribble, Ph.D. Associate Professor Dept. Psychology The University of Western Ontario London, Ontario Canada N6A 5C2 Tel. +1 519 661 2111 x82237 Fax. +1 519 661 3961 pgrib...@uwo.ca http://gribblelab.org

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
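Greg's point about equivalent parameterizations can be checked directly. This is a hedged sketch with simulated data (the original DV was never posted, so `rnorm` stands in): both formulas span the same column space, so they give identical fitted values and identical residual sums of squares, which is exactly why the model comparison shows a zero-df, zero-SS difference.

```r
set.seed(1)
# Fully crossed 3x2 design, 5 replicates per cell (simulated stand-in data)
d <- expand.grid(A = factor(1:3), B = factor(1:2), rep = 1:5)
d$DV <- rnorm(nrow(d))

full <- lm(DV ~ A + B + A:B, data = d)  # main effects plus interaction
rest <- lm(DV ~ B + A:B, data = d)      # drops A, but the A:B term absorbs it

# Same column space, hence identical fits and residual SS
all.equal(fitted(full), fitted(rest))
c(deviance(full), deviance(rest))
```

Looking at `model.matrix(full)` and `model.matrix(rest)` (per Greg's x=TRUE suggestion) shows two rank-6 matrices spanning the same space, just with different columns.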
Re: [R] Inefficiency of SAS Programming
My apologies; this obviously doubles as my 'for registration purposes' account, so I don't often send from it - I was not intentionally being so secretive :) At any rate, I completely agree, but of course it's a reciprocal relationship. The software is written in SAS because that's what the organizations use; the organizations use SAS because that's what the programs are written in... For better or worse, SAS's integration in big bureaucracies is the main thing that keeps it competitive in the marketplace and viable. There aren't a lot of other contexts in which their pricing structure would work. Bryan

On Fri, Feb 27, 2009 at 12:48 PM, Frank E Harrell Jr f.harr...@vanderbilt.edu wrote: spam me wrote: I've actually used AHRQ's software to create Inpatient Quality Indicator reports. I can confirm pretty much what we already know; it is inefficient. Running on about 1.8 - 2 million cases, it would take just about a whole day to run the entire process from start to finish. That isn't all processing time and includes some time for the analyst to check results between substeps, but I still knew that my day was full when I was working on IQI reports. To be fair, though, there are a lot of other factors (besides efficiency considerations) that go into AHRQ's program design. First, there are a lot of changes to that software every year. In some cases it is easier and less error-prone to hardcode a few points in the data so that it is blatantly obvious what to change next year should another analyst need to do so. Second, the organizations that use this software often require transparency and may not have high-level programmers on staff. Writing code so that it is accessible, editable, and interpretable by intermediate-level programmers or analysts is a plus. Third, given that IQI reports are often produced on a yearly basis, there's no real need to sacrifice clarity, etc. for efficiency - you're only doing this process once a year.
There are other points that could be made, but the main idea is that I don't think it's fair to hold this software up, out of context, as an example of SAS's (or even AHRQ's) inefficiencies. I agree that SAS syntax is nowhere near as elegant or as powerful as R from a programming standpoint; that's why after 7 years of using SAS I switched to R. But comparing the two at that level is like racing a Ferrari and a Bentley to see which is the better car.

Dear Anonymous, Nice points. I would just add that it would be better if government-sponsored projects resulted in software that could be run without expensive licenses. Thanks Frank

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Inefficiency of SAS Programming
Also because no one wants to put their neck out on a chopping block to suggest R without technical support and the like. If you use SAS, there's a cascade of blame available, but it's not immediately available for R. On Fri, Feb 27, 2009 at 10:36 AM, Bryan thespamho...@gmail.com wrote: My apologies, this obviously doubles as my for registration purposes account and so I don't often send from it - I was not intentionally being so secretive : ) At any rate, I completely agree, but of course it's a reciprocal relationship. The software is written in SAS because that's what the organizations use, the organizations use SAS because that's what the programs are written in... For better or worse, SAS's integration in big bureaucracies is the main thing that keeps it competitive in the marketplace and viable. There aren't a lot of other contexts in which their pricing structure would work. Bryan On Fri, Feb 27, 2009 at 12:48 PM, Frank E Harrell Jr f.harr...@vanderbilt.edu wrote: spam me wrote: I've actually used AHRQ's software to create Inpatient Quality Indicator reports. I can confirm pretty much what we already know; it is inefficient. Running on about 1.8 - 2 million cases, it would take just about a whole day to run the entire process from start to finish. That isn't all processing time and includes some time for the analyst to check results between substeps, but I still knew that my day was full when I was working on IQI reports. To be fair though, there are a lot of other factors (beside efficiency considerations) that go into AHRQ's program design. First, there are a lot of changes to that software every year. In some cases it is easier and less error prone to hardcode a few points in the data so that it is blatantly obvious what to change next year should another analyst need to do so. Second, the organizations that use this software often require transparency and may not have high level programmers on staff. 
Writing code so that it is accessible, editable, and interpretable by intermediate-level programmers or analysts is a plus. Third, given that IQI reports are often produced on a yearly basis, there's no real need to sacrifice clarity, etc. for efficiency - you're only doing this process once a year. There are other points that could be made, but the main idea is that I don't think it's fair to hold this software up, out of context, as an example of SAS's (or even AHRQ's) inefficiencies. I agree that SAS syntax is nowhere near as elegant or as powerful as R from a programming standpoint; that's why after 7 years of using SAS I switched to R. But comparing the two at that level is like racing a Ferrari and a Bentley to see which is the better car.

Dear Anonymous, Nice points. I would just add that it would be better if government-sponsored projects resulted in software that could be run without expensive licenses. Thanks Frank

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] select Intercept coefficients only
Hi friends, Is there a function to select intercept coefficients only? When I use coefficients() it shows me all the coefficients, but I only want a specific coefficient.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Making tapply code more efficient
On something the size of your data it took about 30 seconds to determine the number of unique teachers per student.

x <- cbind(sample(326397, 800967, TRUE), sample(20, 800967, TRUE))

# split the data so you have the number of teachers per student
system.time(t.s <- split(x[,2], x[,1]))
   user  system elapsed
   0.92    0.01    0.94

t.s[1:7] # sample data
$`1`
[1] 16
$`2`
[1] 3
$`3`
[1] 1
$`4`
[1] 17
$`6`
[1]  9  9 19
$`7`
[1] 20
$`9`
[1]  3 16 16 10  8 17

# count number of unique teachers per student
system.time(t.a <- sapply(t.s, function(x) length(unique(x))))
   user  system elapsed
  20.17    0.10   20.26

t.a[1:10]
 1  2  3  4  6  7  9 10 11 12
 1  1  1  1  2  1  5  1  1  1

On Fri, Feb 27, 2009 at 9:46 AM, Doran, Harold hdo...@air.org wrote:

Previously, I posed the question pasted down below to the list and received some very helpful responses. While the code suggestions provided in response indeed work, they seem to only work with *very* small data sets and so I wanted to follow up and see if anyone had ideas for better efficiency. I was quite embarrassed by this as our SAS programmers cranked out programs that did this in the blink of an eye (with a few variables), but R was spinning for days on my Ubuntu machine and ultimately I saw a message that R was killed. The data I am working with has 800967 total rows and 31 total columns. The ID variable I use as the index variable in tapply() has 326397 unique cases.

length(unique(qq$student_unique_id))
[1] 326397

To give a sense of what my data look like and the actual problem, consider the following:

qq <- data.frame(student_unique_id = factor(c(1,1,2,2,2)), teacher_unique_id = factor(c(10,10,20,20,25)))

This is a student achievement database where students occupy multiple rows in the data and the variable teacher_unique_id denotes the class the student was in. What I am doing is looking to see if the teacher is the same for each instance of the unique student ID.
So, if I implement the following:

same <- function(x) length(unique(x)) == 1
results <- data.frame(
  freq = tapply(qq$student_unique_id, qq$student_unique_id, length),
  tch  = tapply(qq$teacher_unique_id, qq$student_unique_id, same))

I get the following results. I can see that student 1 appears in the data twice and the teacher is always the same. However, student 2 appears three times and the teacher is not always the same.

results
  freq   tch
1    2  TRUE
2    3 FALSE

Now, implementing this same procedure on a large data set with the characteristics described above seems to be problematic in this implementation. Does anyone have reactions on how this could be made more efficient such that it can run with large data as I described? Harold

sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-pc-linux-gnu
locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base

# Original question posted on 1/13/09

Suppose I have a dataframe as follows:

dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25), var2 = c('foo', 'foo', 'foo', 'foobar', 'foo'))

Now, if I were to subset by id, such as:

subset(dat, id==1)
  id var1 var2
1  1   10  foo
2  1   10  foo

I can see that the elements in var1 are exactly the same and the elements in var2 are exactly the same. However,

subset(dat, id==2)
  id var1   var2
3  2   20    foo
4  2   20 foobar
5  2   25    foo

shows the elements are not the same for either variable in this instance. So, what I am looking to create is a data frame that would be like this:

  id freq  var1  var2
1  1    2  TRUE  TRUE
2  2    3 FALSE FALSE

Where freq is the number of times the ID is repeated in the dataframe. A TRUE appears in the cell if all elements in the column are the same for the ID and FALSE otherwise. It is insignificant which values differ for my problem.
The way I am thinking about tackling this is to loop through the ID variable and compare the values in the various columns of the dataframe. The problem I am encountering is that I don't think all.equal or identical are the right functions in this case. So, say I was wanting to compare the elements of var1 for id == 1. I would have

x <- c(10, 10)

Of course, the following works:

all.equal(x[1], x[2])
[1] TRUE

As would a similar call to identical. However, what if I only have a vector of values (or if the column consists of names) that I want to assess for equality when I am trying to automate a process over thousands of cases? As in the example above, the vector may contain only two values or it may contain many more. The number of values in the vector differs by id. Any thoughts? Harold
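As a footnote to Jim's timing above, one way to avoid applying a function over 326,397 splits entirely is to deduplicate the student/teacher pairs first and then tabulate. This is a sketch on Harold's small example data (column names follow his post); the same vectorized idea should scale to the full data set, though the timings here are not benchmarked:

```r
qq <- data.frame(student_unique_id = factor(c(1, 1, 2, 2, 2)),
                 teacher_unique_id = factor(c(10, 10, 20, 20, 25)))

# Keep each distinct (student, teacher) pair exactly once
u <- unique(qq)

# Distinct teachers per student, in one vectorized pass
n.teachers <- table(u$student_unique_id)

results <- data.frame(freq = as.vector(table(qq$student_unique_id)),
                      tch  = as.vector(n.teachers) == 1)
results
#   freq   tch
# 1    2  TRUE
# 2    3 FALSE
```

A student has a single teacher exactly when the count of distinct pairs for that student is 1, so no per-group `length(unique(...))` call is needed.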
Re: [R] select Intercept coefficients only
choonhong ang wrote: Hi friends, Is there a function to select intercept coefficients only? When I use coefficients() it shows me all the coefficients, but I only want a specific coefficient.

What about indexing, e.g. as in:

coefficients(some_lm_object)["(Intercept)"]

Uwe Ligges

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
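A concrete illustration of Uwe's suggestion on a built-in dataset (the model here is only an example, not from the original post): the coefficient vector returned by coef()/coefficients() is named, so any single coefficient can be pulled out by name.

```r
fit <- lm(dist ~ speed, data = cars)  # example model on a builtin dataset

# Coefficients form a named vector, so the intercept is extractable by name
coef(fit)["(Intercept)"]

# Or drop the name entirely with unname()
unname(coef(fit)[1])
```

The same name-based indexing works for any other coefficient, e.g. coef(fit)["speed"].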