Re: [R] Awk and Vilno
Rogerio Porto wrote:
> Hey,
>
>> What we should really compare is the four situations:
>> R alone
>> R + awk
>> R + vilno
>> R + awk + vilno
>> and maybe "R + SAS Data step"
>> and see what scripts are more elegant (read 'short and understandable')

I don't think that short and understandable necessarily go hand-in-hand. Sometimes longer scripts which are more explicit and use fewer tricky syntax shortcuts are much easier to understand a year or two later. Ease and speed of script writing (taking into account the learning curve and the time taken to consult scripting language documentation) are important; the ability to revisit scripts, or to examine someone else's script and work out what it does and how it works, is vital; and speed of execution also counts with large datasets. The ubiquity of the tool, and whether it is freely available on many platforms, either pre-installed or in an easy-to-install form, are also considerations.

> what do you guys think of creating a R-wiki page for syntax
> comparisons among the various options to enhance R use?
>
> I already have two suggestions:
>
> 1) syntax examples for using R and other tools to manipulate
> and analyze large datasets (with a concise description of the
> datasets);
>
> 2) syntax examples for using R and other tools (or R alone) to clean
> and prepare datasets (simple and very small datasets, for didactic
> purposes).

The ability of the tools to scale to large or very large datasets is also a consideration, as is their speed when dealing with such large data.

> I think this could be interesting for R users and to promote other
> software tools, since it seems there are a lot of R users who use
> other tools also.
>
> Besides that, questions on those two above subjects are prevalent
> on this list. Thus a wiki page seems to be the right place to discuss
> and teach this to other users.
>
> What do you think?

Yes, happy to contribute R + Python examples to such wiki pages. Please post the URL.

Tim C
[R] Reduced Error Logistic Regression, and R?
This news item in a data mining newsletter makes various claims for a technique called "Reduced Error Logistic Regression": http://www.kdnuggets.com/news/2007/n08/12i.html

In brief, are these (ambitious) claims justified and, if so, has this technique been implemented in R (or does anyone have any plans to do so)?

Tim C
Re: [R] aggregate similar to SPSS
Andrew Robinson <[EMAIL PROTECTED]> wrote:
> can I suggest, without offending, that you purchase and read Peter
> Dalgaard's "Introductory Statistics with R" or Michael Crawley's
> "Statistics: An Introduction using R" or Venables and Ripley's "Modern
> Applied Statistics with S" or Maindonald and Braun's "Data Analysis
> and Graphics Using R: An Example-based Approach",
> or download and read An Introduction to R
> http://cran.r-project.org/doc/manuals/R-intro.pdf
> or one of the numerous contributed documents at
> http://cran.r-project.org/other-docs.html

For Natalie, who is an SPSS user, may I strongly recommend "R FOR SAS AND SPSS USERS" by Bob Muenchen at http://oit.utk.edu/scc/RforSAS&SPSSusers.pdf

This is a really, really excellent document which has proven to be an invaluable resource in introducing my SAS- and SPSS-using colleagues to the delights of R. And it is free (as in available at no cost).

Tim C

> On Wed, Apr 25, 2007 at 03:32:11PM -0600, Natalie O'Toole wrote:
> > Hi,
> >
> > Does anyone know if: with R can you take a set of numbers and aggregate
> > them like you can in SPSS? For example, could you calculate the percentage
> > of people who smoke based on a dataset like the following:
> >
> > smoke = 1
> > non-smoke = 2
> >
> > variable
> > 1
> > 1
> > 1
> > 2
> > 2
> > 1
> > 1
> > 1
> > 2
> > 2
> > 2
> > 2
> > 2
> > 2
> >
> > When aggregated, SPSS can tell you what percentage of persons are smokers
> > based on the frequency of 1's and 2's. Can R statistical package do a
> > similar thing?
> >
> > Thanks,
> >
> > Nat
>
> --
> Andrew Robinson
> Department of Mathematics and Statistics       Tel: +61-3-8344-9763
> University of Melbourne, VIC 3010 Australia    Fax: +61-3-8344-4599
> http://www.ms.unimelb.edu.au/~andrewpr
> http://blogs.mbs.edu/fishing-in-the-bay/
Re: [R] sas.get problem
John Kane wrote:
> How do I make this change? I naively have tried by
> a) list sas.get and copy to editor
> b) reload R without loading Hmisc
> c) made recommended changes to sas.get
> d) stuck a "sas.get <- " in front of the function and ran it.

Here is what I do, until Frank fixes the problem in the Hmisc package itself:

a) list sas.get and copy to editor
b) make the change to line 127 as described
c) preface the function with "sas.get <- "
d) save that as "sas_get_fixed.R"
e) reload R and load Hmisc
f) source("sas_get_fixed.R")

The final step will mask the original, broken sas.get function with the fixed version.

Tim C
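PS: to check that the sourced copy really is the one that will be called, find() shows the search order (the file name is the one from the steps above):

library(Hmisc)
find("sas.get")            # "package:Hmisc"
source("sas_get_fixed.R")  # defines the patched copy in the global environment
find("sas.get")            # ".GlobalEnv" now comes before "package:Hmisc",
                           # so the patched copy masks the packaged one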
Re: [R] sas.get problem
John Kane wrote:
> I have 3 SAS files all in the directory F:/sas, two data files
> and a format file:
> form.ea1.sas7bdat
> form.ea2.sas7bdat
> sas.fmts.sas7bdat
>
> F is a USB.
>
> I am trying to import them to R using "sas.get".
>
> I have not used SAS since I was downloading data from mainframe
> and having to write JCL. I had forgotten how bizarre SAS can be.
> I currently have not even figured out how to load the files into SAS
> but they look fine since I can import them with no problem into SPSS.
>
> I am using R2.4.1 under Windows XP
> SAS files were created with SAS 9.x
> They convert easily into SPSS 14
>
> In the example below I have tried various versions of the file names,
> with no luck.
> Can anyone suggest some approach(s) that I might take.
>
> Example.
>
> library(Hmisc)
> mydata <- sas.get(library="F:/sas", mem="form.ea1",
>     format.library="sas.fmts.sas7bdat",
>     sasprog = '"C:Program Files/SAS/SAS 9.1/sas.exe"')
>
> Error message (one of several that I have gotten while trying various
> things.)
> The filename, directory name, or volume label syntax is incorrect.
> Error in sas.get(library = "F:/sas", mem = "form.ea1",
>     format.library = "sas.fmts.sas7bdat", :
>     SAS job failed with status 1
> In addition: Warning messages:
> 1: sas.fmts.sas7bdat/formats.sc? or formats.sas7bcat not found.
>     Formatting ignored.
>     in: sas.get(library = "F:/sas", mem = "form.ea1",
>     format.library = "sas.fmts.sas7bdat",
> 2: 'cmd' execution failed with error code 1 in:
>     shell(cmd, wait = TRUE, intern = output)

The sas.get function in the Hmisc library is broken under Windows. Change line 127 from:

status <- sys(paste(shQuote(sasprog), shQuote(sasin), "-log",
                    shQuote(log.file)), output = FALSE)

to:

status <- system(paste(shQuote(sasprog), shQuote(sasin), "-log",
                       shQuote(log.file)))

I found this fix in the R-help archives; sorry, I don't have the original to hand so I can't give proper attribution, but the fix is not due to me. It does work for me, though. I believe Frank Harrell has been notified of the problem and the fix. Once patched and working correctly, the sas.get function in the Hmisc library is fantastic - thanks Frank!

Tim C
Re: [R] Datamining-package rattle() Errors
j.joshua thomas wrote: > Dear Group > > I have few errors while installing package rattle from CRAN > > i do the installing from the local zip files... > > I am using R 2.4.0 do i have to upgrade to R2.4.1 ? You *do* have to read the r-help posting guide and take exact heed of what it suggests: http://www.r-project.org/posting-guide.html Tim C > ~~ > > utils:::menuInstallLocal() > package 'rattle' successfully unpacked and MD5 sums checked > updating HTML package descriptions >> help(rattle) > No documentation for 'rattle' in specified packages and libraries: > you could try 'help.search("rattle")' >> library(rattle) > Rattle, Graphical interface for data mining using R, Version 2.2.0. > Copyright (C) 2006 [EMAIL PROTECTED], GPL > Type "rattle()" to shake, rattle, and roll your data. > Warning message: > package 'rattle' was built under R version 2.4.1 >> rattle() > Error in rattle() : could not find function "gladeXMLNew" > In addition: Warning message: > there is no package called 'RGtk2' in: library(package, lib.loc = lib.loc, > character.only = TRUE, logical = TRUE, >> local({pkg <- select.list(sort(.packages(all.available = TRUE))) > + if(nchar(pkg)) library(pkg, character.only=TRUE)}) >> update.packages(ask='graphics') > > > On 2/28/07, Roberto Perdisci <[EMAIL PROTECTED]> wrote: >> Hi, >> out of curiosity, what is the name of the package you found? >> >> Roberto >> >> On 2/27/07, j.joshua thomas <[EMAIL PROTECTED]> wrote: >>> Dear Group, >>> >>> I have found the package. >>> >>> Thanks very much >>> >>> >>> JJ >>> --- >>> >>> >>> On 2/28/07, j.joshua thomas <[EMAIL PROTECTED]> wrote: I couldn't locate package rattle? Need some one's help. JJ --- On 2/28/07, Daniel Nordlund <[EMAIL PROTECTED]> wrote: >> -Original Message- >> From: [EMAIL PROTECTED] [mailto: > [EMAIL PROTECTED] >> On Behalf Of j.joshua thomas >> Sent: Tuesday, February 27, 2007 5:52 PM >> To: r-help@stat.math.ethz.ch >> Subject: Re: [R] Datamining-package-? >> >> Hi again, >> The idea of preprocessing is mainly based on the need to prepare >> the > data >> before they are actually used in pattern extraction.or feed the >> data >> into EA's (Genetic Algorithm) There are no standard practice yet > however, >> the frequently used on are >> >> 1. the extraction of derived attributes that is quantities that > accompany >> but not directly related to the data patterns and may prove >> meaningful > or >> increase the understanding of the patterns >> >> 2. the removal of some existing attributes that should be of no > concern to >> the mining process and its insignificance >> >> So i looking for a package that can do this two above mentioned > points >> Initially i would like to visualize the data into pattern and > understand the >> patterns. >> >> > <<>> > > Joshua, > > You might take a look at the package rattle on CRAN for initially > looking at your data and doing some basic data mining. > > Hope this is helpful, > > Dan > > Daniel Nordlund > Bothell, WA, USA > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html< >> http://www.r-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. > -- Lecturer J. Joshua Thomas KDU College Penang Campus Research Student, University Sains Malaysia >>> >>> >>> -- >>> Lecturer J. 
Joshua Thomas
>>> KDU College Penang Campus
>>> Research Student,
>>> University Sains Malaysia
Re: [R] RPy and the evil underscore
Alberto Vieira Ferreira Monteiro wrote:
> It seems like I will join two threads :-)

Please address RPy-specific questions to the RPy mailing list, where they will be answered swiftly and without annoyance to everyone else on this general r-help mailing list.

> Ok, RPy was installed (in Fedora Core 4, yum -y install rpy), and it
> is running. However, I have a doubt, and the (meagre) documentation
> doesn't seem to address it.
>
> In python, when I do this:
> import rpy
> rpy.r.setwd("/mypath")
> rpy.r.source("myfile.r")
>
> Everything happens as expected. But now, there's
> a problem if I try to use a function in myfile:
> x = my_function(1)
> x = r.my_function(1)
> x = rpy.my_function(1)
> x = rpy.r.my_function(1)
>
> None of them work: the problem is that the _ is mistreated.
> If the function has "." instead of "_", it works:
> x = rpy.r.my_function(1)
>
> This is weird: I must write the R routine with a ".", but then
> rpy translates it to "_"!

Object identifiers cannot begin with an underscore in R, but they can in Python. To avoid having to confusingly special-case this difference, the RPy designers elected to translate underscores in Python object names to dots in R object names. All this is clearly documented in the RPy manual at http://rpy.sourceforge.net/rpy/doc/rpy_html/R-objects-look-up.html#R-objects-look-up

Tim C
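PS: on the R side, the rule looks like this (my.function is a made-up name):

my.function <- function(x) x + 1  # a legal R name; RPy exposes it to Python
                                  # as rpy.r.my_function(1)
# _my_function <- function(x) 0   # would be a syntax error: R identifiers
                                  # cannot begin with an underscore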
[R] Google, hard disc drives and R
A recent paper from Google Labs, interesting in many respects, not the least the exclusive use of R for data analysis and graphics (alas not cited in the approved manner): http://labs.google.com/papers/disk_failures.pdf

Perhaps some of the eminences grises of the R Foundation could prevail upon Google to make some of the data reported in the paper available for inclusion in an R library or two, for pedagogical purposes?

Tim C
Re: [R] How to speed up or avoid the for-loops in this example?
Marc Schwartz wrote:
> OK, here is one possible solution, though perhaps with a bit more time,
> there may be more optimal approaches.
>
> Using your example data above, but first noting that you do not want to
> use:
>
> df <- data.frame(cbind(subject, year, event.of.interest))
>
> Using cbind() first creates a matrix and causes all columns to be
> coerced to a common data type, obviating the benefit of data frames to
> be able to handle multiple data types.

Yes, quite right, the cbind() was unnecessary. I'm not making my real data frame that way, however.

> So, now on to the solution:
>
> # First, order the data frame by increasing order of
> # subject number and decreasing order for event.of.interest
> # This ensures that these columns are properly sorted
> # to facilitate the subsequent code.
>
> df <- df[order(df$subject, -df$event.of.interest), ]
>
> So, 'df' will look like:
>
> > df
>    subject year event.of.interest
> 2        1 1982              TRUE
> 3        1 1996              TRUE
> 1        1 1980             FALSE
> 4        2 1985             FALSE
> 5        2 1987             FALSE
> 7        3 1991              TRUE
> 9        3 1999              TRUE
> 6        3 1990             FALSE
> 8        3 1992             FALSE
> 10       4 1972              TRUE
> 11       4 1983             FALSE
>
> # Now use the combinations of sapply(), rle(), seq() and unlist() to
> # generate per subject sequences. Note that rle() returns:
> #
> # > rle(df$subject)
> # Run Length Encoding
> #   lengths: int [1:4] 3 2 4 2
> #   values : num [1:4] 1 2 3 4
> #
> # See ?rle, ?seq, ?sapply and ?unlist
>
> df$subject.seq <- unlist(sapply(rle(df$subject)$lengths,
>                                 function(x) seq(x)))
>
> So, 'df' now looks like:
>
> > df
>    subject year event.of.interest subject.seq
> 2        1 1982              TRUE           1
> 3        1 1996              TRUE           2
> 1        1 1980             FALSE           3
> 4        2 1985             FALSE           1
> 5        2 1987             FALSE           2
> 7        3 1991              TRUE           1
> 9        3 1999              TRUE           2
> 6        3 1990             FALSE           3
> 8        3 1992             FALSE           4
> 10       4 1972              TRUE           1
> 11       4 1983             FALSE           2
>
> # Now set event.seq to all 0's
>
> df$event.seq <- 0
>
> So, 'df' now looks like:
>
> > df
>    subject year event.of.interest subject.seq event.seq
> 2        1 1982              TRUE           1         0
> 3        1 1996              TRUE           2         0
> 1        1 1980             FALSE           3         0
> 4        2 1985             FALSE           1         0
> 5        2 1987             FALSE           2         0
> 7        3 1991              TRUE           1         0
> 9        3 1999              TRUE           2         0
> 6        3 1990             FALSE           3         0
> 8        3 1992             FALSE           4         0
> 10       4 1972              TRUE           1         0
> 11       4 1983             FALSE           2         0
>
> # Get the unique subject id's
> # See ?unique
>
> subj.id <- unique(df$subject)
>
> # Now get the indices for each subject where event.of.interest
> # is TRUE. See ?which
>
> events <- sapply(subj.id,
>                  function(x) which(df$subject == x & df$event.of.interest))
>
> So, 'events' looks like:
>
> > events
> [[1]]
> [1] 1 2
>
> [[2]]
> integer(0)
>
> [[3]]
> [1] 6 7
>
> [[4]]
> [1] 10
>
> # Now use sapply() on the above list to create
> # individual sequences per list element:
>
> seq <- sapply(events, function(x) seq(along = x))
>
> So 'seq' looks like:
>
> > seq
> [[1]]
> [1] 1 2
>
> [[2]]
> integer(0)
>
> [[3]]
> [1] 1 2
>
> [[4]]
> [1] 1
>
> # So, for the final step, assign the event sequence values in 'seq' to
> # the row indices in 'events':
>
> df$event.seq[unlist(events)] <- unlist(seq)
>
> So, 'df' now looks like this:
>
> > df
>    subject year event.of.interest subject.seq event.seq
> 2        1 1982              TRUE           1         1
> 3        1 1996              TRUE           2         2
> 1        1 1980             FALSE           3         0
> 4        2 1985             FALSE           1         0
> 5        2 1987             FALSE           2         0
> 7        3 1991              TRUE           1         1
> 9        3 1999              TRUE           2         2
> 6        3 1990             FALSE           3         0
> 8        3 1992             FALSE           4         0
> 10       4 1972              TRUE           1         1
> 11       4 1983             FALSE           2         0
>
> HTH,
>
> Marc Schwartz

Wow, that's very trick, or tricky. It works, but it is a bit slower and more complex than the Holtman/Nielsen approach. But there are some interesting ideas there which I shall bear in mind.
Re: [R] How to speed up or avoid the for-loops in this example?
jim holtman wrote:
> On 2/14/07, Tim Churches <[EMAIL PROTECTED]> wrote:
>> Any advice, tips, clues or pointers to resources on how best to speed up
>> or, better still, avoid the loops in the following example code much
>> appreciated. My actual dataset has several tens of thousands of rows and
>> lots of columns, and these loops take a rather long time to run.
>> Everything else which I need to do is done using vectors and those parts
>> all run very quickly indeed. I spent quite a while doing searches on
>> r-help and re-reading the various manuals, but couldn't find any
>> existing relevant advice. I am sure the solution is obvious, but it
>> escapes me.
>>
>> Tim C
>>
>> # create an example data frame, multiple events per subject
>>
>> year <- c(1980,1982,1996,1985,1987,1990,1991,1992,1999,1972,1983)
>> event.of.interest <- c(F,T,T,F,F,F,T,F,T,T,F)
>> subject <- c(1,1,1,2,2,3,3,3,3,4,4)
>> df <- data.frame(cbind(subject,year,event.of.interest))
>>
>> # add a per-subject sequence number
>>
>> df$subject.seq <- 1
>> for (i in 2:nrow(df)) {
>>   if (df$subject[i-1] == df$subject[i]) df$subject.seq[i] <-
>>     df$subject.seq[i-1] + 1
>> }
>> df
>>
>> # add an event sequence number which is zero until the first
>> # event of interest for that subject happens, and then increments
>> # thereafter
>>
>> df$event.seq <- 0
>> for (i in 1:nrow(df)) {
>>   if (df$subject.seq[i] == 1) {
>>     current.event.seq <- 0
>>   }
>>   if (event.of.interest[i] == 1 | current.event.seq > 0)
>>     current.event.seq <- current.event.seq + 1
>>   df$event.seq[i] <- current.event.seq
>> }
>> df
>
> try:
>
>> df <- data.frame(cbind(subject,year,event.of.interest))
>> df <- do.call(rbind,by(df, df$subject, function(z){z$subject.seq <-
> seq(nrow(z)); z}))
>> df
>      subject year event.of.interest subject.seq
> 1.1        1 1980                 0           1
> 1.2        1 1982                 1           2
> 1.3        1 1996                 1           3
> 2.4        2 1985                 0           1
> 2.5        2 1987                 0           2
> 3.6        3 1990                 0           1
> 3.7        3 1991                 1           2
> 3.8        3 1992                 0           3
> 3.9        3 1999                 1           4
> 4.10       4 1972                 1           1
> 4.11       4 1983                 0           2
>
>> # determine first event
>> df <- do.call(rbind, by(df, df$subject, function(x){
> +   # determine first event
> +   .first <- cumsum(x$event.of.interest)
> +   # create sequence after first non-zero
> +   .first <- cumsum(.first > 0)
> +   x$event.seq <- .first
> +   x
> + }))
>> df
>        subject year event.of.interest subject.seq event.seq
> 1.1.1        1 1980                 0           1         0
> 1.1.2        1 1982                 1           2         1
> 1.1.3        1 1996                 1           3         2
> 2.2.4        2 1985                 0           1         0
> 2.2.5        2 1987                 0           2         0
> 3.3.6        3 1990                 0           1         0
> 3.3.7        3 1991                 1           2         1
> 3.3.8        3 1992                 0           3         2
> 3.3.9        3 1999                 1           4         3
> 4.4.10       4 1972                 1           1         1
> 4.4.11       4 1983                 0           2         2

Thanks Jim, that works a treat, over an order of magnitude faster than the for-loops. Anders Nielsen also provided this solution:

df$subject.seq <- unlist(tapply(df$subject, df$subject, function(x) 1:length(x)))

Doing it that way is about 5 times faster than using rbind(). But Jim's use of cumsum on the logical vector is very nifty. I have now combined Jim's function with Anders' column-oriented approach and the result is that my code now runs about two orders of magnitude faster.

Many thanks,

Tim C
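PS: for the archives, the combined version looks roughly like this (a sketch, which assumes df is ordered by subject so that the order of tapply()'s splits matches the row order):

# per-subject row counter (Anders' column-wise approach)
df$subject.seq <- unlist(tapply(df$subject, df$subject,
                                function(x) 1:length(x)))
# Jim's double-cumsum trick applied per subject: zero until the first
# event of interest for that subject, then incrementing thereafter
df$event.seq <- unlist(tapply(df$event.of.interest, df$subject,
                              function(x) cumsum(cumsum(x) > 0)))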
[R] How to speed up or avoid the for-loops in this example?
Any advice, tips, clues or pointers to resources on how best to speed up or, better still, avoid the loops in the following example code much appreciated. My actual dataset has several tens of thousands of rows and lots of columns, and these loops take a rather long time to run. Everything else which I need to do is done using vectors and those parts all run very quickly indeed. I spent quite a while doing searches on r-help and re-reading the various manuals, but couldn't find any existing relevant advice. I am sure the solution is obvious, but it escapes me.

Tim C

# create an example data frame, multiple events per subject

year <- c(1980,1982,1996,1985,1987,1990,1991,1992,1999,1972,1983)
event.of.interest <- c(F,T,T,F,F,F,T,F,T,T,F)
subject <- c(1,1,1,2,2,3,3,3,3,4,4)
df <- data.frame(cbind(subject,year,event.of.interest))

# add a per-subject sequence number

df$subject.seq <- 1
for (i in 2:nrow(df)) {
  if (df$subject[i-1] == df$subject[i]) df$subject.seq[i] <-
    df$subject.seq[i-1] + 1
}
df

# add an event sequence number which is zero until the first
# event of interest for that subject happens, and then increments
# thereafter

df$event.seq <- 0
for (i in 1:nrow(df)) {
  if (df$subject.seq[i] == 1) {
    current.event.seq <- 0
  }
  if (event.of.interest[i] == 1 | current.event.seq > 0)
    current.event.seq <- current.event.seq + 1
  df$event.seq[i] <- current.event.seq
}
df
Re: [R] Adding error bars to a trellis barchart display
Chris Bergstresser wrote:
> Hi all --
>
> I'm using trellis to generate bar charts, but there's no built-in
> function to generate error bars or confidence intervals, as far as I
> can tell. I assumed I could just write my own panel function to add
> them, so I searched the archive, and found a posting from the author
> of the package stating "... placing multiple bars side by side needs
> specialized calculations, which are done within panel.barchart. To add
> bars to these, you will need to reproduce those calculations."
>
> Just so I'm clear on this -- there's no capacity to add bars to the
> plot, nor to find out the coordinates of the bars in the graphs
> themselves. If you want them, you have to completely rewrite
> panel.barchart. Is this correct? Are there really so few people
> using error bars with bar charts?

One of our projects does confidence intervals on bar charts produced using the lattice library. It is quite feasible without too much effort - see: http://members.optusnet.com.au/tchur/NetEpi-Analysis-0-8-Screenshot-5.png

Sorry I don't have time to extract the code which does this right now, but you can dissect it out yourself from the NetEpi-Analysis-0.8 tarball at http://sourceforge.net/project/showfiles.php?group_id=123700 - although the R code is embedded in Python classes, which might make extrication a bit more difficult (and which is why I don't have time to do it right now). But from memory the chunk of R code which overrides the default panel function is fairly self-contained and you should be able to identify it fairly easily - just grep the source code for likely strings such as "panel.barchart" to discover where it is.

Other screenshots can be downloaded from http://sourceforge.net/project/showfiles.php?group_id=123700 if anyone is interested.

Tim C
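PS: for the simple ungrouped case, the essential idiom is just a custom panel function which calls panel.barchart() and then panel.segments(). A minimal sketch with made-up data follows; grouped bars additionally need the side-by-side offset calculations which Deepayan mentions, and that is the part you would have to dissect out of NetEpi or panel.barchart itself:

library(lattice)
est <- data.frame(area = c('North','South','East','West'),
                  pct  = c(20, 25, 15, 30),
                  ll   = c(17, 21, 12, 26),
                  ul   = c(23, 29, 18, 34))
barchart(area ~ pct, data = est, origin = 0,
         ll = est$ll, ul = est$ul,
         panel = function(x, y, ..., ll, ul, subscripts) {
             panel.barchart(x, y, ...)
             # overlay the confidence limits as horizontal segments
             panel.segments(ll[subscripts], as.numeric(y),
                            ul[subscripts], as.numeric(y),
                            col = 'red', lwd = 2)
         })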
Re: [R] using GDD fonts
Luiz Rodrigo Tozzi wrote:
> Hi
>
> I was searching for some X replacement for my job in R and i found the GDD
>
> I installed it and I match all the system requirements. My problem
> (maybe a dumb one) is that every plot comes with no font and i cant
> find a single example of a plot WITH FONT DETAIL in the list
>
> can anybody help me?
>
> a simple example:
>
> library(GDD)
> GDD("something.png", type="png", width = 700, height = 500)
> par(cex.axis=0.65,lab=c(12,12,0),mar=c(2.5, 2.5, 2.5, 2.5))
> plot(rnorm(100))
> mtext("Something",side=3,padj=-0.33,cex=1)
> dev.off()
>
> thanks in advance!

This might help - we found that we needed to install the MS TT fonts and make sure that GDD can find them, as per the README:

Simon Urbanek <[EMAIL PROTECTED]> wrote:
> Tim,
>
> On Jun 9, 2005, at 3:51 AM, Tim CHURCHES wrote:
>
>> I tried GDD 0.1-7 with Lattice graphs in R 2.1.0 (on Linux). It
>> doesn't segfault now but it is still not producing any usable output
>> - the output png file is produced but only with a few lines on it.
>> Still the alpha channel problem? Have you been able to produce any
>> Lattice graphs with it?
>
> I know of no such problem, I tested a few lattice graphics and they
> worked. Can you, please, send me reproducible example and your output?
> Also send me, please, output of
> library(GDD)
> .Call("gdd_look_up_font", NULL)

Sorry, my laziness. GDD was unable to find any fonts. After I installed the MS TT fonts and set their location as per the GDD README, it worked perfectly with both old-style R graphics and lattice graphics. The output looks very nice indeed. We'll do a bit more testing (and let you know if we find any problems), but it looks like we can at last drop the requirement for Xvfb when using R in a Web application. Great work! From our point of view, GDD solves one of the biggest problems with R for Web applications.

Cheers,

Tim C
Re: [R] Histogram over a Large Data Set (DB): How?
Eric Eide wrote:
> "Sean" == Sean Davis <[EMAIL PROTECTED]> writes:
>
> Sean> Have you tried just grabbing the whole column using dbGetQuery?
> Sean> Try doing this:
> Sean>
> Sean> spams <- dbGetQuery(con, "select unixtime from email limit
> Sean> 1000000")
>
> Sean> Then increase from 1,000,000 to 1.5 million, to 2 million, etc.
> Sean> until you break something (run out of memory), if you do at all.
>
> Yes, you are right. For the example problem that I posed, R can indeed
> process the entire query result in memory. (The R process grows to
> 240MB, though!)
>
> Sean> However, the BETTER way to do this, if you already have the data
> Sean> in the database is to allow the database to do the histogram for
> Sean> you. For example, to get a count of spams by day, in MySQL do
> Sean> something like: [...]
>
> Yes, again you are right --- the particular problem that I posed is
> probably better handled by formulating a more sophisticated SQL query.
>
> But really, my goal isn't to solve the example problem that I posed ---
> rather, it is to understand how people use R to process very large data
> sets. The research project that I'm working on will eventually need to
> deal with query results that cannot fit in main memory, and for which
> the built-in statistical facilities of most DBMSs will be insufficient.
>
> Some of my colleagues have previously written their analyses "by hand,"
> using various scripting languages to read and process records from a DB
> in chunks. Writing things in this way, however, can be tedious and
> error-prone. Instead of taking this approach, I would like to be able
> to use existing statistics packages that have the ability to deal with
> large datasets in good ways.
>
> So, I seek to understand the ways that people deal with these sorts of
> situations in R. Your advice is very helpful --- one should solve
> problems in the simplest ways available! --- but I would still like to
> understand the harder cases, and how one can use "general" R functions
> in combination with DBI's `dbApply' and `fetch' interfaces, which
> divide results into chunks.

You might be interested in our project, NetEpi Analysis, which aims to provide interactive exploratory data analysis and basic epidemiological analysis via both a Web front end and a Python programmatic API (forgive the redundancy in "programmatic API") for datasets up to around 30 million rows (and as many columns as you like) on 32-bit platforms - hundreds of millions of rows should be feasible on 64-bit platforms. It stores data column-wise in memory-mapped on-disc arrays, and uses set operations on ordinal indexes to permit rapid subsetting and cross-tabulation of categorical (factored) data. It is written in Python, but uses R for graphics and some (but not all) statistical calculations (and for model fitting when we get round to providing facilities for same). See http://www.netepi.org - still in alpha, with an update coming out by December.

Although it is aimed at epidemiological analysis (of large administrative health datasets), I dare say it might be useful for exploring large databases of spam too.

Tim C
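PS: as a general illustration of the chunked DBI style that Eric mentions, something along these lines should work (a sketch only - it assumes an existing DBI connection con, and that the bin breaks are chosen in advance to span all of the data):

library(DBI)
# accumulate per-bin counts chunk by chunk, so that the whole column
# never has to fit in memory at once
breaks <- seq(1104537600, 1136073600, by = 86400)  # hypothetical daily bins (unixtime)
counts <- integer(length(breaks) - 1)
res <- dbSendQuery(con, "SELECT unixtime FROM email")
while (!dbHasCompleted(res)) {
    chunk <- fetch(res, n = 100000)   # one manageable chunk at a time
    counts <- counts +
        hist(chunk$unixtime, breaks = breaks, plot = FALSE)$counts
}
dbClearResult(res)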
[R] mid-p CIs for common odds ratio
mantelhaen.test() gives the exact conditional p-value (for independence) and confidence intervals (CIs) for the common odds ratio for a stratified 2x2 table. The epitools package by Tomas Aragon (available via CRAN) contains functions which use fisher.test() to calculate mid-p exact p-values and CIs for the CMLE odds ratio for a single 2x2 table. The mid-p p-value for independence for a stratified 2x2 table is easy to calculate using mantelhaen.test(), but can anyone suggest a method for calculation of mid-p CIs for the common odds ratio? A search in the usual places draws a blank (but I am sure someone will immediately prove me wrong on that point...).

Thanks to Andy Dean (of Epi-Info fame), I have a copy of public domain Pascal code from 1991 by David Martin and Harland Austin which calculates mid-p CIs for the common odds ratio by finding polynomial roots. Before trying to replicate that code in R (or C), I was wondering if anyone could suggest a better or easier way?

Tim C
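PS: for a single 2x2 table, the mid-p idea itself is easy enough to sketch: take the usual two-sided exact p-value, but count only half the probability of the observed table. Something like the following (a rough sketch under the null OR = 1 only, ignoring floating-point tolerance niceties; the mid-p CI for the common odds ratio is the hard part, since it requires inverting the non-central hypergeometric distribution, which is where the polynomial roots come in):

# two-sided mid-p exact test for a single 2x2 table -- a sketch only,
# not a substitute for fisher.test() or the epitools functions
midp.2x2 <- function(tab) {
    x <- tab[1, 1]             # observed (1,1) cell
    m <- sum(tab[1, ])         # row 1 total
    n <- sum(tab[2, ])         # row 2 total
    k <- sum(tab[, 1])         # column 1 total
    d <- dhyper(0:k, m, n, k)  # null distribution of the (1,1) cell,
                               # conditional on the margins
    # sum the probabilities of tables no more probable than the observed
    # one, then take away half the probability of the observed table
    sum(d[d <= d[x + 1]]) - 0.5 * d[x + 1]
}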
Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio
Marc Schwartz (via MN) <[EMAIL PROTECTED]> wrote:
>
> On Wed, 2005-10-19 at 06:47 +1000, Tim Churches wrote:
> > Marc Schwartz (via MN) wrote:
> >
> > > There is also code for the Woolf test in ?mantelhaen.test
> >
> > Is there? How is it obtained? The documentation on mantelhaen.test in R
> > 2.2.0 contains a note: "Currently, no inference on homogeneity of the
> > odds ratios is performed." and a quick scan of the source code for the
> > function didn't reveal any mention of Woolf's test.
> >
> > Tim C
>
> Review the code in the examples on the cited help page...
>
> :-)

OK, I see it now, thanks.

Tim C

> HTH,
>
> Marc
Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio
Marc Schwartz (via MN) wrote:
> There is also code for the Woolf test in ?mantelhaen.test

Is there? How is it obtained? The documentation on mantelhaen.test in R 2.2.0 contains a note: "Currently, no inference on homogeneity of the odds ratios is performed." and a quick scan of the source code for the function didn't reveal any mention of Woolf's test.

Tim C
Re: [R] Efficient ways of finding functions and Breslow-Day test for homogeneity of the odds ratio
MJ Price, Social Medicine wrote:
> I have been trying to find a function to calculate the Breslow-Day test
> for homogeneity of the odds ratio in R. I know the test can be performed
> in SAS but i was wondering if anyone could help me to perform this in r.

I don't recall seeing the Breslow-Day test anywhere in an R package, but the vcd package (available via CRAN) has a function called woolf_test() to calculate Woolf's test for homogeneity of ORs.

Tim C
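PS: failing an existing implementation, the Breslow-Day statistic can be written down directly from the textbook formula. The sketch below is from first principles rather than from any validated code, omits Tarone's correction, and assumes the Mantel-Haenszel common OR is not exactly 1, so check it against SAS before relying on it (x is a 2x2xK array, as accepted by mantelhaen.test()):

breslow.day <- function(x) {
    K <- dim(x)[3]
    psi <- unname(mantelhaen.test(x)$estimate)  # Mantel-Haenszel common OR
    stat <- 0
    for (k in 1:K) {
        a <- x[1, 1, k]
        n1 <- sum(x[1, , k])  # row totals for stratum k
        n2 <- sum(x[2, , k])
        m1 <- sum(x[, 1, k])  # first column total for stratum k
        # fitted (1,1) cell under the common OR: the admissible root of
        # (psi - 1)e^2 - (psi*(n1 + m1) + n2 - m1)*e + psi*n1*m1 = 0
        qa <- psi - 1
        qb <- -(psi * (n1 + m1) + (n2 - m1))
        qc <- psi * n1 * m1
        roots <- (-qb + c(-1, 1) * sqrt(qb^2 - 4 * qa * qc)) / (2 * qa)
        ea <- roots[roots > max(0, m1 - n2) & roots < min(n1, m1)][1]
        # variance of the (1,1) cell based on the fitted counts
        v <- 1 / (1/ea + 1/(n1 - ea) + 1/(m1 - ea) + 1/(n2 - m1 + ea))
        stat <- stat + (a - ea)^2 / v
    }
    c(statistic = stat, df = K - 1,
      p.value = pchisq(stat, K - 1, lower.tail = FALSE))
}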
Re: [R] running JPEG device on R 1.9.1 using xvfb-run on Linux
Prof Brian Ripley wrote:
> On Wed, 12 Oct 2005, David Zhao wrote:
>
>> Does anybody have experience in running jpeg device using xvfb-run on
>> linux? I've been having sporadic problem with: /usr/X11/bin/xvfb-run
>> /usr/bin/R --no-save < Rinput.txt, with error saying: error in X11
>> connection. Especially when I run it from a perl script.
>
> Not sure what `xvfb-run on Linux' is, as it is not on my Linux (FC3).
> If you Google it you will find a number of problems reported on Debian
> lists. Here I would suspect timing.
>
> What I do is to run Xvfb on screen 5 by
>
> Xvfb :5 &
> setenv DISPLAY :5
>
> and do not have a problem with the jpeg() or png() devices. I do have a
> problem with the rgl() package, but then I often do on-screen (on both
> 32- and even more so 64-bit FC3).

For R-embedded-in-Python (via RPy) on a Web server, we have been using a Python programme to automatically start Xvfb if it is not already running. You can find a copy of the programme in the NetEpi-Analysis tarball available at http://sourceforge.net/project/showfiles.php?group_id=123700

The tricky bit is managing the permissions for the Xvfb session, particularly in a Web server context - you need to take care. However, this use of Xvfb has been perfectly reliable (on Red Hat Enterprise Linux 2.1 and 3, with R 2.0 and R 2.1).

>> Is there a better way of doing this? or how can I fix the problem.
>
> You really should update your R.

Yes. We now use GDD, which is an alternative R graphics driver for raster graphics (JPEG and PNG), available via CRAN. It allows R to directly generate jpeg and png files on a Linux or Unix machine without the need for an X server to be running (not even Xvfb). The quality of the output is also better than the standard R X11/png/jpeg graphics device, due to the use of anti-aliased fonts by GDD. Earlier versions of GDD were a bit buggy, but so far we have found the latest version (0.1.7) to be fine. It is a bit fiddly to install all the libraries it requires, as well as the recommended (no-cost) Microsoft TrueType fonts, but the effort is worth it. Many thanks to Simon Urbanek for his work on GDD.

Tim C
Re: [R] Leading in line-wrapped Lattice axis value and panel labels
Paul Murrell wrote:
> Hi
>
> Deepayan Sarkar wrote:
> > On 9/7/05, Tim Churches <[EMAIL PROTECTED]> wrote:
> >
> >> Version 2.1.1 Platforms: all
> >>
> >> What is the trellis parameter (or is there a trellis parameter) to
> >> set the leading (the gap between lines) when long axis value
> >> labels or panel header labels wrap over more than one line? By
> >> default, there is a huge gap between lines, and much looking and
> >> experimentation has not revealed to me a suitable parameter to
> >> adjust this.
> >
> > There is none. Whatever grid.text does happens.
>
> grid does have a "lineheight" graphical parameter. For example,
>
> library(grid)
> grid.text("line one\nlinetwo",
>           x=rep(1:3/4, each=3),
>           y=rep(1:3/4, 3),
>           gp=gpar(lineheight=1:9/2))
>
> Could you add this in relevant places in trellis.par Deepayan?

Is there a workaround we could use in the meantime, or should we attempt to hack trellis.par as per Paul's suggestion (gulp!)? I suppose that is like asking "Should we attempt to climb Teichelmann?" - it depends...

We have increased the depth of the panel headers, but this wastes plotting area, and the tops of the tees and effs on the upper line and the bottoms of the gees and whys on the bottom line are still cut off, so large is the gap between the two lines. And increasing the panel header depth doesn't help with y-axis labels - typically the second line of one label will abut the first line of the next label, giving a result which is rather like:

Value
- One
Value
- Two
Value
- Three
Value
- Four

where the actual value labels are "Value One", "Value Two" etc. and the "-" are the tick marks. Less than ideal. Suggestions for interim fixes (other than using abbreviated labels... we've thought of that) most welcome.

Tim C
[R] Leading in line-wrapped Lattice value and panel labels
Version 2.1.1 Platforms: all

What is the trellis parameter (or is there a trellis parameter) to set the leading (the gap between lines) when long axis value labels or panel header labels wrap over more than one line? By default, there is a huge gap between lines, and much looking and experimentation has not revealed to me a suitable parameter to adjust this.

Tim C
Re: [R] The Perils of PowerPoint
(Ted Harding) wrote:
> By the way, the Washington Post/Minneapolis Star Tribune article is
> somewhat reminiscent of a short (15 min) broadcast on BBC Radio 4
> back on October 18 2004 15:45-16:00 called
>
>   "Microsoft Powerpoint and the Decline of Civilisation"
>
> which explores similar themes and also frequently quotes Tufte.
> Unfortunately it lapsed for ever from "Listen Again" after the
> statutory week, so I can't point you to a replay. (However, I
> have carefully preserved the cassette recording I made.)

Try http://sooper.org/misc/powerpoint.mp3 (copyright law notwithstanding...)

Tim C
[R] Confidence interval bars on Lattice barchart with groups
I am trying to add confidence (error) bars to lattice barcharts (and dotplots, and xyplots). I found this helpful message from Deepayan Sarkar and based the code below on it: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/50299.html

However, I can't get it to work with groups, as illustrated. I am sure I am missing something elementary, but I am unsure what. Using R 2.1.1 on various platforms. I am aware of xYplot in the Hmisc library, but would prefer to avoid any dependency on a non-core R library, if possible.

Tim C

##

library(lattice)

# set up dummy test data
testdata <- data.frame(
  dsr=c(1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,
        2,3,4,5,6,7,8,9,10,11,3,4,5,6,7,8,9,10,11,12),
  year=as.factor(c(1998,1998,1998,1998,1998,1998,1998,1998,1998,1998,
                   1999,1999,1999,1999,1999,1999,1999,1999,1999,1999,
                   2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,
                   2001,2001,2001,2001,2001,2001,2001,2001,2001,2001)),
  geog_area=c('North','South','East','West','Middle',
              'North','South','East','West','Middle',
              'North','South','East','West','Middle',
              'North','South','East','West','Middle',
              'North','South','East','West','Middle',
              'North','South','East','West','Middle',
              'North','South','East','West','Middle',
              'North','South','East','West','Middle'),
  sex=c('Male','Male','Male','Male','Male',
        'Female','Female','Female','Female','Female',
        'Male','Male','Male','Male','Male',
        'Female','Female','Female','Female','Female',
        'Male','Male','Male','Male','Male',
        'Female','Female','Female','Female','Female',
        'Male','Male','Male','Male','Male',
        'Female','Female','Female','Female','Female'),
  age=c('Old','Old','Old','Old','Old',
        'Young','Young','Young','Young','Young',
        'Old','Old','Old','Old','Old',
        'Young','Young','Young','Young','Young',
        'Old','Old','Old','Old','Old',
        'Young','Young','Young','Young','Young',
        'Old','Old','Old','Old','Old',
        'Young','Young','Young','Young','Young'))

# add dummy lower and upper confidence limits
testdata$dsr_ll <- testdata$dsr - 0.7
testdata$dsr_ul <- testdata$dsr + 0.5

# examine the test data
testdata

# check that a normal barchart with groups works OK - it does
barchart(geog_area ~ dsr | year, testdata, groups=sex, origin = 0)

# this works as expected, but not sure what the error messages mean
with(testdata, barchart(geog_area ~ dsr | year + sex, origin = 0,
    dsr_ul = dsr_ul, dsr_ll = dsr_ll,
    panel = function(x, y, ..., dsr_ll, dsr_ul, subscripts) {
        panel.barchart(x, y, subscripts, ...)
        dsr_ll <- dsr_ll[subscripts]
        dsr_ul <- dsr_ul[subscripts]
        panel.segments(dsr_ll, as.numeric(y), dsr_ul, as.numeric(y),
                       col = 'red', lwd = 2)
    }))

# no idea what I am doing wrong here, but there is not one bar per
# group... something to do with panel.groups???
with(testdata, barchart(geog_area ~ dsr | year, groups=sex, origin = 0,
    dsr_ul = dsr_ul, dsr_ll = dsr_ll,
    panel = function(x, y, ..., dsr_ll, dsr_ul, subscripts, groups) {
        panel.barchart(x, y, subscripts, groups, ...)
        dsr_ll <- dsr_ll[subscripts]
        dsr_ul <- dsr_ul[subscripts]
        panel.segments(dsr_ll, as.numeric(y), dsr_ul, as.numeric(y),
                       col = 'red', lwd = 2)
    }))

##
Re: [R] Runnning R remotely
Laura Quinn wrote:
> I wasn't aware that it was possible to use postscript in the same
> fashion as png, eg:
>
> png(file,width=x,height=y,)
> image(map)
> text(text)
> title(title)
> box()
> dev.off()
>
> As there are a large number of iterations png has been working nicely
> (when not working remotely!), especially as it has proven easy to
> convert into gifs and then into movie gifs.
>
> Could anyone suggest an alternative approach in this case?

Start an Xvfb (X11 virtual frame buffer) server in your remote ssh session. R will then use that as an X11 device to produce the PNG output. If you are running in a hostile network environment, consider using authentication and/or switching off network access to the Xvfb session - see the man pages for Xvfb. Xvfb is installed by default on most recent Linux distributions - if not, there should be an installable package available for it for your flavour of Linux.

Tim C
Re: [R] How to make R faster?
ebashi wrote:
> Dear R users;
>
> I am using R for a project. I have some PHP forms that pass parameters
> to R for calculations, and publish the result in HTML format by
> CGIwithR. I'm using a Linux machine and everything works perfectly.
> However, it is too slow: it takes 5 to 10 seconds to run, and even if I
> start R from the shell it takes the same amount of time, which is
> probably due to installing packages.
>
> My first question is that how can i make R run faster? and second if I
> am supposed to reduce the packages which are being loaded at initiation
> of R, how can I limit it to only the packages that i want? and third
> how can i make R not to get open each time, and let it sit on the
> server, so that when i pass something to it, i get result faster?

Have a look at RSOAP, which does exactly what you suggest and allows you to communicate with the R session via SOAP. I'm sure there are SOAP libraries available for PHP. See http://research.warnes.net/projects/rzope/rsoap/

Tim C
Re: [R] Plotting with Statistics::R, Perl/R
Peter Dalgaard wrote:
> d) Use bitmap(). It requires a working Ghostscript install, but is
> otherwise much more convenient. Newer versions of Ghostscript have some
> quite decent antialiasing built into some of the png devices. Currently
> you need a small hack to pass the extra options to Ghostscript -- we
> should probably add a gsOptions argument in due course. This works for
> me on FC3 (Ghostscript 7.07):
>
> mybitmap(file="foo.png", type="png16m",
>          gsOptions=" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 ")
>
> where mybitmap() is a modified bitmap() that just sticks the options
> into the command line. There are definitely better ways...
>
> [The antialiasing is not quite perfect. In particular, the axes stand
> out from the box around plots, presumably because an additive model is
> used (so that if you draw a line on top of itself, the result becomes
> darker). Also, text gets a little muddy at the default 9pt @ 72dpi, so
> you probably want to increase the pointsize or the resolution.]

Apart from the significant quality issues which you mention, the other problem with using bitmap() in a Web server environment is the speed issue - it takes much longer to produce the output. Whether it takes too long depends on the users of your Web application, and how many simultaneous users there are. However, most users are more worried by the poor quality of the fonts in output produced by bitmap().

Tim C
Re: [R] Plotting with Statistics::R, Perl/R
Joe Conway wrote:
> Dirk Eddelbuettel wrote:
>> On Fri, Jan 21, 2005 at 06:06:45PM -0800, Leah Barrera wrote:
>>> I am trying to plot in R from a perl script using the Statistics::R
>>> package as my bridge. The following are the conditions:
>>> 0. I am running from a Linux server.
>>
>> Plotting certain formats requires the X11 server to be present, as the
>> font metrics for those formats can be supplied only by the X11 server.
>> Other drivers don't need the font metrics from X11 -- I think pdf is a
>> good counterexample. When you run in 'batch' via a Perl script, you
>> don't have the X11 server -- even though it may be on the machine and
>> running, it is not associated with the particular session running your
>> Perl job. There are two common fixes:
>>
>> a) if you must have png() as a format, you can start a virtual X11
>> server with the xvfb server -- this is a bit involved, but doable;
>
> Attached is an init script I use to start up xvfb on Linux.
>
> HTH,
>
> Joe

#!/bin/bash
#
# syslog        Starts Xvfb.
#
#
# chkconfig: 2345 12 88
# description: Xvfb is a facility that applications requiring an X frame buffer \
#              can use in place of actually running X on the server

# Source function library.
. /etc/init.d/functions

[ -f /usr/X11R6/bin/Xvfb ] || exit 0

XVFB="/usr/X11R6/bin/Xvfb :5 -screen 0 1024x768x16"
RETVAL=0
umask 077

start() {
    echo -n $"Starting Xvfb: "
    $XVFB&
    RETVAL=$?
    echo_success
    echo
    [ $RETVAL = 0 ] && touch /var/lock/subsys/Xvfb
    return $RETVAL
}

stop() {
    echo -n $"Shutting down Xvfb: "
    killproc Xvfb
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f /var/lock/subsys/Xvfb
    return $RETVAL
}

restart() {
    stop
    start
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart|reload)
    restart
    ;;
  condrestart)
    [ -f /var/lock/subsys/Xvfb ] && restart || :
    ;;
  *)
    echo $"Usage: $0 {start|stop|restart|condrestart}"
    exit 1
esac

exit $RETVAL

Hmm, the only problem with that is that, if I am not mistaken, you are starting Xvfb without any authentication, and I am told by people who know about such things that in the context of an Internet-accessible Web server, having an X server accepting unauthenticated connections is not a good idea. In other, less hostile environments, it might be OK. Maybe such concerns are unreasonable paranoia, but my motto is better safe than sorry when it comes to Internet-facing servers. I think there are also other switches you can pass to Xvfb to stop it listening on various TCP/IP ports etc.

Tim C
Re: [R] Plotting with Statistics::R, Perl/R
Dirk Eddelbuettel wrote:
> Plotting certain formats requires the X11 server to be present, as the
> font metrics for those formats can be supplied only by the X11 server.
> Other drivers don't need the font metrics from X11 -- I think pdf is a
> good counterexample. When you run in 'batch' via a Perl script, you
> don't have the X11 server -- even though it may be on the machine and
> running, it is not associated with the particular session running your
> Perl job. There are two common fixes:
>
> a) if you must have png() as a format, you can start a virtual X11
> server with the xvfb server -- this is a bit involved, but doable;

An example of a Python programme which manages the starting of an Xvfb server when one is required can be found in the xvfb_spawn.py file in the SOOMv0 directory of the tarball for NetEpi Analysis, which can be downloaded by following the links at http://www.netepi.org

xvfb_spawn.py was written for use with RPy, which is a Python-to-R bridge, when used in a Web server setting (hence no X11 display server available). It should be possible to translate the programme to Perl, or to write something similar in Perl. Comments in the code note some potential security traps for the unwary.

Hopefully one day the dependency of the R raster graphics devices on an X11 server will be removed. R on Win32 doesn't have that dependency (but then, Windows machines, even servers, have displays running all the time as part of their kernel, and who would wish that on other operating systems?). However, there are several graphics back-ends which produce very high quality raster graphics on POSIX platforms without the need for an X11 device to be present - Agg ("Anti-grain geometry", see http://www.antigrain.com/) and Cairo (see http://cairographics.org/) spring to mind (the usual disclaimers about the foregoing comments not being meant as ingratitude to the R development team etc. apply).

Tim C
Re: [R] SAS or R software
Shawn Way wrote:
> I've seen multiple comments about MS Excel's precision and accuracy.
> Can you please point me in the right direction in locating information
> about these?

As always, Google is your friend, but see for example http://www.nwpho.org.uk/sadb/Poisson%20CI%20in%20spreadsheets.pdf

Tim C
Re: [R] Lattice graph with segments
Andrew Robinson wrote:
> Ruud,
>
> try something like the following (not debugged, no coffee yet):
>
> xyplot(coupon.period ~ median, data=prepayment,
>        subscripts=T,
>        panel=function(x,y,subscripts,...){
>            panel.xyplot(x,y)
>            panel.segments(deel1$lcl[subscripts], deel$ucl[subscripts])
>        }
> )

Not quite:

library(lattice)
prepayment <- data.frame(median=c(10.89,12.54,10.62,8.46,7.54,4.39),
                         ucl=c(NA,11.66,9.98,8.05,7.27,4.28),
                         lcl=c(14.26,13.34,11.04,8.72,7.90,4.59),
                         coupon.period=c('a','b','c','d','e','f'))
xyplot(coupon.period ~ median, data=prepayment,
       subscripts=T,
       panel=function(x,y,subscripts,...){
           panel.xyplot(x,y)
           panel.segments(prepayment$lcl[subscripts],
                          prepayment$ucl[subscripts])
       }
)

throws the error:

Error in max(length(x0), length(x1), length(y0), length(y1)) :
        Argument "x1" is missing, with no default

Tim C
Re: [R] How about a mascot for R?
Damian Betebenner wrote:
> R users,
>
> How come R doesn't have a mascot?

Perhaps someone with artistic flair could create a mascot based on this image? It would help to give newcomers to R-help the right idea: http://www.accesscom.com/~alvaro/alien/thepics/ripley1__.jpg

Tim C
[R] Unable to understand strptime() behaviour
R V2.0.1 on Windows XP.

I have read the help pages on strptime() over and over, but can't understand why strptime() is producing the following results.

> v <- format("2002-11-31", format="%Y-%m-%d")
> v
[1] "2002-11-31"
> factor(v, levels=v)
[1] 2002-11-31
Levels: 2002-11-31
> x <- strptime("2002-11-31", format="%Y-%m-%d")
> x
[1] "2002-12-01"
> factor(x, levels=x)
[1] <NA>
Levels: 2002-12-01 NA NA NA NA NA NA NA NA

Tim C
Re: Re: [R] draft of posting guide. Sorry.
A.J. Rossini <[EMAIL PROTECTED]> wrote:
> However, the amount (and quality) of (freely-available, at least for
> the cost of download, which might not be free) documentation for R is
> simply incredible. The closest that I've seen, for freely available
> languages, is Python, for actual quality of documentation.

The Python documentation is truly excellent, but I agree, the R documentation is even better. Sometimes the R help is a bit terse, but that simply means that one has to think a bit to work out what is meant; I have never found it to be insufficient.

Tim C
Re: RE: [R] R and Memory
Mulholland, Tom <[EMAIL PROTECTED]> wrote:
>
> I would suggest that you make a more thorough search of the R archives
> (http://finzi.psych.upenn.edu/search.html). If you do, you will find
> this discussion has been had several times, and that the type of
> machine you are running will have an impact upon what you can do. My
> feeling is that you are going to have to knuckle down with the
> documentation and understand how R works, and then, when you have
> specific issues that show you have read all the appropriate
> documentation, you might try another message to the list.
>
> Ciao, Tom

Another approach is to not try to bring all your data into R at once - it is unlikely that you actually need every column of every row in your dataset to undertake any particular analysis. The trick is to bring into R only those rows and columns which you need at a particular moment, and then discard them. The best way to do this is to manage your data in an SQL database such as MySQL or PostgreSQL, and then use one of the R database interfaces to issue queries against this database and to surface the query results as a data frame. Remember to compose your queries in such a way as to retrieve only the rows and columns you really need at any particular moment, and don't forget to delete these data frames as soon as you have finished with them (or at least, as soon as you need more space in your R session) - see the sketch at the end of this message. There is also an (experimental, I think) package which allows lazy or virtual loading of database queries into data frames, so that the query results are paged into memory as they are needed. But I doubt you will need that.

Tim C

> _________________________________
> Tom Mulholland
> Senior Policy Officer
> WA Country Health Service
> Tel: (08) 9222 4062
>
> -----Original Message-----
> From: Edward McNeil [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, 2 December 2003 8:45 AM
> To: [EMAIL PROTECTED]
> Subject: [R] R and Memory
>
> Dear all,
> This is my first post.
> We have started to use R here and have also started teaching it to our
> PhD students. Our unit will be the HQ for developing R throughout
> Thailand.
>
> I would like some help with a problem we are having. We have one sample
> of data that is quite large in fact - over 2 million records (ok ok
> it's more like a population!). The data is stored in SPSS. The file is
> over 350Mb but SPSS happily stores this much data. Now when I try to
> read it into R it grunts and groans for a few seconds and then reports
> that there is not enough memory (the computer has 250MB RAM). I have
> tried setting the memory in the command line (--max-vsize and
> --max-mem-size) but all to no avail.
>
> Any help would be muchly appreciated!
>
> Edward McNeil (son of Don)
> Epidemiology Unit
> Faculty of Medicine
> Prince of Songkhla University
> Hat Yai 90110
> THAILAND
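Here is the sketch of the query-then-discard pattern mentioned above (made-up table and column names; the PostgreSQL interface follows the same DBI idiom):

library(RMySQL)
con <- dbConnect(MySQL(), dbname = "survey")   # a hypothetical database
# pull in only the rows and columns needed for this particular analysis
smoke <- dbGetQuery(con,
    "SELECT age, smoker FROM records WHERE year = 2002")
table(smoke$age, smoke$smoker)
rm(smoke)   # discard the data frame as soon as it has served its purpose
dbDisconnect(con)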
Re: [R] SAS transport files and the foreign package
On Sat, 2003-01-18 at 07:45, Frank E Harrell Jr wrote:
> I had no idea how strange the XPORT format really is.

Like the fact that the IBM double precision representation used in XPORT uses 7 bits for the exponent and 56 bits for the mantissa, whereas IEEE format uses 11 bits for the exponent and 52 bits for the mantissa.

> Following Duncan Temple Lang's suggestion I am contacting one of our
> clients to see what they think about moving towards XML for this.
> My guess is that XML will take a while to be used routinely for
> this and that the sometimes huge datasets involved will cause XML
> files to be monstrous (compression will help but will tax memory
> usage of R at least temporarily during processing).

The nice things about the SAS XML engine are:

a) all the metadata associated with a dataset is included in the generated XML file, including not just the names of the formats for each variable (column), but the actual format value labels themselves;

b) more than one dataset can be included in a single generated XML export file;

c) like the XPORT format, it is close to foolproof from the SAS user's point of view, because the SAS XML engine does all the work.

The generated files are indeed huge (relative to the amount of actual data they contain). For our purposes, this is not likely to be a huge problem - we select and/or summarise data in SAS, and then pass the subset or summary set to R. At the moment, we are experimenting with parsing the SAS XML files with Python and then passing the data to R via RPy (the Python-to-R bridge) - mainly because I am slightly more adept at writing Python than R. However, the ability of R to read SAS XML files directly, and to set up categorical SAS variables which have formats as factor columns in R data.frames, would be fabulous.

Tim C
Re: [R] calling R from python (fwd)
Agustin Lobo wrote:
>
> A question for a (experienced) user of the RPython package on linux.
>
> I'm trying to call R from python on a linux (Suse 7.3) box.

Since you are calling R from Python, you could try Walter Moreira's excellent RPy module. I found it much easier to install than RSPython (provided you follow Walter's instructions), and it has been very reliable. It is also very efficient at converting Numeric Python arrays to R, and it has a very easy to use object model - much nicer than RSPython's. See http://rpy.sourceforge.net

Of course, RSPython can also call Python from R, which RPy can't do.

Tim C