[R] write a png inside a pdf for large graphics?
I routinely write graphics into multi-page PDFs, but some graphics (e.g. plots of large spatial datasets using levelplot()) can result in enormous files. I'm curious if there is a better way. For example:

# First, make some data:
library(lattice)
d=expand.grid(x=1:1000,y=1:1000)
d$z=rnorm(nrow(d))

# Now, the PDF. The following produces a PDF that's ~50MB.
pdf(width=11,height=8.5,file="test1.pdf")
levelplot(z~x*y,data=d)
dev.off()

# If you write the same graphic to a png with reasonable resolution, the file size is ~500k:
png(width=1024,height=768,file="test1.png")
levelplot(z~x*y,data=d)
dev.off()

I would prefer to embed a png (or other raster format) inside a PDF directly from R. Is this possible? I'm looking for some way to achieve something like the following (of course this doesn't work):

pdf(width=11,height=8.5,file="test1.pdf")
png(width=1024,height=768,file="current device")
levelplot(z~x*y,data=d)
dev.off()
dev.off()

Of course the PDF preserves vector scalability, but there are times it's not worth the extra file size. And you can write out the PNGs as separate files and then merge them with imagemagick or ghostscript.

I currently get around this by writing the graphics to a potentially very large (>>100MB) PDF, then using ghostscript to convert *only* the large pages of the PDF to png and putting it back together as a PDF (a function I wrote for this is described here: http://planetflux.adamwilson.us/2010/06/shrinking-rs-pdf-output.html). I'm curious if there is a way to do it directly by instructing R to write a png and embed it within the already open PDF device.

Any ideas?

Adam

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
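One possible direction, offered as a sketch rather than as the answer given on the list: later versions of lattice document a useRaster argument to levelplot(), which asks the device to draw the colour surface as a single raster image instead of one rectangle per cell. The example below assumes a lattice version and an R PDF device that both support raster images, and it uses a smaller grid than the post (200 x 200, chosen here only to keep the run short). Note the explicit print() calls, which lattice needs when the code is run from a script.

```r
library(lattice)

# Smaller grid than the 1000 x 1000 in the post, so the sketch runs quickly
d <- expand.grid(x = 1:200, y = 1:200)
d$z <- rnorm(nrow(d))

vector_file <- tempfile(fileext = ".pdf")
raster_file <- tempfile(fileext = ".pdf")

# Default behaviour: every grid cell becomes its own rectangle in the PDF
pdf(vector_file, width = 11, height = 8.5)
print(levelplot(z ~ x * y, data = d))
dev.off()

# Assumption: useRaster = TRUE is supported (regular grid, recent lattice);
# the surface is then stored as one embedded image
pdf(raster_file, width = 11, height = 8.5)
print(levelplot(z ~ x * y, data = d, useRaster = TRUE))
dev.off()

# Compare the resulting file sizes in bytes
sizes <- file.size(c(vector_file, raster_file))
names(sizes) <- c("vector", "raster")
print(sizes)
```

Axes, labels and the page itself stay as vector output in both files; only the colour surface changes representation, so text remains sharp when zooming.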
[R] Out of memory crash on 64-bit Fedora
Greetings all,

First of all, thanks to all of you for creating such a useful, powerful program.

I regularly work with very large datasets (several GB) in R in 64-bit Fedora 8 (details below). I'm lucky to have 16GB RAM available. However, if I am not careful and load too much into R's memory, I can crash the whole system. There does not seem to be a check in place that will stop R from trying to allocate all available memory (including swap space). I have system status plots in my task bar, which I can watch to see when all the RAM is taken and R then reserves all the swap space. If I don't kill the R process before the swap hits 100%, it will freeze the machine. I don't know if this is an R problem or a Fedora problem (I suppose the kernel should be killing R before it crashes, but shouldn't R stop before it takes all the memory?).

To replicate this behavior, I can crash the system by allocating more and more memory in R:

v1=matrix(nrow=1e5,ncol=1e4)
v2=matrix(nrow=1e5,ncol=1e4)
v3=matrix(nrow=1e5,ncol=1e4)
v4=matrix(nrow=1e5,ncol=1e4)

etc. until R claims all RAM and swap space, and crashes the machine.

If I try this on a Windows machine, eventually the allocation fails with an error in R, "Error: cannot allocate vector of size XX MB". This is much preferable to crashing the whole system. Why doesn't this happen in Linux? Is there some setting that will prevent this? I've looked through the archives and not found a similar problem.

Thanks for any help.
Adam

The facts:

> sessionInfo()
R version 2.8.0 (2008-10-20)
x86_64-redhat-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

> version
               _
platform       x86_64-redhat-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          8.0
year           2008
month          10
day            20
svn rev        46754
language       R
version.string R version 2.8.0 (2008-10-20)
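To put numbers on the reproduction above, here is a small worked calculation, added as a sketch. matrix(nrow=1e5,ncol=1e4) creates a logical matrix filled with NA, and the snippet measures the storage per cell on two small matrices instead of assuming it. The 16 GB figure is taken from the post and is treated here as 16 GiB, which is an assumption.

```r
# Measure bytes per cell of an NA-filled (logical) matrix; taking the
# difference of two sizes cancels the fixed per-object overhead
small <- as.numeric(object.size(matrix(nrow = 100, ncol = 100)))
large <- as.numeric(object.size(matrix(nrow = 1000, ncol = 100)))
bytes_per_cell <- (large - small) / (1000 * 100 - 100 * 100)

# Size of one matrix from the post: 1e5 rows by 1e4 columns
cells <- 1e5 * 1e4
bytes_per_matrix <- cells * bytes_per_cell
gib_per_matrix <- bytes_per_matrix / 2^30

# How many such matrices fit in 16 GiB of RAM (assumed reading of "16GB")
matrices_to_fill_ram <- floor(16 * 2^30 / bytes_per_matrix)

print(c(bytes_per_cell = bytes_per_cell,
        gib_per_matrix = gib_per_matrix,
        matrices_to_fill_ram = matrices_to_fill_ram))
```

So v1 to v4 alone account for roughly all of the physical memory, and anything after that goes to swap. On Linux a per-process cap is usually set outside R, for example with the shell's ulimit -v before starting R, so that an oversized allocation fails inside R instead of exhausting the machine.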
[R] R in the News
Greetings all,

This isn't a request for help, but I thought the following article from the New York Times would be of interest to you all. Enjoy!

Adam

*Data Analysts Captivated by R's Power*
By ASHLEE VANCE
Published ONLINE: January 6, 2009
URL: http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
A version of this article appeared in print edition on January 7, 2009, on page B6

R is also the name of a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.

But R has also quickly found a following because statisticians, engineers and scientists without computer programming skills find it easy to use.

"R is really important to the point that it's hard to overvalue it," said Daryl Pregibon, a research scientist at Google, which uses the software widely. "It allows statisticians to do very intricate and complicated analyses without knowing the blood and guts of computing systems."

It is also free. R is an open-source program, and its popularity reflects a shift in the type of software used inside corporations. Open-source software is free for anyone to use and modify. I.B.M., Hewlett-Packard and Dell make billions of dollars a year selling servers that run the open-source Linux operating system, which competes with Windows from Microsoft. Most Web sites are displayed using an open-source application called Apache, and companies increasingly rely on the open-source MySQL database to store their critical information. Many people view the end results of all this technology via the Firefox Web browser, also open-source software.
R is similar to other programming languages, like C, Java and Perl, in that it helps people perform a wide variety of computing tasks by giving them access to various commands. For statisticians, however, R is particularly useful because it contains a number of built-in mechanisms for organizing data, running calculations on the information and creating graphical representations of data sets.

Some people familiar with R describe it as a supercharged version of Microsoft's Excel spreadsheet software that can help illuminate data trends more clearly than is possible by entering information into rows and columns.

What makes R so useful and helps explain its quick acceptance is that statisticians, engineers and scientists can improve the software's code or write variations for specific tasks. Packages written for R add advanced algorithms, colored and textured graphs and mining techniques to dig deeper into databases.

Close to 1,600 different packages reside on just one of the many Web sites devoted to R, and the number of packages has grown exponentially. One package, called BiodiversityR, offers a graphical interface aimed at making calculations of environmental trends easier.

Another package, called Emu, analyzes speech patterns, while GenABEL is used to study the human genome.

The financial services community has demonstrated a particular affinity for R; dozens of packages exist for derivatives analysis alone.

"The great beauty of R is that you can modify it to do all sorts of things," said Hal Varian, chief economist at Google. "And you have a lot of prepackaged stuff that's already available, so you're standing on the shoulders of giants."

R first appeared in 1996, when the statistics professors Ross Ihaka and Robert Gentleman of the University of Auckland in New Zealand released the code as a free software package.

According to them, the notion of devising something like R sprang up during a hallway conversation.
They both wanted technology better suited for their statistics students, who needed to analyze data and produce graphical models of the information. Most comparable software had been designed by computer scientists and proved hard to use.

Lacking deep computer science training, the professors considered their coding efforts more of an academic game than anything else. Nonetheless, starting in about 1991, they worked on R full time. "We were pretty much inseparable for five or six years," Mr. Gentleman said. "One person would do the typing and one person would do the thinking."

Some statisticians who took an early look at the software considered it rough around the edges. But despite its shortcomings, R immediately gained a following with people who saw the possibilities in customizing the free software.

John M. Chambers, a former Bell Labs researcher who is now a consulting professor of statistics at Stanford University, was an early champion. At Bell Labs, Mr. Chambers had helped develop S, another statistics software project, which was meant to give researchers of all stripes an acce
[R] Compiling msm on Fedora Core Linux
Greetings all,

I'm trying to install the msm package and it is failing on compilation. The problem seems to be the analyticp component. Any advice on how to get it to work? The error message is below.

I'm running R version 2.6.2 (2008-02-08) x86_64-redhat-linux-gnu on a Dell Precision 690 with Fedora Core 8.

Thanks,
Adam

> install.packages("msm",lib="/usr/lib64/R/library")
trying URL 'http://cran.mirrors.hoobly.com/src/contrib/msm_0.8.tar.gz'
Content type 'application/x-tar' length 577213 bytes (563 Kb)
opened URL
==
downloaded 563 Kb

* Installing *source* package 'msm' ...
** libs
gcc -m64 -std=gnu99 -I/usr/include/R -I/usr/include/R -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -c analyticp.c -o analyticp.o
In file included from analyticp.c:9:
msm.h:3:15: error: R.h: No such file or directory
msm.h:4:26: error: R_ext/Applic.h: No such file or directory
analyticp.c: In function 'AnalyticP':
analyticp.c:51: warning: implicit declaration of function 'Calloc'
analyticp.c:51: error: expected expression before 'double'
analyticp.c:51: warning: cast to pointer from integer of different size
analyticp.c:52: error: expected expression before 'double'
analyticp.c:52: warning: cast to pointer from integer of different size
analyticp.c:63: warning: implicit declaration of function 'error'
analyticp.c:91: warning: implicit declaration of function 'Free'
make: *** [analyticp.o] Error 1
ERROR: compilation failed for package 'msm'
** Removing '/usr/lib64/R/library/msm'

The downloaded packages are in
        /tmp/RtmpaoUZiC/downloaded_packages
Updating HTML index of packages in '.Library'
Warning messages:
1: In install.packages("msm", lib = "/usr/lib64/R/library") :
  installation of package 'msm' had non-zero exit status
2: In tools:::unix.packages.html(.Library) :
  cannot create HTML package index
>

--
Adam Wilson
http://hydrodictyon.eeb.uconn.edu/people/wilson/
Department of Ecology and Evolutionary Biology
BioPharm 223
University of Connecticut
Tel: 860.486.4157
[EMAIL PROTECTED]
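The first two errors in the log say the compiler could not find R.h or R_ext/Applic.h; the later Calloc/Free/error messages follow from that. A quick way to check from inside R whether the header files are installed is sketched below. It only uses R.home() and file.exists(); the remark about packaging after the code is an assumption about Fedora-style distributions, not something stated in the post.

```r
# Directory where this R installation expects its C header files
include_dir <- R.home("include")

# The two headers the compiler reported as missing in the log above
headers <- c("R.h", file.path("R_ext", "Applic.h"))
found <- file.exists(file.path(include_dir, headers))
names(found) <- headers

print(include_dir)
print(found)
```

If either entry is FALSE, source packages that contain C code cannot be compiled against this installation. On RPM-based systems the headers are commonly shipped in a separate development package (possibly named R-devel), so installing that package is a likely fix to try.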
Re: [R] need automake/autoconf help to build RnetCDF and ncdf packages
  if test "${USR_LOCAL_NETCDF_H}" = TRUE; then
    NETCDF_INCDIR="/usr/local/include"
    NETCDF_LIBDIR="/usr/local/lib"
    NETCDF_LIBNAME="netcdf"
    HAVE_NETCDF_H=TRUE
  elif test "${HAVE_NETCDF_H}" = FALSE; then
    AC_CHECK_FILE(/usr/include/netcdf.h,
      [USR_NETCDF_H=TRUE], [USR_NETCDF_H=FALSE])
    if test "${USR_NETCDF_H}" = TRUE; then
      NETCDF_INCDIR="/usr/include"
      NETCDF_LIBDIR="/usr/lib"
      NETCDF_LIBNAME="netcdf"
      HAVE_NETCDF_H=TRUE
    fi
  fi
else
  NETCDF_INCDIR="${NETCDF_PATH}/include"
  NETCDF_LIBDIR="${NETCDF_PATH}/lib"
  NETCDF_LIBNAME="netcdf"
  AC_CHECK_FILE(${NETCDF_INCDIR}/netcdf.h,
    [INCDIR_NETCDF_H=TRUE], [INCDIR_NETCDF_H=FALSE])
  if test "${INCDIR_NETCDF_H}" = TRUE; then
    HAVE_NETCDF_H=TRUE
  fi
fi

I've tried fiddling around in this, and then typing

#autoconf configure.ac > newconfigure
sh ./newconfigure

But it always ends the same:

checking for main in -lnetcdf... no
: error: netcdf library not found

So, is there somebody here who knows how configure scripts ought to be written to accommodate this?

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

--
Adam Wilson
http://hydrodictyon.eeb.uconn.edu/people/wilson/
Department of Ecology and Evolutionary Biology
BioPharm 223
University of Connecticut
Tel: 860.486.4157
[EMAIL PROTECTED]
Re: [R] Simple Umacs example help..
Greetings,

I've been playing with the Umacs package for a few days and have worked out an example of a simple linear regression using Gibbs samplers (included below). While extremely basic, I hope this might be helpful. I would love to see more examples of MH sampling as well.

##
## Simple linear regression using Umacs package

## load libraries
library(Umacs);library(rv);library(MASS);library(mvtnorm)

## simulate data
n=500
b1 = 10
b2 = 6
beta = matrix(c(b1,b2),2,1)
sig = 4
x=cbind(1,rnorm(n,seq(1,20,length=n),sd=1))
y=matrix(rnorm(n,x%*%beta,sqrt(sig)),n,1)

# priors on beta and beta variance
b.prior = matrix(c(0,0),nrow=2,ncol=1)
v.prior = solve(diag(1000,2))
s1.prior = .1 # gamma prior
s2.prior = .1

# set up the initial values and define the dimension of each variable
b.init <- function () as.matrix(rnorm(2,0,1)) # b[1] is intercept, b[2] is slope
v.init <- function () rnorm(1,0,1)^2 # error

# define the conditional probabilities for the gibbs sampler for the betas and sigma
b.update <- function (){
  # From Clark text "Models for Ecological Data" ISBN 0-691-12178-8 (page 192) and Clark workbook (pg 81)
  # the if() statements allow this function to be used for a full covariance matrix or just a single variance parameter
  if(length(v) == 1) {
    sx = crossprod(x) * 1/v
    sy = crossprod(x,y) * 1/v
  }
  if(length(v) > 1) {
    ss = t(x) %*% solve(v) # inverse of the full covariance matrix
    sx = ss %*% x
    sy = ss %*% y
  }
  bigv = solve(sx + v.prior)
  smallv = sy + v.prior %*% b.prior
  b = t(rmvnorm(1,bigv%*%smallv,bigv))
  return(b)
}

v.update <- function () {
  sx = crossprod((y - x%*%b))
  u1 = s1.prior + 0.5*n
  u2 = s2.prior + 0.5*sx
  return(1/rgamma(1,u1,u2))
}

s <- Sampler(
  .title = "Linear Regression",
  x = x,
  y = y,
  b = Gibbs (b.update, b.init),
  v = Gibbs (v.update, v.init),
  Trace("b[1]"),
  Trace("b[2]"),
  Trace("v")
)

s(n.iter=2000) # note original values are (approximately) returned
##

Message: 48
Date: Wed, 31 Oct 2007 12:15:51 -0400
From: Ted Hart <[EMAIL PROTECTED]>
Subject: [R] Simple Umacs example help..
To: r-help@r-project.org
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain

Hello all... I am just starting to teach myself Bayesian methods, and am interested in learning how to use Umacs. I've read the documentation, but the single example is a bit over my head at the level I am at right now. I was wondering if anyone has any simple examples they'd like to share. I've successfully done a couple of simple Gibbs examples, but have had a hard time modifying some of the home-written Metropolis-Hastings samplers I've made to work with Umacs. Does anyone have any pointers or simple 2 parameter examples? Thanks.

Here is one of my simple MH samplers using a simple linear regression with a Cauchy error term.

x <- c(1.808,1.799,1.179,0.574,3.727,0.946,3.719,1.566,3.596,3.253)
y <- c(1.816,1.281,-1.382,0.573,3.793,0.935,1.775,1.474,3.679,3.889)

fn = function(x,a=0,b=1){ a+b*x }

# one MH step for the intercept a and slope b (uniform proposals)
sample.ab <- function(x,y,a,b,s,da,db){
  bstar = runif(1,b-db,b+db)
  astar = runif(1,a-da,a+da)
  logalpha = sum(dcauchy(y,location=fn(x,astar,bstar),scale=s,log=T) - dcauchy(y,location=fn(x,a,b),scale=s,log=T))
  logu = log(runif(1,0,1))
  acc = (logu < logalpha)
  b = acc*bstar + (1-acc)*b
  a = acc*astar + (1-acc)*a
  list(b=b,a=a,acc=acc)
}

# one MH step for the scale s (proposals redrawn until positive)
samples = function(x,y,a,b,s,ds){
  sstar = runif(1,s-ds,s+ds)
  while(sstar <= 0){
    sstar = runif(1,s-ds,s+ds)
  }
  logalpha = sum( dcauchy(y,location=fn(x,a,b),scale=sstar,log=T) - dcauchy(y,location=fn(x,a,b),scale=s,log=T)) - log(sstar/s)
  logu = log(runif(1,0,1))
  acc = (logu < logalpha)
  s = acc*sstar + (1-acc)*s
  list(s=s,accs=acc)
}

# run n iterations and record the chains and acceptance rates
sample.abs <- function(n=1,x,y,a=0,b=1,s=2,da=.2,db=.2,ds=1) {
  accab <- 0
  accs <- 0
  A = B = S = rep(NaN,n)
  for(i in 1:n){
    z = sample.ab(x,y,a,b,s,da,db)
    q <- samples(x,y,a,b,s,ds)
    A[i] = a = z$a
    B[i] = b = z$b
    S[i] = s = q$s
    accab = accab + z$acc
    accs <- accs + q$accs
  }
  invisible(list(a=A, b=B, s=S, accab=accab/n, accs=accs/n))
}

Cheers,
Ted
Dept. of Biology, University of Vermont

--
Adam Wilson
http://hydrodictyon.eeb.uconn.edu/people/wilson/
Department of Ecology and Evolutionary Biology
BioPharm 223
University of Connecticut
Tel: 860.486.4157
[EMAIL PROTECTED]
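For readers without Umacs installed, the two conditional updates in the Gibbs example at the top of this message (normal for the coefficients, inverse gamma for the error variance) can be sketched in base R alone. This is an illustrative re-implementation under the same priors, not the Umacs code itself; the seed, the starting values, the burn-in of 500 and the Cholesky-based multivariate normal draw are choices made here for the sketch.

```r
set.seed(42)  # arbitrary seed, chosen for this sketch

# Simulated data, as in the post: intercept 10, slope 6, error variance 4
n <- 500
beta_true <- c(10, 6)
x <- cbind(1, rnorm(n, seq(1, 20, length.out = n), sd = 1))
y <- as.vector(x %*% beta_true + rnorm(n, 0, sqrt(4)))

# Priors: vague normal on the coefficients, gamma(0.1, 0.1) on 1/variance
b_prior <- c(0, 0)
v_prior_inv <- solve(diag(1000, 2))  # prior precision matrix
s1 <- 0.1
s2 <- 0.1

n_iter <- 2000
draws <- matrix(NA_real_, n_iter, 3,
                dimnames = list(NULL, c("b1", "b2", "v")))
b <- c(0, 0)  # starting values (assumed)
v <- 1
xtx <- crossprod(x)
xty <- crossprod(x, y)

for (i in seq_len(n_iter)) {
  # Coefficients given the variance: multivariate normal
  big_v <- solve(xtx / v + v_prior_inv)
  small_v <- xty / v + v_prior_inv %*% b_prior
  b <- as.vector(big_v %*% small_v + t(chol(big_v)) %*% rnorm(2))

  # Variance given the coefficients: inverse gamma
  resid <- y - x %*% b
  v <- 1 / rgamma(1, shape = s1 + 0.5 * n, rate = s2 + 0.5 * sum(resid^2))

  draws[i, ] <- c(b, v)
}

# Posterior means after discarding the first 500 iterations
post <- colMeans(draws[-(1:500), ])
print(round(post, 2))
```

The posterior means should land near the simulating values, which is the same sanity check the Umacs version makes with its trace output.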
[R] RMySQL NA/NULL value storage error
Greetings all,

I am running R 2.5.1, RMySQL 0.6, and DBI 0.2-3 on Windows XP.

Like others, I am having trouble with NA/NULL value conversions between R and a MySQL database via DBI, but I could not find my exact problem in the archives. Most of the time NA values in R get transferred correctly to the database and back again, unless (apparently) they are in the last column of the table being saved. For example:

> test=dbConnect(MySQL(), user="myname", host="localhost", dbname="test") # Connect to database
> set.seed(1);x=data.frame(matrix(round(rnorm(16),1)*10,nrow=4)); x[c(1,4),c(2,4)]=NA; x # generate a table with some missing values
  X1 X2 X3  X4
1 -6 NA  6  NA
2  2 -8 -3 -22
3 -8  5 15  11
4 16 NA  4  NA
>

If I write that to the database and read it back:

> dbWriteTable(test,"x",x,overwrite=T,row.names=F)
[1] TRUE
> dbGetQuery(test,"SELECT * FROM x")
  X1 X2 X3  X4
1 -6 NA  6   0
2  2 -8 -3 -22
3 -8  5 15  11
4 16 NA  4   0

The NAs in column 2 are successfully transferred, but the NAs in column 4 are changed to zeros. If I add another column and repeat:

> set.seed(1);x=data.frame(matrix(round(rnorm(20),1)*10,nrow=4)); x[c(1,4),c(2,4)]=NA; x
  X1 X2 X3  X4 X5
1 -6 NA  6  NA  0
2  2 -8 -3 -22  9
3 -8  5 15  11  8
4 16 NA  4  NA  6
>
> dbWriteTable(test,"x",x,overwrite=T,row.names=F)
[1] TRUE
> dbGetQuery(test,"SELECT * FROM x")
  X1 X2 X3  X4 X5
1 -6 NA  6  NA  0
2  2 -8 -3 -22  9
3 -8  5 15  11  8
4 16 NA  4  NA  6
>

The NAs in column 4 are maintained. The pattern continues if I add more NAs in the rightmost column, regardless of how many columns there are.

Any ideas as to what is going on? Is this a bug?

I did look at the table stored in the MySQL database via Toad, and the inner columns correctly have {NULL} values in the NA fields, but the rightmost column has zeros. So it seems that the problem occurs when the data is written to the database and not when it is retrieved.

Thanks for any help,
Adam
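When checking a round trip like this on a larger table, it can help to locate the lost NAs programmatically instead of by eye. The sketch below needs no database: it rebuilds the four-column table from the post and then imitates the reported result by hand (NAs in the last column replaced with zeros). The object name `back` and that hand-made substitution are assumptions for illustration; with a live connection, `back` would instead be the result of dbGetQuery().

```r
# Rebuild the four-column example table from the post
set.seed(1)
x <- data.frame(matrix(round(rnorm(16), 1) * 10, nrow = 4))
x[c(1, 4), c(2, 4)] <- NA

# Stand-in for the table read back from MySQL (hypothetical): mimic the
# reported behaviour, where NAs in the last column came back as zeros
back <- x
back$X4[is.na(back$X4)] <- 0

# Cells whose missingness differs between what was written and what was read
lost <- which(is.na(x) != is.na(back), arr.ind = TRUE)
print(lost)
```

Each row of `lost` gives the row and column index of a cell that changed between NA and a value, which makes it easy to confirm whether the damage is confined to the rightmost column.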