[R] write a png inside a pdf for large graphics?

2012-04-23 Thread Adam Wilson
I routinely write graphics into multi-page PDFs, but some graphics (e.g.
plots of large spatial datasets using levelplot()) can result in enormous
files.  I'm curious if there is a better way.  For example:

#First, make some data:
library(lattice)
d=expand.grid(x=1:1000,y=1:1000)
d$z=rnorm(nrow(d))

#Now, the PDF.  The following produces a PDF that's ~50MB.
pdf(width=11,height=8.5,file="test1.pdf")
levelplot(z~x*y,data=d)
dev.off()

#If you write the same graphic to a png with reasonable resolution, the
# file size is ~500k:
png(width=1024,height=768,file="test1.png")
levelplot(z~x*y,data=d)
dev.off()

#  I would prefer to embed a png (or other raster format) inside a PDF
#  directly from R.
#  Is this possible?  I'm looking for some way to achieve something like
#  the following (of course this doesn't work):
pdf(width=11,height=8.5,file="test1.pdf")
 png(width=1024,height=768,file="current device")
 levelplot(z~x*y,data=d)
 dev.off()
dev.off()


Of course the PDF preserves vector scalability, but there are times it's
not worth the extra file size.  And you can write out the PNGs as separate
files and then merge them with ImageMagick or Ghostscript.  I currently get
around this by writing the graphics to a potentially very large (>>100MB)
PDF, then using Ghostscript to convert *only* the large pages of the PDF to
PNG and put them back together as a PDF (a function I wrote for this is
described here:
http://planetflux.adamwilson.us/2010/06/shrinking-rs-pdf-output.html).

I'm curious if there is a way to do it directly by instructing R to write a
png and embed it within the already open PDF device.  Any ideas?
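One option, sketched here under assumptions rather than as a confirmed answer from this thread, is to let lattice draw the cells as a single embedded raster image instead of a million vector rectangles. This assumes a lattice version whose levelplot() accepts the useRaster argument and an R version whose pdf() device can draw raster images. The output file name test_raster.pdf is made up for illustration.

```r
library(lattice)

# Same data as in the question
d <- expand.grid(x = 1:1000, y = 1:1000)
d$z <- rnorm(nrow(d))

# useRaster = TRUE asks levelplot() to draw the regular grid of cells as one
# raster image inside the PDF page, while axes, labels and the colour key
# remain ordinary PDF graphics. print() is needed when run from a script.
pdf(width = 11, height = 8.5, file = "test_raster.pdf")
print(levelplot(z ~ x * y, data = d, useRaster = TRUE))
dev.off()

# Compare this size with that of the vector-only test1.pdf
file.info("test_raster.pdf")$size
```

If a PNG already exists on disk, another route is to read it and draw it onto the open pdf() page. You would read it with png::readPNG(), which assumes the CRAN png package is installed, and draw it with grid::grid.raster().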

Adam

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Out of memory crash on 64-bit Fedora

2009-03-27 Thread Adam Wilson
Greetings all,

First of all, thanks to all of you for creating such a useful, powerful
program.

I regularly work with very large datasets (several GB) in R in 64-bit Fedora
8 (details below).  I'm lucky to have 16GB RAM available.  However, if I am
not careful and load too much into R's memory, I can crash the whole
system.  There does not seem to be a check in place that will stop R from
trying to allocate all available memory (including swap space).  I have
system status plots in my task bar, which I can watch to see when all the
RAM is taken and R then reserves all the swap space.  If I don't kill the R
process before the swap hits 100%, it will freeze the machine.  I don't know
if this is an R problem or a Fedora problem (I suppose the kernel should be
killing R before it crashes, but shouldn't R stop before it takes all the
memory?).

To replicate this behavior, I can crash the system by allocating more and
more memory in R:
v1=matrix(nrow=1e5,ncol=1e4)
v2=matrix(nrow=1e5,ncol=1e4)
v3=matrix(nrow=1e5,ncol=1e4)
v4=matrix(nrow=1e5,ncol=1e4)

etc. until R claims all RAM and swap space, and crashes the machine.  If I
try this on a Windows machine, the allocation eventually fails with an error
in R, "Error: cannot allocate vector of size XX MB".  This is much
preferable to crashing the whole system.  Why doesn't this happen in Linux?

Is there some setting that will prevent this?  I've looked through the
archives and not found a similar problem.
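A quick back-of-the-envelope check of the example above, as a sketch. matrix(nrow=1e5, ncol=1e4) with no data argument is filled with logical NA, and R stores each logical value in 4 bytes. Each of v1 to v4 therefore needs roughly 3.7 GiB before any temporary copies are made, and four of them approach the machine's 16GB of RAM.

```r
# Approximate memory needed by one matrix(nrow = 1e5, ncol = 1e4):
# the matrix is filled with logical NA, and R stores each logical
# value in 4 bytes (the same storage as an integer).
cells <- 1e5 * 1e4
gib_per_matrix <- cells * 4 / 1024^3
gib_per_matrix        # about 3.7 GiB for each of v1, v2, v3, v4
4 * gib_per_matrix    # about 14.9 GiB for all four

# Check the 4-bytes-per-cell figure on a smaller matrix
small <- matrix(nrow = 1000, ncol = 1000)
as.numeric(object.size(small)) / length(small)   # slightly above 4 (header overhead)
```

On Linux, one common way to get R's "cannot allocate vector" error instead of a frozen machine is to start R from a shell whose virtual memory has been capped with the shell's ulimit -v builtin. Allocations beyond the cap then fail inside R. This is a general Linux mechanism rather than a setting discussed in this thread, and the right cap depends on the machine.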

Thanks for any help.

Adam



The facts:
> sessionInfo()
R version 2.8.0 (2008-10-20)
x86_64-redhat-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
> version
               _
platform       x86_64-redhat-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          8.0
year           2008
month          10
day            20
svn rev        46754
language       R
version.string R version 2.8.0 (2008-10-20)


[R] R in the News

2009-01-07 Thread Adam Wilson
Greetings all,

This isn't a request for help, but I thought the following article from the
New York Times would be of interest to you all.

Enjoy!

Adam



*Data Analysts Captivated by R's Power*
By ASHLEE VANCE
Published ONLINE: January 6, 2009
URL:
http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
A version of this article appeared in print edition on January 7, 2009, on
page B6

R is also the name of a popular programming language used by a growing
number of data analysts inside corporations and academia. It is becoming
their lingua franca partly because data mining has entered a golden age,
whether being used to set ad prices, find new drugs more quickly or
fine-tune financial models. Companies as diverse as Google, Pfizer, Merck,
Bank of America, the InterContinental Hotels Group and Shell use it.

But R has also quickly found a following because statisticians, engineers
and scientists without computer programming skills find it easy to use.

"R is really important to the point that it's hard to overvalue it," said
Daryl Pregibon, a research scientist at Google, which uses the software
widely. "It allows statisticians to do very intricate and complicated
analyses without knowing the blood and guts of computing systems."

It is also free. R is an open-source program, and its popularity reflects a
shift in the type of software used inside corporations. Open-source software
is free for anyone to use and modify. I.B.M., Hewlett-Packard and Dell make
billions of dollars a year selling servers that run the open-source Linux
operating system, which competes with Windows from Microsoft. Most Web sites
are displayed using an open-source application called Apache, and companies
increasingly rely on the open-source MySQL database to store their critical
information. Many people view the end results of all this technology via the
Firefox Web browser, also open-source software.

R is similar to other programming languages, like C, Java and Perl, in that
it helps people perform a wide variety of computing tasks by giving them
access to various commands. For statisticians, however, R is particularly
useful because it contains a number of built-in mechanisms for organizing
data, running calculations on the information and creating graphical
representations of data sets.

Some people familiar with R describe it as a supercharged version of
Microsoft's Excel spreadsheet software that can help illuminate data trends
more clearly than is possible by entering information into rows and columns.

What makes R so useful — and helps explain its quick acceptance — is that
statisticians, engineers and scientists can improve the software's code or
write variations for specific tasks. Packages written for R add advanced
algorithms, colored and textured graphs and mining techniques to dig deeper
into databases.

Close to 1,600 different packages reside on just one of the many Web sites
devoted to R, and the number of packages has grown exponentially. One
package, called BiodiversityR, offers a graphical interface aimed at making
calculations of environmental trends easier.

Another package, called Emu, analyzes speech patterns, while GenABEL is used
to study the human genome.

The financial services community has demonstrated a particular affinity for
R; dozens of packages exist for derivatives analysis alone.

"The great beauty of R is that you can modify it to do all sorts of things,"
said Hal Varian, chief economist at Google. "And you have a lot of
prepackaged stuff that's already available, so you're standing on the
shoulders of giants."

R first appeared in 1996, when the statistics professors Ross Ihaka and
Robert Gentleman of the University of Auckland in New Zealand released the
code as a free software package.

According to them, the notion of devising something like R sprang up during
a hallway conversation. They both wanted technology better suited for their
statistics students, who needed to analyze data and produce graphical models
of the information. Most comparable software had been designed by computer
scientists and proved hard to use.

Lacking deep computer science training, the professors considered their
coding efforts more of an academic game than anything else. Nonetheless,
starting in about 1991, they worked on R full time. "We were pretty much
inseparable for five or six years," Mr. Gentleman said. "One person would do
the typing and one person would do the thinking."

Some statisticians who took an early look at the software considered it
rough around the edges. But despite its shortcomings, R immediately gained a
following with people who saw the possibilities in customizing the free
software.

John M. Chambers, a former Bell Labs researcher who is now a consulting
professor of statistics at Stanford University, was an early champion. At
Bell Labs, Mr. Chambers had helped develop S, another statistics software
project, which was meant to give researchers of all stripes an acce

[R] Compiling msm on Fedora Core Linux

2008-04-23 Thread Adam Wilson
Greetings all,

I'm trying to install the msm package and it is failing on compilation.  The
problem seems to be in the analyticp component.  Any advice on how to get it
to work?

The error message is below.  I'm running R version 2.6.2 (2008-02-08)
x86_64-redhat-linux-gnu on a Dell Precision 690 with Fedora Core 8.

Thanks,
Adam


> install.packages("msm",lib="/usr/lib64/R/library")
trying URL 'http://cran.mirrors.hoobly.com/src/contrib/msm_0.8.tar.gz'
Content type 'application/x-tar' length 577213 bytes (563 Kb)
opened URL
==
downloaded 563 Kb

* Installing *source* package 'msm' ...
** libs
gcc -m64 -std=gnu99 -I/usr/include/R -I/usr/include/R
-I/usr/local/include -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic
-c analyticp.c -o analyticp.o
In file included from analyticp.c:9:
msm.h:3:15: error: R.h: No such file or directory
msm.h:4:26: error: R_ext/Applic.h: No such file or directory
analyticp.c: In function 'AnalyticP':
analyticp.c:51: warning: implicit declaration of function 'Calloc'
analyticp.c:51: error: expected expression before 'double'
analyticp.c:51: warning: cast to pointer from integer of different size
analyticp.c:52: error: expected expression before 'double'
analyticp.c:52: warning: cast to pointer from integer of different size
analyticp.c:63: warning: implicit declaration of function 'error'
analyticp.c:91: warning: implicit declaration of function 'Free'
make: *** [analyticp.o] Error 1
ERROR: compilation failed for package 'msm'
** Removing '/usr/lib64/R/library/msm'

The downloaded packages are in
/tmp/RtmpaoUZiC/downloaded_packages
Updating HTML index of packages in '.Library'
Warning messages:
1: In install.packages("msm", lib = "/usr/lib64/R/library") :
  installation of package 'msm' had non-zero exit status
2: In tools:::unix.packages.html(.Library) :
  cannot create HTML package index
>
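The log shows gcc being pointed at -I/usr/include/R but failing to find R.h and R_ext/Applic.h there. A minimal sketch of a check from within R follows. It shows where this installation says its C headers live and whether the two headers that msm.h includes are present.

```r
# Directory where this R installation expects its C headers
# (R.home() honours the R_INCLUDE_DIR environment variable if set)
inc <- R.home("include")
inc

# Are the headers that msm.h includes actually present there?
headers <- file.path(inc, c("R.h", file.path("R_ext", "Applic.h")))
data.frame(header = headers, present = file.exists(headers))
```

If present is FALSE, the problem is missing headers in the R installation rather than anything specific to msm. On Fedora, R's C headers are usually shipped in a separate development package (named R-devel on many Fedora releases). Installing it is an assumption about this particular setup, not something confirmed in the thread.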






-- 
Adam Wilson
http://hydrodictyon.eeb.uconn.edu/people/wilson/
Department of Ecology and Evolutionary Biology
BioPharm 223
University of Connecticut
Tel: 860.486.4157
[EMAIL PROTECTED]


Re: [R] need automake/autoconf help to build RnetCDF and ncdf packages

2008-03-13 Thread Adam Wilson
  if test "${USR_LOCAL_NETCDF_H}" = TRUE; then
    NETCDF_INCDIR="/usr/local/include"
    NETCDF_LIBDIR="/usr/local/lib"
    NETCDF_LIBNAME="netcdf"
    HAVE_NETCDF_H=TRUE
  elif test "${HAVE_NETCDF_H}" = FALSE; then
    AC_CHECK_FILE(/usr/include/netcdf.h,
      [USR_NETCDF_H=TRUE], [USR_NETCDF_H=FALSE])
    if test "${USR_NETCDF_H}" = TRUE; then
      NETCDF_INCDIR="/usr/include"
      NETCDF_LIBDIR="/usr/lib"
      NETCDF_LIBNAME="netcdf"
      HAVE_NETCDF_H=TRUE
    fi
  fi
else
  NETCDF_INCDIR="${NETCDF_PATH}/include"
  NETCDF_LIBDIR="${NETCDF_PATH}/lib"
  NETCDF_LIBNAME="netcdf"
  AC_CHECK_FILE(${NETCDF_INCDIR}/netcdf.h,
    [INCDIR_NETCDF_H=TRUE], [INCDIR_NETCDF_H=FALSE])
  if test "${INCDIR_NETCDF_H}" = TRUE; then
    HAVE_NETCDF_H=TRUE
  fi

fi

I've tried fiddling around in this, and then typing

#autoconf configure.ac > newconfigure

sh ./newconfigure

But it always ends the same:

checking for main in -lnetcdf... no
: error: netcdf library not found

So, is there somebody here who knows how configure scripts ought to be
written to accommodate this?



--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas






Re: [R] Simple Umacs example help..

2007-11-07 Thread Adam Wilson
Greetings,

I've been playing with the Umacs package for a few days and have worked out
an example of a simple linear regression using Gibbs sampling (included
below).  While extremely basic, I hope this might be helpful.  I would love
to see more examples of Metropolis-Hastings (MH) sampling as well.

##

## Simple linear regression using the Umacs package

##load libraries
library(Umacs);library(rv);library(MASS);library(mvtnorm)

##simulate data
n = 500
b1 = 10
b2 = 6
beta = matrix(c(b1,b2),2,1)
sig = 4
x = cbind(1, rnorm(n, seq(1, 20, length.out = n), sd = 1))
y = matrix(rnorm(n, x %*% beta, sqrt(sig)), n, 1)

#priors on beta and on the error variance
b.prior = matrix(0, nrow = 2, ncol = 1)   # prior mean of beta
v.prior = solve(diag(1000, 2))            # prior precision (inverse prior covariance) of beta
s1.prior = .1 # gamma prior on 1/v: shape
s2.prior = .1 # gamma prior on 1/v: rate

#set up the initial values and define the dimension of each variable
b.init <- function () as.matrix(rnorm(2,0,1))   # b[1] is intercept, b[2] is slope
v.init <- function () rnorm(1,0,1)^2            # error variance

#define the conditional probabilities for the Gibbs sampler for the betas and sigma
b.update <- function (){
  # From Clark text "Models for Ecological Data" ISBN 0-691-12178-8 (page 192)
  # and Clark workbook (pg 81).
  # The if() statements allow this function to be used for a full
  # covariance matrix or just a single variance parameter
  if(length(v) == 1) {
    sx = crossprod(x) * 1/v
    sy = crossprod(x,y) * 1/v
  }
  if(length(v) > 1) {
    ss = t(x) %*% solve(v)   # X' V^-1 for a full n x n error covariance matrix
    sx = ss %*% x
    sy = ss %*% y
  }
  bigv = solve(sx + v.prior)
  smallv = sy + v.prior %*% b.prior
  # rmvnorm() expects the mean as a vector
  b = t(rmvnorm(1, as.vector(bigv %*% smallv), bigv))
  return(b)
}
v.update <- function () {
  sx = crossprod((y - x%*%b))   # residual sum of squares
  u1 = s1.prior + 0.5*n
  u2 = s2.prior + 0.5*sx
  return(1/rgamma(1, shape = u1, rate = u2))   # inverse-gamma draw for v
}

s <- Sampler(
  .title = "Linear Regression",
  x = x,
  y = y,
  b = Gibbs(b.update, b.init),
  v = Gibbs(v.update, v.init),
  Trace("b[1]"),
  Trace("b[2]"),
  Trace("v")
)
s(n.iter=2000)
#note original values are (approximately) returned

##
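For readers without Umacs, the same two full conditionals used in b.update() and v.update() can be run as a plain-R loop. The sketch below does not use Umacs, and the function name gibbs_lm is hypothetical. It assumes a single error variance (the length(v) == 1 branch above), and it draws b through a Cholesky factor instead of mvtnorm::rmvnorm.

```r
# Hypothetical plain-R Gibbs sampler for y = X b + e, e ~ N(0, v)
gibbs_lm <- function(x, y, n.iter = 2000, b.prior = rep(0, ncol(x)),
                     v.prior = diag(1 / 1000, ncol(x)),   # prior precision of b
                     s1.prior = 0.1, s2.prior = 0.1) {
  n <- nrow(x)
  p <- ncol(x)
  b <- rep(0, p)   # starting values
  v <- 1
  draws <- matrix(NA_real_, nrow = n.iter, ncol = p + 1,
                  dimnames = list(NULL, c(paste0("b", seq_len(p)), "v")))
  xtx <- crossprod(x)      # X'X
  xty <- crossprod(x, y)   # X'y
  for (i in seq_len(n.iter)) {
    # b | v, y ~ N(bigv %*% smallv, bigv), bigv = (X'X / v + prior precision)^-1
    bigv <- solve(xtx / v + v.prior)
    smallv <- xty / v + v.prior %*% b.prior
    b <- as.vector(bigv %*% smallv + t(chol(bigv)) %*% rnorm(p))
    # v | b, y ~ inverse-gamma(s1.prior + n/2, s2.prior + SSE/2)
    sse <- sum((y - x %*% b)^2)
    v <- 1 / rgamma(1, shape = s1.prior + n / 2, rate = s2.prior + sse / 2)
    draws[i, ] <- c(b, v)
  }
  draws
}

# Usage with data simulated as above (true b = (10, 6), v = 4)
set.seed(1)
n <- 500
x <- cbind(1, rnorm(n, seq(1, 20, length.out = n), sd = 1))
y <- as.vector(x %*% c(10, 6)) + rnorm(n, sd = 2)
draws <- gibbs_lm(x, y, n.iter = 2000)
colMeans(draws[-(1:500), ])   # posterior means after discarding 500 burn-in draws
```

Drawing b as one block keeps the strongly correlated intercept and slope from slowing the chain down, which matters here because x is not centred.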
Message: 48
Date: Wed, 31 Oct 2007 12:15:51 -0400
From: Ted Hart <[EMAIL PROTECTED]>
Subject: [R] Simple Umacs example help..
To: r-help@r-project.org
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain

Hello all...

I am just starting to teach myself Bayesian methods, and am
interested in learning how to use Umacs.  I've read the
documentation, but the single example is a bit over my head at the
level I am at right now.  I was wondering if anyone has any simple
examples they'd like to share.  I've successfully done a couple of
simple Gibbs examples, but have had a hard time modifying some of the
home-written Metropolis-Hastings samplers I've made to work with
Umacs.  Does anyone have any pointers or simple 2-parameter
examples?  Thanks.

Here is one of my simple MH samplers using a simple linear regression
with a Cauchy error term.

x <- c(1.808,1.799,1.179,0.574,3.727,0.946,3.719,1.566,3.596,3.253)
y <- c(1.816,1.281,-1.382,0.573,3.793,0.935,1.775,1.474,3.679,3.889)

# linear predictor a + b*x
fn = function(x,a=0,b=1){
  a+b*x
}

# MH update of intercept a and slope b, with uniform random-walk proposals
sample.ab <- function(x,y,a,b,s,da,db){
  bstar = runif(1,b-db,b+db)
  astar = runif(1,a-da,a+da)

  logalpha = sum(dcauchy(y,location=fn(x,astar,bstar),scale=s,log=TRUE) -
                 dcauchy(y,location=fn(x,a,b),scale=s,log=TRUE))
  logu = log(runif(1,0,1))
  acc = (logu < logalpha)
  b = acc*bstar + (1-acc)*b
  a = acc*astar + (1-acc)*a
  list(b=b,a=a,acc=acc)
}

# MH update of the Cauchy scale s; proposals <= 0 are redrawn, and the
# - log(sstar/s) term equals the log-ratio of a 1/s prior on s
samples = function(x,y,a,b,s,ds){
  sstar = runif(1,s-ds,s+ds)
  while(sstar <= 0){
    sstar = runif(1,s-ds,s+ds)
  }
  logalpha = sum(dcauchy(y,location=fn(x,a,b),scale=sstar,log=TRUE) -
                 dcauchy(y,location=fn(x,a,b),scale=s,log=TRUE)) - log(sstar/s)
  logu = log(runif(1,0,1))
  acc = (logu < logalpha)
  s = acc*sstar + (1-acc)*s
  list(s=s,accs=acc)
}

# run n iterations: update (a, b), then update s given the new (a, b)
sample.abs <- function(n=1,x,y,a=0,b=1,s=2,da=.2,db=.2,ds=1)
{
  accab <- 0
  accs <- 0
  A = B = S = rep(NaN,n)
  for(i in 1:n){
    z = sample.ab(x,y,a,b,s,da,db)
    A[i] = a = z$a
    B[i] = b = z$b
    q <- samples(x,y,a,b,s,ds)
    S[i] = s = q$s
    accab = accab + z$acc
    accs <- accs + q$accs
  }
  invisible(list(a=A, b=B, s=S, accab=accab/n, accs=accs/n))
}



Cheers,
Ted

Dept. of Biology,
University of Vermont




[R] RMySQL NA/NULL value storage error

2007-09-27 Thread Adam Wilson
Greetings all,

I am running R 2.5.1, RMySQL 0.6, and DBI 0.2-3 on Windows XP.

Like others, I am having trouble with NA/NULL value conversions between R
and a MySQL database via DBI, but I could not find my exact problem in the
archives.  Most of the time NA values in R get transferred correctly to the
database and back again, unless (apparently) they are in the last column
of the table being saved.

For example:


>   test=dbConnect(MySQL(), user="myname", host="localhost", dbname="test")  # Connect to database
>   set.seed(1); x=data.frame(matrix(round(rnorm(16),1)*10,nrow=4))
>   x[c(1,4),c(2,4)]=NA; x  #generate a table with some missing values

  X1 X2 X3  X4
1 -6 NA  6  NA
2  2 -8 -3 -22
3 -8  5 15  11
4 16 NA  4  NA
>

If I write that to the database and read it back:

>  dbWriteTable(test,"x",x,overwrite=T,row.names=F)
[1] TRUE

>dbGetQuery(test,"SELECT * FROM x")
  X1 X2 X3  X4
1 -6 NA  6   0
2  2 -8 -3 -22
3 -8  5 15  11
4 16 NA  4   0


The NAs in column 2 are successfully transferred, but the NAs in column 4
are changed to zeros.  If I add another column and repeat:

>   set.seed(1); x=data.frame(matrix(round(rnorm(20),1)*10,nrow=4))
>   x[c(1,4),c(2,4)]=NA; x
  X1 X2 X3  X4 X5
1 -6 NA  6  NA  0
2  2 -8 -3 -22  9
3 -8  5 15  11  8
4 16 NA  4  NA  6
>
>dbWriteTable(test,"x",x,overwrite=T,row.names=F)
[1] TRUE

>dbGetQuery(test,"SELECT * FROM x")

  X1 X2 X3  X4 X5
1 -6 NA  6  NA  0
2  2 -8 -3 -22  9
3 -8  5 15  11  8
4 16 NA  4  NA  6
>
>

The NAs in column 4 are maintained.


The pattern continues if I add more NAs in the rightmost column, regardless
of how many columns there are.  Any ideas as to what is going on?  Is this a
bug?  I did look at the table stored in the MySQL database via Toad and the
inner columns correctly have {NULL} values in the NA fields, but the
rightmost column has zeros.  So it seems that the problem occurs when the
data is written to the database and not when it is retrieved.
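Based only on the pattern observed above (NAs survive in every column except the rightmost one), one possible workaround is sketched here under that assumption. It appends a throwaway column before writing, so that the real last column is no longer rightmost, and drops it again in MySQL afterwards. Both function names and the column name dummy_pad are hypothetical.

```r
# Hypothetical helper: append a throwaway trailing column
pad_last_column <- function(df, pad_name = "dummy_pad") {
  stopifnot(!pad_name %in% names(df))
  df[[pad_name]] <- 0L
  df
}

# Hypothetical wrapper: write the padded table, then drop the padding column.
# Requires DBI/RMySQL and an open connection such as test above; with current
# DBI, dbExecute() is the idiomatic call for statements that return no rows.
write_table_keep_na <- function(con, name, df, pad_name = "dummy_pad") {
  ok <- DBI::dbWriteTable(con, name, pad_last_column(df, pad_name),
                          overwrite = TRUE, row.names = FALSE)
  DBI::dbGetQuery(con, paste("ALTER TABLE", name, "DROP COLUMN", pad_name))
  ok
}

# The padding step on its own, using the example table from above
set.seed(1)
x <- data.frame(matrix(round(rnorm(16), 1) * 10, nrow = 4))
x[c(1, 4), c(2, 4)] <- NA
padded <- pad_last_column(x)
names(padded)   # X1 to X4, followed by dummy_pad
```

This works around the symptom rather than explaining it, so it is worth re-checking the stored table (e.g. in Toad) after the ALTER TABLE step.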

Thanks for any help,

Adam
